brkyvz commented on code in PR #48401:
URL: https://github.com/apache/spark/pull/48401#discussion_r1853910322


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala:
##########
@@ -37,6 +38,29 @@ case class StateSchemaValidationResult(
     schemaPath: String
 )
 
+/** An Avro-based encoder used for serializing between UnsafeRow and Avro

Review Comment:
   nit: text needs to move to the next line



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala:
##########
@@ -593,6 +671,17 @@ object RocksDBStateStoreProvider {
   val STATE_ENCODING_VERSION: Byte = 0
   val VIRTUAL_COL_FAMILY_PREFIX_BYTES = 2
 
+  private val MAX_AVRO_ENCODERS_IN_CACHE = 1000
+  // Add the cache at companion object level so it persists across provider 
instances
+  private val avroEncoderMap: NonFateSharingCache[String, AvroEncoder] = {
+    val guavaCache = CacheBuilder.newBuilder()
+      .maximumSize(MAX_AVRO_ENCODERS_IN_CACHE)  // Adjust size based on your 
needs
+      .expireAfterAccess(1, TimeUnit.HOURS)  // Optional: Add expiration if 
needed
+      .build[String, AvroEncoder]()

Review Comment:
   can you make some of these configurable? It can be in a follow up



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to