HeartSaVioR commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-637926400


   And personally I'd rather do the check in StateStore, accepting the additional 
overhead of reading a single row up front, so that the same validation applies to 
all stateful operations.
   
   ```scala
     /** Get or create a store associated with the id. */
     def get(
         storeProviderId: StateStoreProviderId,
         keySchema: StructType,
         valueSchema: StructType,
         indexOrdinal: Option[Int],
         version: Long,
         storeConf: StateStoreConf,
         hadoopConf: Configuration): StateStore = {
       require(version >= 0)
       val storeProvider = loadedProviders.synchronized {
         startMaintenanceIfNeeded()
         val provider = loadedProviders.getOrElseUpdate(
           storeProviderId,
           StateStoreProvider.createAndInit(
             storeProviderId.storeId, keySchema, valueSchema, indexOrdinal,
             storeConf, hadoopConf)
         )
         reportActiveStoreInstance(storeProviderId)
         provider
       }
       val store = storeProvider.getStore(version)
       // Peek at a single row so the loaded state can be checked against the
       // schemas the query expects, before any stateful operation touches it.
       val iter = store.iterator()
       if (iter.nonEmpty) {
         val rowPair = iter.next()
         val key = rowPair.key
         val value = rowPair.value
         // TODO: validate key with key schema
         // TODO: validate value with value schema
       }
       store
     }
   ```
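   
   To make the TODOs concrete, here is a minimal sketch of what the validation 
could look like. The helper name `validateRow` is hypothetical and not part of 
this PR; since an `UnsafeRow` doesn't carry its own schema, about the cheapest 
sanity check available is comparing the field count against the expected schema:
   
   ```scala
   import org.apache.spark.sql.catalyst.expressions.UnsafeRow
   import org.apache.spark.sql.types.StructType
   
   // Hypothetical helper, not part of this PR: an UnsafeRow doesn't carry its
   // schema, so the cheapest check is comparing the number of fields against
   // the schema the query expects.
   private def validateRow(row: UnsafeRow, schema: StructType, rowKind: String): Unit = {
     if (row.numFields != schema.length) {
       throw new IllegalStateException(
         s"Loaded state $rowKind row has ${row.numFields} fields, but the query " +
         s"expects ${schema.length} fields: ${schema.catalogString}")
     }
   }
   
   // Usage at the two TODOs above:
   //   validateRow(key, keySchema, "key")
   //   validateRow(value, valueSchema, "value")
   ```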
   
   For streaming aggregations this initializes two state stores, so the overhead 
grows to two rows, but I don't think the overhead matters much.
   
   

