[GitHub] [spark] HeartSaVioR commented on a change in pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

GitBox Sat, 25 Jul 2020 02:40:28 -0700


HeartSaVioR commented on a change in pull request #24173:
URL: https://github.com/apache/spark/pull/24173#discussion_r460387116




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
##########
@@ -391,10 +399,18 @@ object StateStore extends Logging {
     require(version >= 0)
     val storeProvider = loadedProviders.synchronized {
       startMaintenanceIfNeeded()
+
+      val newProvIdSchemaCheck = StateStoreProviderId.withNoPartitionInformation(storeProviderId)
+      if (!schemaValidated.contains(newProvIdSchemaCheck)) {

Review comment:
   So there are two distinct purposes for constructing schema information:

   1) checking schema compatibility
   2) leveraging schema information to open the door to further improvements, e.g. reading state without a query (like a state data source), possibly applying a projection to match schemas when only the ordinals differ, and (probably?) schema evolution

   We'd want to keep writing schema information even when the config is disabled, to enable 2).
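
   To make the distinction concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `Field`, `Schema`, `SchemaCheckSketch`, `validateAndRecord`, `projectionOrdinals`, and the `checkEnabled` flag are all hypothetical names, and an in-memory map stands in for whatever would actually persist the schema. The sketch assumes that a schema already on record is left untouched when the check is disabled.

```scala
// Hypothetical, simplified model. None of these names are Spark APIs.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Vector[Field])

object SchemaCheckSketch {
  // In-memory stand-in for schema information persisted per state store.
  private val stored = scala.collection.mutable.Map.empty[String, Schema]

  def recorded(storeId: String): Option[Schema] = stored.get(storeId)

  /**
   * Purpose 1: reject an incompatible schema, but only when the check is enabled.
   * Purpose 2: record the schema on first use regardless of the flag, so that
   * later features can rely on it being present.
   */
  def validateAndRecord(storeId: String, current: Schema, checkEnabled: Boolean): Unit = {
    stored.get(storeId) match {
      case Some(previous) =>
        if (checkEnabled && previous != current) {
          throw new IllegalStateException(
            s"State schema mismatch for $storeId: stored=$previous, provided=$current")
        }
      case None =>
        // Written even when checkEnabled is false.
        stored(storeId) = current
    }
  }

  /**
   * If `source` and `target` hold the same fields in a different order,
   * return, for each field of `target`, its ordinal in `source`.
   * Returns None when the schemas differ in anything but ordering.
   */
  def projectionOrdinals(source: Schema, target: Schema): Option[Vector[Int]] = {
    val byName = source.fields.zipWithIndex.map { case (f, i) => (f.name, (f, i)) }.toMap
    val namesUnique = byName.size == source.fields.size
    if (!namesUnique || source.fields.size != target.fields.size) {
      None
    } else {
      val ordinals = target.fields.flatMap { t =>
        byName.get(t.name).collect { case (s, i) if s.dataType == t.dataType => i }
      }
      val complete = ordinals.size == target.fields.size
      val noRepeats = ordinals.distinct.size == ordinals.size
      if (complete && noRepeats) Some(ordinals) else None
    }
  }
}
```

   With this shape, disabling the check only skips the comparison; the recorded schema is still available for something like the ordinal-only projection in 2).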




