HeartSaVioR opened a new pull request #24173: [SPARK-27237][SS] Introduce State 
schema validation among query restart
URL: https://github.com/apache/spark/pull/24173
 
 
   ## What changes were proposed in this pull request?
   
   Please refer to the description of 
[SPARK-27237](https://issues.apache.org/jira/browse/SPARK-27237) for the 
rationale behind this patch.
   
   This patch introduces state schema validation. On the first run, the key 
schema and value schema are stored in a `schema` file; on later runs, the new 
key schema and value schema for the state are verified to be compatible with 
the stored ones. Here, "compatible" means that the number of fields is the 
same and the data type of each field is the same. Field names are not 
compared, because Spark has been allowing fields to be renamed.
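
As a rough illustration of the compatibility rule described above, the following Python sketch models a schema as an ordered list of (field name, data type) pairs. The `Schema` alias and the `is_compatible` function are hypothetical names used only for illustration; they are not the code in this patch, which presumably works with Spark's own schema types.

```python
from typing import List, Tuple

# Hypothetical stand-in for a Spark schema: an ordered list of
# (field name, data type) pairs. Not the representation used by the patch.
Schema = List[Tuple[str, str]]


def is_compatible(existing: Schema, new: Schema) -> bool:
    """Return True when both schemas have the same number of fields and the
    same data type at every position. Field names are deliberately ignored."""
    if len(existing) != len(new):
        return False
    return all(
        old_type == new_type
        for (_, old_type), (_, new_type) in zip(existing, new)
    )


stored = [("key", "string"), ("count", "long")]
renamed = [("word", "string"), ("total", "long")]  # same types, new names
retyped = [("key", "string"), ("count", "int")]    # one data type changed
widened = stored + [("extra", "double")]           # one field added

print(is_compatible(stored, renamed))
print(is_compatible(stored, retyped))
print(is_compatible(stored, widened))
```

Under this rule only the renamed schema is accepted; a changed data type or a different field count is rejected.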
   
   This patch prevents a query from running with an incompatible state schema, 
which reduces the chance of nondeterministic behavior and provides a more 
informative error message. (Renaming a field can also be a sign of a 
semantically incompatible change, but end users may simply have changed the 
name, so we can't tell.)
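
The store-then-verify flow across restarts could look roughly like the Python sketch below. It is a simplified sketch under stated assumptions: the JSON file layout, the `validate_or_store` function, and the `StateSchemaNotCompatible` exception are all hypothetical and do not reflect the file format or class names used by this patch.

```python
import json
import os
import tempfile


class StateSchemaNotCompatible(Exception):
    """Hypothetical error raised when stored and new schemas differ."""


def _types(schema):
    # Keep only the data types, in order; field names are not compared.
    return [data_type for _, data_type in schema]


def validate_or_store(schema_path, key_schema, value_schema):
    """On the first run, write the schemas to schema_path. On later runs,
    compare the new schemas against the stored ones and fail fast."""
    new = {"key": key_schema, "value": value_schema}
    if not os.path.exists(schema_path):
        with open(schema_path, "w") as f:
            json.dump(new, f)
        return
    with open(schema_path) as f:
        stored = json.load(f)
    for part in ("key", "value"):
        if _types(stored[part]) != _types(new[part]):
            raise StateSchemaNotCompatible(
                f"{part} schema is not compatible: "
                f"stored={stored[part]}, new={new[part]}"
            )


with tempfile.TemporaryDirectory() as checkpoint_dir:
    path = os.path.join(checkpoint_dir, "schema")
    validate_or_store(path, [("key", "string")], [("count", "long")])   # first run: stored
    validate_or_store(path, [("word", "string")], [("total", "long")])  # rename: accepted
    try:
        validate_or_store(path, [("key", "string")], [("count", "int")])
    except StateSchemaNotCompatible as e:
        print(e)  # the message names the offending schema and both versions
```

Failing before the query starts, with both schemas in the message, is what makes the error more informative than a later failure while reading state.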
   
   ## How was this patch tested?
   
   Added UTs.
