HeartSaVioR commented on pull request #27627:
URL: https://github.com/apache/spark/pull/27627#issuecomment-637937197


   Actually, Spark 3.0.0 is the better place to land this if backward 
compatibility is our only concern, but even in a major version update we don't 
want to scare end users.
   
   SPARK-26154 introduced "versioning" of the state for stream-stream join, 
so that Spark 3.0.0 can detect the "old" state and fail the query with a 
"proper" error message. There's no such thing for this patch; that's why I 
asked about "versioning" of state for the streaming aggregation "function", but 
I'm not sure it's the preferred approach, and even if we agree on it I'm not 
sure we have enough time to deal with it in Spark 3.0.0. (It has already been 
delayed quite a bit.)
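
   To make the "versioning" idea concrete, here is a rough sketch of the kind 
of check I mean. This is plain Python for illustration only, not Spark's actual 
code: `CURRENT_STATE_FORMAT_VERSION`, `IncompatibleStateError` and 
`check_state_format_version` are hypothetical names, and treating a missing 
version as version 1 is an assumption.

```python
from typing import Optional

# Hypothetical: the state format version written by the current release.
CURRENT_STATE_FORMAT_VERSION = 2


class IncompatibleStateError(Exception):
    """Raised when the stored state was written in an incompatible format."""


def check_state_format_version(stored_version: Optional[int]) -> None:
    # Assumption: state written before versioning existed carries no version,
    # so a missing version is treated as version 1.
    version = 1 if stored_version is None else stored_version
    if version != CURRENT_STATE_FORMAT_VERSION:
        raise IncompatibleStateError(
            f"State was written with format version {version}, but this "
            f"release expects version {CURRENT_STATE_FORMAT_VERSION}. "
            "Please check the migration guide for your version."
        )
```

   With something like this, an old checkpoint fails fast with a message that 
explains what happened, instead of failing somewhere deep in deserialization.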
   
   My personal feeling is that we should bring in the essential functionality 
(say, schema information of state, #24173) ASAP, so that in the future we can 
at least guide end users in such cases with something like "if you didn't touch 
your query but encounter a schema incompatibility error on state, please check 
the migration guide for your version to see whether there's any 
backward-incompatible change".
   
   Unfortunately, even if we adopt #24173 into Spark 3.0.0 (and I'm not sure 
that will happen), it won't apply to migration from Spark 2.x to 3.0.0, as 
existing state from Spark 2.x has no schema as of now. That said, we could 
craft a tool that creates the schema file for the state of a Spark 2.x 
structured streaming query, so that end users can apply it before migrating to 
Spark 3.0.0.
   
   #28707 would help detect the problem, at least for this issue (as the 
number of fields will not match), so #28707 might unblock this patch for 
inclusion in branch-3.0, but the error message would be a bit unfriendly 
because we won't have detailed information about the schema of the state.
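
   To show why the message would be less friendly, here is a rough comparison 
of the two kinds of check. Again this is an illustrative Python sketch, not the 
code in #28707 or #24173: `check_field_count` and `check_schema` are 
hypothetical names, and a schema is modelled as a plain list of (name, type) 
pairs.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (field name, type name); a simplified stand-in


def check_field_count(expected: List[Field], stored_num_fields: int) -> None:
    # Without a stored schema, only the number of fields can be compared.
    if len(expected) != stored_num_fields:
        raise ValueError(
            f"State row has {stored_num_fields} fields, "
            f"but the query expects {len(expected)}."
        )


def check_schema(expected: List[Field], stored: List[Field]) -> None:
    # With a stored schema, the message can name exactly what differs.
    if expected != stored:
        missing = [f for f in expected if f not in stored]
        extra = [f for f in stored if f not in expected]
        raise ValueError(
            "State schema is incompatible. "
            f"Fields only in the query: {missing}; "
            f"fields only in the stored state: {extra}."
        )
```

   The first check can only say that the counts differ; the second can tell 
the end user which fields changed, which is what the schema information would 
buy us.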




