HeartSaVioR commented on pull request #27627: URL: https://github.com/apache/spark/pull/27627#issuecomment-637937197
Actually, Spark 3.0.0 would be the better place to land this if backward compatibility were our only concern, but even in a major version update we don't want to scare end users. SPARK-26154 introduced "versioning" of the state of stream-stream join, so that Spark 3.0.0 can detect the "old" state and fail the query with a "proper" error message. There's no such thing for this patch; that's why I asked about "versioning" of state for the streaming aggregation "function", but I'm not sure it's the preferred approach, and even if we agree on it I'm not sure we have enough time to deal with it in Spark 3.0.0. (It has actually been delayed quite a lot already.)

My personal feeling is that we should bring in the essential functionality (say, schema information of state, #24173) ASAP, so that in the future we can at least guide end users in such cases, e.g. "if you didn't touch your query but encounter a schema incompatibility error on state, please check the migration guide for your version to see whether there is any backward incompatible change".

Unfortunately, even if we adopt #24173 into Spark 3.0.0 (and I'm not sure that will happen), it doesn't apply to migration from Spark 2.x to 3.0.0, because existing state from Spark 2.x has no schema as of now. We could, however, craft a tool that creates the schema file for the state of a Spark 2.x structured streaming query, so that end users can adopt it before migrating to Spark 3.0.0.

#28707 would help detect the problem, at least in this case (as the number of fields will not match), so #28707 might unblock this patch for inclusion in branch-3.0, but the error message would be a bit unfriendly because we won't have detailed information about the schema of the state.
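To illustrate the difference between the two kinds of check discussed above, here is a minimal sketch in plain Python. It is not Spark code and does not reflect how #24173 or #28707 are implemented; `StateSchemaError`, `check_field_count`, `check_schema`, and the example field names are all hypothetical. It only shows why a field-count check can detect the mismatch while a stored schema allows a friendlier message.

```python
from typing import List, Tuple

# Hypothetical stand-in for a schema field: (field name, type name).
Field = Tuple[str, str]


class StateSchemaError(Exception):
    """Raised when existing state does not match what the query expects."""


def check_field_count(stored_num_fields: int, expected: List[Field]) -> None:
    # Without a stored schema, the only available signal is the number of
    # fields in a state row, so the message cannot say which fields differ.
    if stored_num_fields != len(expected):
        raise StateSchemaError(
            f"State row has {stored_num_fields} fields, "
            f"expected {len(expected)}."
        )


def check_schema(stored: List[Field], expected: List[Field]) -> None:
    # With a stored schema, the error can name the fields that differ and
    # point the user at the migration guide.
    if stored == expected:
        return
    missing = [f for f in expected if f not in stored]
    unexpected = [f for f in stored if f not in expected]
    raise StateSchemaError(
        "State schema is not compatible with the query. "
        f"Missing from state: {missing}; unexpected in state: {unexpected}. "
        "If you did not change your query, check the migration guide "
        "for your version."
    )


if __name__ == "__main__":
    # Hypothetical example: the new version adds a field to the state.
    old_state = [("sum", "double")]
    new_query = [("sum", "double"), ("count", "long")]

    for check in (
        lambda: check_field_count(len(old_state), new_query),
        lambda: check_schema(old_state, new_query),
    ):
        try:
            check()
        except StateSchemaError as err:
            print(err)
```

Both checks reject the old state here, but only the schema-aware one can tell the user what actually changed.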