ericm-db opened a new pull request, #56015: URL: https://github.com/apache/spark/pull/56015
### What changes were proposed in this pull request?

When the streaming source evolution flag (`spark.sql.streaming.queryEvolution.enableSourceEvolution`) is set to `true`, force the offset log format to `VERSION_2` for new streaming queries.

In `MicroBatchExecution.initializeExecution`, the offset log format version selection now takes `max(STREAMING_OFFSET_LOG_FORMAT_VERSION, minRequiredVersion)`, where `minRequiredVersion` is `VERSION_2` when source evolution is enabled and `VERSION_1` otherwise. Existing queries continue to use whatever version is already written in their offset log (read from `latestStartedBatch`), so this only affects new queries.

The `testWithSourceEvolution` helper in `StreamingSourceEvolutionSuite` was updated to no longer set the offset log version explicitly, since it is now selected automatically.

### Why are the changes needed?

Streaming source evolution relies on the `OffsetMap` (sourceId -> offset) format, which is only available in offset log `VERSION_2`. Previously, users had to remember to set `spark.sql.streaming.offsetLog.formatVersion=2` alongside enabling source evolution; otherwise the format would default to `VERSION_1` (sequence-based) and the named-source tracking required by source evolution would not function properly. Coupling the two configs eliminates a footgun.

### Does this PR introduce _any_ user-facing change?

No. The change only affects new streaming queries that explicitly enable the internal `spark.sql.streaming.queryEvolution.enableSourceEvolution` flag. For such queries, the offset log will now use `VERSION_2` automatically. Users who manually set the offset log version remain in control: the final version is `max(configuredVersion, minRequiredVersion)`, so a user-configured `VERSION_2` keeps working unchanged.

### How was this patch tested?

- Added `offset log uses VERSION_2 when source evolution is enabled` test in `StreamingSourceEvolutionSuite`.
- Existing `StreamingSourceEvolutionSuite` tests pass after dropping the explicit offset log version from `testWithSourceEvolution` (19/19).
- `OffsetSeqLogSuite` continues to pass (19/19).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)
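
For readers who want to see the selection rule in isolation, here is a minimal, self-contained sketch of the `max(configuredVersion, minRequiredVersion)` logic described in this PR. It is not the code in `MicroBatchExecution.initializeExecution`: the object name `OffsetLogVersionSketch`, the function `selectOffsetLogVersion`, its parameters, and the use of plain integers for `VERSION_1` and `VERSION_2` are all hypothetical stand-ins chosen for illustration.

```scala
// Illustrative sketch only; names and types are assumptions, not Spark's API.
object OffsetLogVersionSketch {
  val Version1 = 1 // sequence-based offset log format
  val Version2 = 2 // OffsetMap (sourceId -> offset) format

  /**
   * Pick the offset log format version for a streaming query.
   *
   * @param existingVersion        version already written in the offset log,
   *                               or None for a brand-new query
   * @param configuredVersion      value the user configured for the offset
   *                               log format version
   * @param sourceEvolutionEnabled whether the source evolution flag is true
   */
  def selectOffsetLogVersion(
      existingVersion: Option[Int],
      configuredVersion: Int,
      sourceEvolutionEnabled: Boolean): Int = {
    // Existing queries keep whatever version their offset log already uses.
    existingVersion.getOrElse {
      // New queries: source evolution needs at least VERSION_2.
      val minRequiredVersion =
        if (sourceEvolutionEnabled) Version2 else Version1
      math.max(configuredVersion, minRequiredVersion)
    }
  }
}
```

Under these assumptions, a new query with source evolution enabled and a configured version of 1 resolves to 2, while a query whose offset log was already written as version 1 stays on version 1 regardless of the flag.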
