ericm-db opened a new pull request, #53684: URL: https://github.com/apache/spark/pull/53684
## What changes were proposed in this pull request?

This PR introduces the `NameStreamingSources` analyzer rule and supporting infrastructure to enable streaming source evolution. This allows streaming queries to add, remove, or reorder sources without losing state by assigning stable names to sources.

Key changes:
- Added `HasStreamingSourceIdentifyingName` trait for uniform name propagation
- Updated `StreamingRelationV2` to support source identifying names
- Created `NameStreamingSources` analyzer rule to propagate names from `NamedStreamingRelation` wrappers
- Added `spark.sql.streaming.queryEvolution.enableStreamingSourceEvolution` config flag
- Added error handling for unnamed sources when enforcement is enabled

## Why are the changes needed?

Currently, streaming sources are identified by their position in the query plan (`sources/0`, `sources/1`, etc.). This makes it impossible to add, remove, or reorder sources without breaking checkpoint compatibility.

By assigning stable names to sources, we enable:
1. **Source evolution**: Add/remove/reorder sources without losing state
2. **Stable checkpoint locations**: `sources/<name>` instead of `sources/0`, `sources/1`
3. **Better debugging**: Named sources are easier to identify and debug

## Does this PR introduce _any_ user-facing change?

No. The infrastructure is in place but the user-facing `.name()` DataFrame API is not yet exposed. The analyzer rule handles existing `NamedStreamingRelation` nodes that may be created internally.

## How was this patch tested?

- Added comprehensive unit tests in `NameStreamingSourcesSuite` (15 test cases)
- Tests cover name propagation, enforcement checks, error messages, and edge cases
- Tests verify behavior with UserProvided, FlowAssigned, and Unassigned names

## Was this patch authored or co-authored using generative AI tooling?

No.

-- 
This is an automated message from the Apache Git Service.
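As a rough illustration of the motivation in the PR description, the sketch below models why position-based identification (`sources/0`, `sources/1`) breaks when sources are reordered, while name-based identification (`sources/<name>`) does not. It is plain Python, not Spark code. The function names `positional_paths` and `named_paths`, the sample source descriptions, and the fallback to positional paths when enforcement is off are all assumptions made for illustration; the actual rule in the PR may behave differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a source is (description, optional stable name).
Source = Tuple[str, Optional[str]]


def positional_paths(sources: List[Source]) -> Dict[str, str]:
    """Assign sources/<index> by position in the plan."""
    return {desc: f"sources/{i}" for i, (desc, _name) in enumerate(sources)}


def named_paths(sources: List[Source], enforce: bool = True) -> Dict[str, str]:
    """Assign sources/<name>; reject unnamed sources when enforcing."""
    paths: Dict[str, str] = {}
    for i, (desc, name) in enumerate(sources):
        if name is not None:
            paths[desc] = f"sources/{name}"
        elif enforce:
            raise ValueError(f"streaming source {desc!r} has no name")
        else:
            # Assumption for this sketch only: fall back to the position.
            paths[desc] = f"sources/{i}"
    return paths


before = [("kafka:orders", "orders"), ("kafka:payments", "payments")]
after = [("kafka:payments", "payments"), ("kafka:orders", "orders")]  # reordered

pos_before, pos_after = positional_paths(before), positional_paths(after)
named_before, named_after = named_paths(before), named_paths(after)

print(pos_before == pos_after)      # False: reordering remaps checkpoint paths
print(named_before == named_after)  # True: names keep each path stable
```

In this toy model, reordering leaves every named source pointing at the same checkpoint path, which is the property the PR description relies on for source evolution.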
