ericm-db opened a new pull request, #53684:
URL: https://github.com/apache/spark/pull/53684

   ## What changes were proposed in this pull request?
   
   This PR introduces the `NameStreamingSources` analyzer rule and supporting 
infrastructure to enable streaming source evolution. Assigning stable names to 
sources lets streaming queries add, remove, or reorder sources without losing 
state.
   
   Key changes:
   - Added `HasStreamingSourceIdentifyingName` trait for uniform name 
propagation
   - Updated `StreamingRelationV2` to support source identifying names
   - Created `NameStreamingSources` analyzer rule to propagate names from 
`NamedStreamingRelation` wrappers
   - Added `spark.sql.streaming.queryEvolution.enableStreamingSourceEvolution` 
config flag
   - Added error handling for unnamed sources when enforcement is enabled
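
As a rough illustration of the name propagation and enforcement described in the list above, here is a minimal toy model in Python. It is not the Spark implementation: the plan is flattened to a list, the dataclasses only borrow the names used in this PR, and `UnnamedSourceError`, `name_streaming_sources`, and the `enforce_names` parameter (standing in for the config flag) are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import List, Union


# Toy stand-ins for the three name kinds; not Spark's actual classes.
@dataclass(frozen=True)
class UserProvided:
    name: str


@dataclass(frozen=True)
class FlowAssigned:
    name: str


@dataclass(frozen=True)
class Unassigned:
    pass


SourceName = Union[UserProvided, FlowAssigned, Unassigned]


@dataclass(frozen=True)
class StreamingRelationV2:
    source: str
    # Sources start out unnamed until a wrapper supplies a name.
    identifying_name: SourceName = field(default_factory=Unassigned)


@dataclass(frozen=True)
class NamedStreamingRelation:
    child: StreamingRelationV2
    identifying_name: SourceName


class UnnamedSourceError(Exception):
    """Hypothetical error for unnamed sources under enforcement."""


def name_streaming_sources(plan: List[object], enforce_names: bool) -> List[StreamingRelationV2]:
    """Unwrap each NamedStreamingRelation and push its name onto the child."""
    resolved = []
    for node in plan:
        if isinstance(node, NamedStreamingRelation):
            # Copy the wrapper's name onto the wrapped relation and drop the wrapper.
            node = replace(node.child, identifying_name=node.identifying_name)
        if enforce_names and isinstance(node.identifying_name, Unassigned):
            raise UnnamedSourceError(f"streaming source '{node.source}' has no name")
        resolved.append(node)
    return resolved
```

With enforcement off, a plan that mixes wrapped and bare relations resolves and the bare ones stay `Unassigned`; with enforcement on, the same plan raises. Whether the real rule performs the check itself or delegates it elsewhere is not stated in this description.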
   
   ## Why are the changes needed?
   
   Currently, streaming sources are identified by their position in the query 
plan (`sources/0`, `sources/1`, etc.), so adding, removing, or reordering 
sources breaks checkpoint compatibility. Assigning stable names to sources 
enables:
   
   1. **Source evolution**: Add/remove/reorder sources without losing state
   2. **Stable checkpoint locations**: `sources/<name>` instead of `sources/0`, 
`sources/1`
   3. **Better debugging**: Named sources are easier to identify and debug
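
To make the checkpoint-location point concrete, the following sketch compares positional and name-based paths when a query's sources are reordered and one is added. It only models the `sources/<id>` path scheme from the list above; the helper names and the example source names (`orders`, `clicks`, `returns`) are made up for illustration.

```python
from typing import Dict, List


def positional_paths(sources: List[str]) -> Dict[str, str]:
    """Map each source to a path derived from its position in the plan."""
    return {src: f"sources/{i}" for i, src in enumerate(sources)}


def named_paths(sources: List[str]) -> Dict[str, str]:
    """Map each source to a path derived from its stable name."""
    return {src: f"sources/{src}" for src in sources}


def moved(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Sources present in both versions whose checkpoint path changed."""
    return sorted(s for s in before if s in after and before[s] != after[s])


# Version 1 of the query reads two sources; version 2 reorders them and adds one.
v1 = ["orders", "clicks"]
v2 = ["clicks", "orders", "returns"]

print(moved(positional_paths(v1), positional_paths(v2)))  # ['clicks', 'orders']
print(moved(named_paths(v1), named_paths(v2)))            # []
```

Under positional identification both existing sources end up pointing at each other's checkpoint directory after the reorder, while name-based paths stay put and the new source simply gets a fresh directory.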
   
   ## Does this PR introduce _any_ user-facing change?
   
   No. The infrastructure is in place, but the user-facing `.name()` DataFrame 
API is not yet exposed. The analyzer rule handles existing 
`NamedStreamingRelation` nodes that may be created internally.
   
   ## How was this patch tested?
   
   - Added 15 unit tests in `NameStreamingSourcesSuite`
   - Tests cover name propagation, enforcement checks, error messages, and edge 
cases
   - Tests verify behavior with `UserProvided`, `FlowAssigned`, and `Unassigned` 
names
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

