ericm-db opened a new pull request, #53639:
URL: https://github.com/apache/spark/pull/53639

   ### What changes were proposed in this pull request?
   
   This PR introduces infrastructure for tracking and propagating source identifying names through query analysis for streaming queries. It adds:
   
   1. **StreamingSourceIdentifyingName** - A sealed trait hierarchy representing the naming state of streaming sources:
      - `UserProvided(name)` - Explicitly set via `.name()` API
      - `FlowAssigned(name)` - Assigned by external flow systems (e.g., DLT)
      - `Unassigned` - No name assigned yet (to be auto-generated)
   
   2. **NamedStreamingRelation** - A transparent wrapper node that:
      - Carries source identifying names through the analyzer phase
      - Extends `UnaryNode` for transparent interaction with analyzer rules
      - Stays unresolved until explicitly unwrapped by a future `NameStreamingSources` analyzer rule
      - Provides `withUserProvidedName()` to attach user-specified names
   
   3. **NAMED_STREAMING_RELATION** tree pattern for efficient pattern matching
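
As a rough illustration of the pieces listed above, here is a minimal, dependency-free Scala sketch. It is not the code from this PR. `Plan` and `StreamSource` are hypothetical stand-ins for Catalyst's `LogicalPlan` and a leaf streaming relation. The `String` parameter of `withUserProvidedName` and the default of `Unassigned` are assumptions. The real node extends `UnaryNode` and registers the `NAMED_STREAMING_RELATION` tree pattern, both of which are omitted here.

```scala
// Naming state of a streaming source, mirroring the three cases described above.
sealed trait StreamingSourceIdentifyingName
object StreamingSourceIdentifyingName {
  // Explicitly set by the user.
  final case class UserProvided(name: String) extends StreamingSourceIdentifyingName
  // Assigned by an external flow system.
  final case class FlowAssigned(name: String) extends StreamingSourceIdentifyingName
  // No name yet; one would be generated later.
  case object Unassigned extends StreamingSourceIdentifyingName
}

// Hypothetical toy plan type standing in for Catalyst's LogicalPlan.
trait Plan {
  def output: Seq[String]
  def resolved: Boolean
}

// Hypothetical leaf node standing in for a streaming source relation.
final case class StreamSource(columns: Seq[String]) extends Plan {
  def output: Seq[String] = columns
  def resolved: Boolean = true
}

// Wrapper that carries the source name alongside its single child.
final case class NamedStreamingRelation(
    child: Plan,
    sourceIdentifyingName: StreamingSourceIdentifyingName =
      StreamingSourceIdentifyingName.Unassigned)
  extends Plan {

  // Output is delegated to the child, so the wrapper does not change the schema.
  def output: Seq[String] = child.output

  // Never reports itself as resolved; a later rule is expected to unwrap it.
  def resolved: Boolean = false

  // Returns a copy of the wrapper carrying a user-specified name (assumed signature).
  def withUserProvidedName(name: String): NamedStreamingRelation =
    copy(sourceIdentifyingName = StreamingSourceIdentifyingName.UserProvided(name))
}

object NamedStreamingRelationDemo {
  // Because the trait is sealed, the compiler can check this match for exhaustiveness.
  def describe(n: StreamingSourceIdentifyingName): String = n match {
    case StreamingSourceIdentifyingName.UserProvided(name) => s"user-provided name: $name"
    case StreamingSourceIdentifyingName.FlowAssigned(name) => s"flow-assigned name: $name"
    case StreamingSourceIdentifyingName.Unassigned         => "unassigned"
  }

  def main(args: Array[String]): Unit = {
    val wrapped = NamedStreamingRelation(StreamSource(Seq("id", "ts")))
    val named = wrapped.withUserProvidedName("orders")
    println(describe(wrapped.sourceIdentifyingName))
    println(describe(named.sourceIdentifyingName))
  }
}
```

Modelling the naming state as a sealed hierarchy rather than an `Option[String]` keeps the distinction between user-provided and flow-assigned names explicit, and lets a later rule pattern match on all three cases.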
   
   ### Why are the changes needed?
   
   Streaming sources need stable, predictable names for:
   - **Checkpoint location stability** - Schema evolution and offset tracking require consistent source identification
   - **Schema lookup at specific offsets** - Analysis-time operations need to reference sources by name
   - **Flow integration** - DLT and similar systems need per-source metadata paths
   - **User control** - Allow users to explicitly name sources via the `.name()` API
   
   By introducing this wrapper during analysis (rather than at execution planning), we enable these capabilities while maintaining a clean separation between parsing, analysis, and execution phases.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests in `NamedStreamingRelationSuite` covering:
   - Source name state transitions (Unassigned → UserProvided)
   - Output delegation to child plan
   - Tree pattern registration
   - Resolved state behavior
   - String representation
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


