ericm-db opened a new pull request, #53639:
URL: https://github.com/apache/spark/pull/53639
### What changes were proposed in this pull request?
This PR introduces infrastructure for tracking and propagating source
identifying names through query analysis for streaming queries. It adds:
1. **StreamingSourceIdentifyingName** - A sealed trait hierarchy
   representing the naming state of streaming sources:
   - `UserProvided(name)` - Explicitly set via the `.name()` API
   - `FlowAssigned(name)` - Assigned by external flow systems (e.g., DLT)
   - `Unassigned` - No name assigned yet (to be auto-generated)
2. **NamedStreamingRelation** - A transparent wrapper node that:
   - Carries source-identifying names through the analyzer phase
   - Extends `UnaryNode` for transparent interaction with analyzer rules
   - Stays unresolved until explicitly unwrapped by a future
     `NameStreamingSources` analyzer rule
   - Provides `withUserProvidedName()` to attach user-specified names
3. **NAMED_STREAMING_RELATION** - A tree pattern for efficient pattern
   matching
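
For readers who want a picture of how these pieces fit together, here is a minimal, dependency-free sketch. It is not the code in this PR: `Plan` and `StreamingSource` are hypothetical stand-ins for Catalyst's logical plan classes, and the `String` parameter of `withUserProvidedName` is an assumption, since the description does not give its signature.

```scala
object SourceNamingSketch {
  // Naming state of a streaming source, mirroring the three cases listed above.
  sealed trait StreamingSourceIdentifyingName
  final case class UserProvided(name: String) extends StreamingSourceIdentifyingName
  final case class FlowAssigned(name: String) extends StreamingSourceIdentifyingName
  case object Unassigned extends StreamingSourceIdentifyingName

  // Hypothetical stand-in for a logical plan: output columns plus a resolved flag.
  trait Plan {
    def output: Seq[String]
    def resolved: Boolean
  }

  // Hypothetical leaf node representing an already-resolved streaming source.
  final case class StreamingSource(output: Seq[String]) extends Plan {
    val resolved: Boolean = true
  }

  // Wrapper that carries the name, delegates output to its child, and reports
  // itself as unresolved so that a later rule has to unwrap it.
  final case class NamedStreamingRelation(
      child: Plan,
      sourceName: StreamingSourceIdentifyingName) extends Plan {
    def output: Seq[String] = child.output
    val resolved: Boolean = false

    // Assumed signature: attaches a user-specified name to the wrapper.
    def withUserProvidedName(name: String): NamedStreamingRelation =
      copy(sourceName = UserProvided(name))
  }

  def main(args: Array[String]): Unit = {
    val unnamed = NamedStreamingRelation(StreamingSource(Seq("key", "value")), Unassigned)
    val named = unnamed.withUserProvidedName("clicks")
    println(s"${unnamed.sourceName} -> ${named.sourceName}, output = ${named.output}")
  }
}
```

Modelling the state as a sealed hierarchy lets the compiler check that any rule matching on the name handles all three cases.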
### Why are the changes needed?
Streaming sources need stable, predictable names for:
- **Checkpoint location stability** - Schema evolution and offset tracking
require consistent source identification
- **Schema lookup at specific offsets** - Analysis-time operations need to
reference sources by name
- **Flow integration** - DLT and similar systems need per-source metadata
paths
- **User control** - Allow users to explicitly name sources via the
`.name()` API
By introducing this wrapper during analysis (rather than at execution
planning), we enable these capabilities while maintaining a clean separation
between parsing, analysis, and execution phases.
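
To illustrate the stability argument, the sketch below compares per-source metadata paths keyed by position with paths keyed by name. The `<checkpoint>/sources/<id>` layout and both helper functions are assumptions made for illustration; they are not taken from this PR.

```scala
object MetadataPathSketch {
  // Hypothetical layout keyed by position: the path depends on where the
  // source appears in the query.
  def positionalPaths(checkpoint: String, sources: Seq[String]): Map[String, String] =
    sources.zipWithIndex.map { case (s, i) => s -> s"$checkpoint/sources/$i" }.toMap

  // Hypothetical layout keyed by name: the path depends only on the name.
  def namedPaths(checkpoint: String, sources: Seq[String]): Map[String, String] =
    sources.map(s => s -> s"$checkpoint/sources/$s").toMap

  def main(args: Array[String]): Unit = {
    val before = Seq("clicks", "orders")
    val after = Seq("orders", "clicks") // same sources, listed in a different order

    // Positional paths change for each source when the order changes...
    println(positionalPaths("/cp", before) == positionalPaths("/cp", after))
    // ...while name-based paths are unaffected by the reordering.
    println(namedPaths("/cp", before) == namedPaths("/cp", after))
  }
}
```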
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests in `NamedStreamingRelationSuite` covering:
- Source name state transitions (Unassigned → UserProvided)
- Output delegation to child plan
- Tree pattern registration
- Resolved state behavior
- String representation
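
As background for the tree pattern item, the following toy model shows the general idea of pattern-based pruning: each node records which patterns occur in its subtree, so a traversal can skip subtrees that cannot contain the node it is looking for. The `Node`, `Leaf`, `Project`, and `Named` types and the `countNamed` helper are hypothetical; Catalyst's real implementation differs in detail.

```scala
object TreePatternSketch {
  // Hypothetical pattern tags; a real analyzer would have many more.
  sealed trait Pattern
  case object NAMED_STREAMING_RELATION extends Pattern
  case object PROJECT extends Pattern

  sealed trait Node {
    def children: Seq[Node]
    def nodePatterns: Set[Pattern]

    // Patterns of this node plus those of its whole subtree, computed once on demand.
    lazy val treePatterns: Set[Pattern] =
      children.foldLeft(nodePatterns)(_ ++ _.treePatterns)

    def containsPattern(p: Pattern): Boolean = treePatterns.contains(p)
  }

  final case class Leaf(name: String) extends Node {
    val children: Seq[Node] = Nil
    val nodePatterns: Set[Pattern] = Set.empty
  }

  final case class Project(child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
    val nodePatterns: Set[Pattern] = Set(PROJECT)
  }

  final case class Named(child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
    val nodePatterns: Set[Pattern] = Set(NAMED_STREAMING_RELATION)
  }

  // Counts wrapper nodes, returning early for subtrees whose pattern set
  // shows they contain none.
  def countNamed(n: Node): Int =
    if (!n.containsPattern(NAMED_STREAMING_RELATION)) 0
    else (if (n.isInstanceOf[Named]) 1 else 0) + n.children.map(countNamed).sum
}
```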
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]