cloud-fan opened a new pull request, #53861:
URL: https://github.com/apache/spark/pull/53861

   ### What changes were proposed in this pull request?
   
   This PR simplifies `SubqueryAlias.metadataOutput` to always propagate 
metadata columns from its child, rather than only propagating when the child is 
a `LeafNode` or another `SubqueryAlias`.
   
   The previous implementation was introduced in SPARK-40149 as a workaround to 
forbid queries like `SELECT m FROM (SELECT a FROM t)` while still allowing 
DataFrame API chaining. However, this created an inconsistency since 
`SubqueryAlias` should conceptually just rename/qualify columns, not filter 
which ones are accessible.
   
   With this change:
   - `SubqueryAlias` always propagates `metadataOutput` (with qualifier applied)
   - The `qualifiedAccessOnly` filter is preserved to handle natural join 
metadata columns
   - Queries like `SELECT m FROM (SELECT a FROM t) AS alias` now work, 
consistent with how `Project` already propagates metadata columns
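
   The behavior change described above can be illustrated with a small toy 
model. This is a sketch, not Spark's Catalyst code: the classes `Attr`, `Leaf`, 
`Project` and `SubqueryAlias` below are simplified Python stand-ins, and the 
`always_propagate` flag is a hypothetical switch that exists only to compare 
the old special case with the new rule.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Attr:
    name: str
    qualifier: Tuple[str, ...] = ()
    qualified_access_only: bool = False  # e.g. hidden natural-join columns


@dataclass(frozen=True)
class Leaf:
    """Toy table scan with regular and metadata columns."""
    output: Tuple[Attr, ...]
    metadata_output: Tuple[Attr, ...]


@dataclass(frozen=True)
class Project:
    """Toy projection: narrows regular output, passes metadata through."""
    names: Tuple[str, ...]
    child: object

    @property
    def output(self):
        return tuple(a for a in self.child.output if a.name in self.names)

    @property
    def metadata_output(self):
        return tuple(self.child.metadata_output)


@dataclass(frozen=True)
class SubqueryAlias:
    alias: str
    child: object
    always_propagate: bool = True  # False models the pre-PR special case

    @property
    def output(self):
        return tuple(replace(a, qualifier=(self.alias,)) for a in self.child.output)

    @property
    def metadata_output(self):
        # Old rule (assumed shape): propagate only over a leaf or another alias.
        legacy_ok = isinstance(self.child, (Leaf, SubqueryAlias))
        if not (self.always_propagate or legacy_ok):
            return ()
        # Both rules drop qualified-access-only columns and apply the alias.
        visible = (a for a in self.child.metadata_output
                   if not a.qualified_access_only)
        return tuple(replace(a, qualifier=(self.alias,)) for a in visible)


t = Leaf(output=(Attr("a"),),
         metadata_output=(Attr("m"), Attr("k", qualified_access_only=True)))
subquery = Project(("a",), t)  # models SELECT a FROM t

before = SubqueryAlias("alias", subquery, always_propagate=False)
after = SubqueryAlias("alias", subquery)

print([a.name for a in before.metadata_output])
print([(a.qualifier, a.name) for a in after.metadata_output])
```

   In this model, `before` exposes no metadata columns because its child is a 
`Project`, while `after` exposes `m` qualified by `alias`. The 
qualified-access-only column `k` stays hidden in both cases.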
   
   ### Why are the changes needed?
   
   1. **Consistency**: `SubqueryAlias` is a rename operation and should not 
selectively block metadata column propagation
   2. **Simpler code**: Removes the special-case logic checking for 
`LeafNode`/`SubqueryAlias` children
   3. **Better error messages**: When metadata columns from both sides of a 
join have the same name, users now get an "ambiguous reference" error rather 
than "column not found"
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Queries that previously failed with "column not found" when accessing 
metadata columns through a subquery alias now succeed if the reference is 
unambiguous, or fail with "ambiguous reference" if multiple columns share the 
same name.
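
   The three outcomes can be sketched with a toy resolver. This is an 
illustration under assumptions, not Spark's analyzer: `resolve`, `outcome` and 
the two exception classes are hypothetical names, and the model assumes a name 
is looked up in the regular output first, with metadata columns as a fallback. 
The join sides `x` and `y` stand for aliased subqueries whose children are not 
leaf nodes.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (qualifier, name)


class ColumnNotFound(Exception):
    pass


class AmbiguousReference(Exception):
    pass


def resolve(name: str, output: List[Column],
            metadata_output: List[Column]) -> Column:
    """Resolve an unqualified name: regular output first, then metadata."""
    for candidates in (output, metadata_output):
        matches = [c for c in candidates if c[1] == name]
        if len(matches) > 1:
            raise AmbiguousReference(name)
        if matches:
            return matches[0]
    raise ColumnNotFound(name)


def outcome(name: str, output: List[Column],
            metadata_output: List[Column]) -> str:
    """Turn the resolver result into a short description."""
    try:
        qualifier, col = resolve(name, output, metadata_output)
        return f"resolved to {qualifier}.{col}"
    except AmbiguousReference:
        return "ambiguous reference"
    except ColumnNotFound:
        return "column not found"


join_output = [("x", "a"), ("y", "b")]

# Old behavior in this model: the aliases expose no metadata columns.
before = outcome("m", join_output, [])
# New behavior: both sides expose a metadata column named m.
after = outcome("m", join_output, [("x", "m"), ("y", "m")])
# New behavior when only one side has m.
one_side = outcome("m", join_output, [("x", "m")])

print(before, after, one_side, sep="\n")
```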
   
   ### How was this patch tested?
   
   Updated existing tests and added a new test for ambiguous metadata columns 
after a join with `SubqueryAlias`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

