karenfeng opened a new pull request #32017:
URL: https://github.com/apache/spark/pull/32017


   ### What changes were proposed in this pull request?
   
   Changes how metadata output (`metadataOutput`) is propagated through `LogicalPlan`s.
   
   Previously, most `LogicalPlan`s propagated their children's `metadataOutput`. This did not make sense when a `LogicalPlan` did not even propagate its children's `output`.
   
   Now, `LogicalPlan`s default to having no `metadataOutput`. I modified the `LogicalPlan`s that propagate their children's output so that they also propagate their children's metadata output. The most notable exclusion is `Project`, which no longer has metadata output.
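The new defaults can be illustrated with a toy Python model. This is a sketch only, not Spark's Scala implementation: `Relation`, `Filter`, `Project`, `output`, and `metadata_output` below are hypothetical stand-ins, and `Filter` is used simply as an example of a node that passes its child's output through unchanged.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical toy model, not Spark's API: every node exposes `output` and
# `metadata_output` as lists of column names.


@dataclass
class Relation:
    columns: List[str]
    metadata_columns: List[str] = field(default_factory=list)

    @property
    def output(self) -> List[str]:
        return list(self.columns)

    @property
    def metadata_output(self) -> List[str]:
        # A leaf exposes whatever metadata columns its source supports.
        return list(self.metadata_columns)


@dataclass
class Filter:
    child: object

    @property
    def output(self) -> List[str]:
        # Passes the child's output through unchanged...
        return self.child.output

    @property
    def metadata_output(self) -> List[str]:
        # ...so it also propagates the child's metadata output.
        return self.child.metadata_output


@dataclass
class Project:
    child: object
    columns: List[str]

    @property
    def output(self) -> List[str]:
        return list(self.columns)

    @property
    def metadata_output(self) -> List[str]:
        # New default: a projection defines fresh output, so it has no metadata output.
        return []
```

With `tb = Relation(["a", "b"], ["m"])`, `Filter(tb).metadata_output` is `["m"]`, while `Project(tb, ["a"]).metadata_output` is `[]`, and a `Filter` above that `Project` therefore also exposes no metadata output.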
   
   If a query needs metadata attributes in its output, they should be added by the `Analyzer`'s `AddMetadataColumns` rule.
   
   ### Why are the changes needed?
   
   Previously, `SELECT m from (SELECT a from tb)` would output `m` if `m` was a metadata column of `tb`, even though the subquery only selects `a`. This did not make sense.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. `SELECT m from (SELECT a from tb)` now fails with an `AnalysisException`.
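A simplified sketch of why the outer `m` stops resolving, assuming (for illustration only) that a column reference resolves against the child's regular output or the metadata output the child propagates. `AnalysisError` and `resolve` are hypothetical stand-ins, not Spark's `AnalysisException` or its actual resolution logic.

```python
from typing import List


class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def resolve(name: str, child_output: List[str], child_metadata_output: List[str]) -> str:
    # A reference resolves if the child exposes it as a regular column
    # or as a metadata column it propagates.
    if name in child_output or name in child_metadata_output:
        return name
    raise AnalysisError(f"cannot resolve '{name}' given input columns {child_output}")


# The child is the subquery `SELECT a from tb`, where tb has metadata column m.
# Before this change, the subquery's Project still propagated m as metadata output:
assert resolve("m", ["a"], ["m"]) == "m"

# After this change, the Project exposes no metadata output, so m is unresolved:
try:
    resolve("m", ["a"], [])
except AnalysisError as err:
    print(err)
```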
   
   ### How was this patch tested?
   
   Added unit tests. They do not cover every case, since there are many; however, the new tests cover the major cases, and an existing test already covers `Join`.