karenfeng opened a new pull request #32017: URL: https://github.com/apache/spark/pull/32017
### What changes were proposed in this pull request?

Changes the metadata propagation framework.

Previously, most `LogicalPlan`s propagated their `children`'s `metadataOutput`. This did not make sense in cases where the `LogicalPlan` did not even propagate its `children`'s `output`.

Now, `LogicalPlan`s default to not having any `metadataOutput`. I modified the `LogicalPlan`s that propagate their children's output so that they also propagate their children's metadata output. The most notable exclusion is `Project`, which no longer has metadata output. If the user wants metadata attributes in the output, they should be added inside the `Analyzer`'s `AddMetadataColumns` rule.

### Why are the changes needed?

Previously, `SELECT m FROM (SELECT a FROM tb)` would output `m` if `m` were a metadata column, even though the subquery selects only `a`. This did not make sense.

### Does this PR introduce _any_ user-facing change?

Yes. Now, `SELECT m FROM (SELECT a FROM tb)` fails with an `AnalysisException`.

### How was this patch tested?

Added unit tests. I did not cover all cases, as they are fairly extensive; however, the new tests cover the major cases (and an existing test already covers `Join`).
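The propagation rule described in this PR can be illustrated with a small toy model. The sketch below is not Spark code: it is written in Python rather than Scala, and every name in it (`metadata_output`, `Relation`, `Filter`, `resolve`, `select`, and this `AnalysisException`) is a hypothetical stand-in chosen for illustration. It assumes a simplified resolver that looks a column up in the child's output and then in the child's metadata output, which is far cruder than what the `Analyzer` actually does.

```python
class AnalysisException(Exception):
    """Toy stand-in for Spark's AnalysisException."""


class LogicalPlan:
    def __init__(self, *children):
        self.children = list(children)

    @property
    def output(self):
        raise NotImplementedError

    @property
    def metadata_output(self):
        # Default described in the PR: no metadata output unless a node opts in.
        return []


class Relation(LogicalPlan):
    """Leaf node: a table with regular columns and metadata columns."""

    def __init__(self, columns, metadata_columns):
        super().__init__()
        self._columns = list(columns)
        self._metadata = list(metadata_columns)

    @property
    def output(self):
        return list(self._columns)

    @property
    def metadata_output(self):
        return list(self._metadata)


class Filter(LogicalPlan):
    """Passes its child's output through, so it passes metadata through too."""

    def __init__(self, child):
        super().__init__(child)

    @property
    def output(self):
        return self.children[0].output

    @property
    def metadata_output(self):
        return self.children[0].metadata_output


class Project(LogicalPlan):
    """Exposes only the projected columns; inherits the empty metadata default."""

    def __init__(self, project_list, child):
        super().__init__(child)
        self.project_list = list(project_list)

    @property
    def output(self):
        return list(self.project_list)


def resolve(name, child):
    # Simplified lookup: regular output first, then metadata output.
    if name in child.output or name in child.metadata_output:
        return name
    raise AnalysisException(f"cannot resolve '{name}'")


def select(names, child):
    # Build a Project after resolving every requested column against the child.
    return Project([resolve(n, child) for n in names], child)
```

In this model, `select(["m"], Relation(["a"], ["m"]))` resolves because the relation exposes `m` as metadata, and wrapping the relation in `Filter` keeps it resolvable. The nested form `select(["m"], select(["a"], tb))` raises the toy `AnalysisException`, because the inner `Project` has no metadata output, which mirrors the `SELECT m FROM (SELECT a FROM tb)` case above.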
