viirya commented on a change in pull request #27503: [SPARK-30761][SQL] Nested
column pruning should not prune on required child outputs in Generate
URL: https://github.com/apache/spark/pull/27503#discussion_r376753522
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -179,7 +185,10 @@ object GeneratorNestedColumnAliasing {
case g: Generate if SQLConf.get.nestedSchemaPruningEnabled &&
canPruneGenerator(g.generator) =>
- NestedColumnAliasing.getAliasSubMap(g.generator.children).map {
+ // For the child outputs required by the operator on top of `Generate`, we do not want
+ // to prune it.
+ val requiredAttrs = AttributeSet(g.requiredChildOutput)
+ NestedColumnAliasing.getAliasSubMap(g.generator.children, requiredAttrs).map {
Review comment:
This case should normally be handled by the case pattern above (Project +
Generate). But if all nested fields are selected at the top Project, the above
case won't prune. Then, when the Optimizer transforms down to the underlying
Generate, only the referenced nested columns are kept and the others are pruned
from the child. This leaves the accessors at the top Project unresolved.
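To make the failure mode concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's API: `Attr`, `FieldRef`, `PruneSketch` and `prune` are hypothetical names made up for illustration, and `required` only stands in for the role that `g.requiredChildOutput` plays in the diff above.

```scala
// Toy model (hypothetical, not Catalyst): an attribute is a name plus its nested fields.
final case class Attr(name: String, fields: Set[String])

// A nested field access made by the generator, e.g. `items.a`.
final case class FieldRef(attr: String, field: String)

object PruneSketch {
  // Keep only the nested fields the generator refers to, except for attributes
  // listed in `required`, which are passed through unpruned.
  def prune(childOutput: Seq[Attr], refs: Seq[FieldRef], required: Set[String]): Seq[Attr] = {
    // Referenced nested fields, grouped by the attribute they belong to.
    val used: Map[String, Set[String]] =
      refs.groupBy(_.attr).map { case (a, rs) => a -> rs.map(_.field).toSet }

    childOutput.map { attr =>
      if (required.contains(attr.name)) {
        attr // needed by the operator on top of Generate: do not prune
      } else {
        used.get(attr.name)
          .map(fs => attr.copy(fields = attr.fields.intersect(fs)))
          .getOrElse(attr) // no nested access on this attribute: leave as-is
      }
    }
  }
}
```

With a child output `Attr("items", Set("a", "b", "c"))` and a generator that only reads `items.a`, calling `prune` with an empty `required` set leaves just field `a`, so a parent accessor such as `items.b` would no longer resolve. Passing `Set("items")` as `required` keeps the attribute whole, which is the behaviour the change is after.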