[GitHub] [spark] viirya commented on a change in pull request #27503: [SPARK-30761]SQL] Nested column pruning should not prune on required child outputs in Generate

GitBox Sat, 08 Feb 2020 20:17:00 -0800

viirya commented on a change in pull request #27503: [SPARK-30761]SQL] Nested column pruning should not prune on required child outputs in Generate
URL: https://github.com/apache/spark/pull/27503#discussion_r376753522


 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
 ##########
 @@ -179,7 +185,10 @@ object GeneratorNestedColumnAliasing {
 
     case g: Generate if SQLConf.get.nestedSchemaPruningEnabled &&
         canPruneGenerator(g.generator) =>
-      NestedColumnAliasing.getAliasSubMap(g.generator.children).map {
+      // For the child outputs required by the operator on top of `Generate`, we do not want
+      // to prune it.
+      val requiredAttrs = AttributeSet(g.requiredChildOutput)
+      NestedColumnAliasing.getAliasSubMap(g.generator.children, requiredAttrs).map {
 
 Review comment:
   This case should normally be handled by the case pattern above (Project +
Generate). But if the top Project selects all nested fields, the above case
won't prune. Then, when the Optimizer transforms down to the underlying
Generate, only the nested columns referenced by the generator are kept and the
others are pruned from the child. This leaves the field accessors in the top
Project unresolved.
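The idea behind the patch can be illustrated with a toy Scala model. `Attr`, `FieldAccess`, and this `getAliasSubMap` are hypothetical stand-ins, not Catalyst's `Attribute`, `GetStructField`, or the real `NestedColumnAliasing.getAliasSubMap`, and the second parameter is only assumed to play the role of `requiredAttrs` in the diff. The sketch shows one thing: attributes that the operator above `Generate` still needs (its `requiredChildOutput`) are skipped when building the alias map, so they are never narrowed down to the fields the generator happens to touch.

```scala
// Hypothetical, simplified model; these are NOT Spark Catalyst classes.
final case class Attr(name: String)
final case class FieldAccess(attr: Attr, field: String)

object AliasSubMapSketch {
  /** Groups nested-field accesses by their root attribute so each root can be
    * pruned to only the accessed fields. Attributes in `requiredAttrs` are
    * skipped entirely: a parent operator needs them whole, so they must not be pruned. */
  def getAliasSubMap(
      accesses: Seq[FieldAccess],
      requiredAttrs: Set[Attr] = Set.empty): Option[Map[Attr, Seq[FieldAccess]]] = {
    val grouped = accesses
      .filterNot(a => requiredAttrs.contains(a.attr))
      .groupBy(_.attr)
      .map { case (attr, fields) => attr -> fields.distinct }
    if (grouped.isEmpty) None else Some(grouped)
  }

  def main(args: Array[String]): Unit = {
    val items = Attr("items")
    // The generator only reads `items.a`, while the top Project also reads `items.b`.
    val generatorAccesses = Seq(FieldAccess(items, "a"))

    // Without the exclusion, `items` would be pruned down to field `a`,
    // leaving the top Project's `items.b` accessor unresolved.
    println(getAliasSubMap(generatorAccesses))

    // With `items` treated as a required child output, nothing is pruned.
    println(getAliasSubMap(generatorAccesses, requiredAttrs = Set(items)))
  }
}
```

Returning `None` when nothing is left to alias mirrors the `.map { ... }` call on an optional result in the diff, where no rewrite happens if there is nothing to prune.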


