[GitHub] [spark] viirya commented on a change in pull request #26978: [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project

GitBox Thu, 16 Jan 2020 14:03:48 -0800

viirya commented on a change in pull request #26978: [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project
URL: https://github.com/apache/spark/pull/26978#discussion_r367673756


 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
 ##########
 @@ -155,6 +155,49 @@ object NestedColumnAliasing {
     case MapType(keyType, valueType, _) => totalFieldNum(keyType) + totalFieldNum(valueType)
     case _ => 1 // UDT and others
   }
+}
+
+/**
+ * This prunes unnecessary nested columns from `Generate` and optional `Project` on top
+ * of it.
+ */
+object GeneratorNestedColumnAliasing {
+  def unapply(plan: LogicalPlan): Option[LogicalPlan] = plan match {
+    case Project(projectList, g: Generate) if (SQLConf.get.nestedPruningOnExpressions ||
+        SQLConf.get.nestedSchemaPruningEnabled) && canPruneGenerator(g.generator) =>
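The guarded `case` in the hunk above follows Scala's extractor-object idiom: `unapply` yields a match only when a `Project` sits directly on a prunable `Generate` and at least one of the two flags is on. The sketch below reproduces just that matching logic on hypothetical, simplified types (`Plan`, `Scan`, `Generate`, `Project`, `Conf`, `ProjectOverGenerate`, `matchesProjectOverGenerate` are illustrative names, not Catalyst classes), and it assumes only `explode(...)` generators count as prunable.

```scala
object GeneratorExtractorSketch {
  // Hypothetical, heavily simplified stand-ins for Catalyst plan nodes;
  // these are not Spark's real classes.
  sealed trait Plan
  final case class Scan(columns: Seq[String]) extends Plan
  final case class Generate(generator: String, child: Plan) extends Plan
  final case class Project(projectList: Seq[String], child: Plan) extends Plan

  // Hypothetical stand-in for the two SQLConf flags checked in the diff.
  final case class Conf(
      nestedPruningOnExpressions: Boolean,
      nestedSchemaPruningEnabled: Boolean)

  // Assumption: in this toy model only explode(...) generators can be pruned.
  def canPruneGenerator(generator: String): Boolean =
    generator.startsWith("explode(")

  // Extractor: `unapply` matches a Project sitting directly on a prunable
  // Generate, but only when at least one of the two flags is enabled.
  final class ProjectOverGenerate(conf: Conf) {
    def unapply(plan: Plan): Option[(Seq[String], Generate)] = plan match {
      case Project(projectList, g: Generate)
          if (conf.nestedPruningOnExpressions || conf.nestedSchemaPruningEnabled) &&
            canPruneGenerator(g.generator) =>
        Some((projectList, g))
      case _ => None
    }
  }

  // Uses the extractor in a `case` pattern, as an optimizer rule would.
  def matchesProjectOverGenerate(plan: Plan, conf: Conf): Boolean = {
    // A capitalized val is a stable identifier, so it can act as an extractor.
    val PG = new ProjectOverGenerate(conf)
    plan match {
      case PG(_, _) => true
      case _        => false
    }
  }

  // Example plan shaped like the one discussed in the review comment.
  val samplePlan: Plan =
    Project(Seq("a.b", "col"), Generate("explode(a.c)", Scan(Seq("a"))))
}
```

Passing the flags in explicitly, instead of reading a global such as `SQLConf.get`, is a simplification that keeps the sketch easy to test.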
 
 Review comment:
   One reason to add `nestedSchemaPruningEnabled` here is that we cannot enable pruning through `Generate` alone (the next pattern case) without also enabling this `Project` + `Generate` case.
   
   Otherwise, we would produce an invalid query plan: the nested column accessor in the top `Project` is not pruned through, while the other nested column in `Generate` is pruned through to `Generate`'s child. The nested column accessor in the top `Project` then becomes unresolvable.
   
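   For example, in the plan below the top `Project` still references `a.b`, but its child no longer produces `a` (the `!` prefix marks an operator with missing input attributes):
   
   !Project [a.b, col]
     + Generate [explode(gen_alias#123), col]
       + Project [a.c as gen_alias#123]
         + Scan [a:<c:array<int>>]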
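To make the failure concrete, the sketch below models each operator only by the attributes it references and the attributes it outputs, and computes missing input as "references not produced by any child", which is conceptually what Catalyst's `missingInput` means. Everything here is a hypothetical simplification: `Node` and `PruningFailureSketch` are illustrative names, a nested accessor like `a.b` is treated as a reference to its root column `a`, the `#123` expression IDs are dropped, and `Generate` is assumed to keep all of its child's columns in its output.

```scala
object PruningFailureSketch {
  // Hypothetical toy operator, not a Catalyst class: it only records which
  // attributes it references and which attributes it outputs.
  final case class Node(
      desc: String,
      references: Set[String],
      output: Set[String],
      children: Seq[Node] = Nil) {

    // Attributes this node needs but no child produces
    // (conceptually like Catalyst's `missingInput`).
    def missingInput: Set[String] =
      references -- children.flatMap(_.output).toSet

    // Render the tree, prefixing "!" on nodes with missing input, as Spark's
    // plan strings do.
    def treeString(depth: Int = 0): String = {
      val prefix = if (missingInput.nonEmpty && children.nonEmpty) "!" else ""
      val line = "  " * depth + prefix + desc
      (line +: children.map(_.treeString(depth + 1))).mkString("\n")
    }
  }

  // Before pruning: `a.b` and `a.c` are both read from column `a`.
  val original: Node =
    Node("Project [a.b, col]", Set("a", "col"), Set("b", "col"), Seq(
      Node("Generate [explode(a.c), col]", Set("a"), Set("a", "col"), Seq(
        Node("Scan [a]", Set.empty, Set("a"))))))

  // After pruning through Generate only: Generate's child now produces just
  // `gen_alias`, but the top Project was not rewritten and still needs `a`.
  val prunedGenerateOnly: Node =
    Node("Project [a.b, col]", Set("a", "col"), Set("b", "col"), Seq(
      Node("Generate [explode(gen_alias), col]", Set("gen_alias"), Set("gen_alias", "col"), Seq(
        Node("Project [a.c AS gen_alias]", Set("a"), Set("gen_alias"), Seq(
          Node("Scan [a:<c:array<int>>]", Set.empty, Set("a"))))))))
}
```

Printing `PruningFailureSketch.prunedGenerateOnly.treeString()` gives the same shape as the plan in the comment, with `!` only on the top `Project`: every lower operator still finds its inputs, and only the unrewritten top accessor loses `a`.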

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

