trohwer commented on code in PR #37348:
URL: https://github.com/apache/spark/pull/37348#discussion_r1869519579


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala:
##########
@@ -184,9 +184,20 @@ object NestedColumnAliasing {
       plan: LogicalPlan,
       nestedFieldToAlias: Map[Expression, Alias],
       attrToAliases: AttributeMap[Seq[Alias]]): LogicalPlan = {
-    plan.withNewChildren(plan.children.map { plan =>
-      Project(plan.output.flatMap(a => attrToAliases.getOrElse(a, Seq(a))), plan)
-    }).transformExpressions {
+    val newChildPlan = plan match {
+      case g: Generate =>
+        g.withNewChildren(g.children.map { childPlan =>
+          val origOutput = childPlan.output
+          val fromAlias = childPlan.output.flatMap(a => attrToAliases.getOrElse(a, Nil))

Review Comment:
   I don't think this change addresses the real issue. The real issue is that
Generate holds a list, unrequiredChildIndex, of child output indices that are
not needed in the Generate output. This list has to be adjusted to match the
Project node that NestedColumnAliasing inserts. Here it only fits by accident,
because the original child output is included in full at the beginning of the
new Project node. But that may include unnecessary outputs, which is what
ColumnPruning is trying to avoid. I have a different proposal that adjusts the
list of indices to point to the new positions after attribute aliasing:
https://github.com/apache/spark/pull/49061
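
To illustrate the index adjustment described above, here is a minimal sketch. It is a simplified model only, not the code from either PR: attributes and aliases are plain strings rather than Catalyst `Attribute`/`Alias` objects, and the object `UnrequiredIndexRemap` and its methods `projectOutput` and `remapUnrequired` are hypothetical names introduced for this example. It assumes that when an unrequired child attribute is replaced by several aliases, every alias position becomes unrequired.

```scala
object UnrequiredIndexRemap {
  // Hypothetical stand-in for the Project output built by NestedColumnAliasing:
  // each attribute is replaced by its aliases, or kept as-is if it has none.
  def projectOutput(
      childOutput: Seq[String],
      attrToAliases: Map[String, Seq[String]]): Seq[String] =
    childOutput.flatMap(a => attrToAliases.getOrElse(a, Seq(a)))

  // Map old child-output indices to their positions in the new Project output.
  def remapUnrequired(
      childOutput: Seq[String],
      attrToAliases: Map[String, Seq[String]],
      unrequiredChildIndex: Seq[Int]): Seq[Int] = {
    // Number of output columns each old attribute expands to.
    val widths = childOutput.map(a => attrToAliases.getOrElse(a, Seq(a)).length)
    // starts(i) is the first new position of old column i;
    // starts(i + 1) is one past its last new position.
    val starts = widths.scanLeft(0)(_ + _)
    unrequiredChildIndex.flatMap(i => starts(i) until starts(i + 1))
  }
}
```

With a child output of `a, b, c` where `b` is replaced by two aliases, the new output has four columns, so an old index pointing at `c` has to move from 2 to 3. Leaving it at 2 would make it point at the second alias of `b` instead.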



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

