trohwer commented on code in PR #37348:
URL: https://github.com/apache/spark/pull/37348#discussion_r1869519579
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala:
##########
@@ -184,9 +184,20 @@ object NestedColumnAliasing {
plan: LogicalPlan,
nestedFieldToAlias: Map[Expression, Alias],
attrToAliases: AttributeMap[Seq[Alias]]): LogicalPlan = {
- plan.withNewChildren(plan.children.map { plan =>
- Project(plan.output.flatMap(a => attrToAliases.getOrElse(a, Seq(a))), plan)
- }).transformExpressions {
+ val newChildPlan = plan match {
+ case g: Generate =>
+ g.withNewChildren(g.children.map { childPlan =>
+ val origOutput = childPlan.output
+ val fromAlias = childPlan.output.flatMap(a => attrToAliases.getOrElse(a, Nil))
Review Comment:
I think this change does not address the real issue. The real issue is that
Generate contains unrequiredChildIndex, a list of indices of child outputs
that are not needed in the Generate output. This list has to be adjusted to fit
the Project node inserted by NestedColumnAliasing. Here it only fits by
accident, because the original child output is included completely at the
beginning of the new Project node. But this may keep unnecessary outputs that
ColumnPruning is trying to avoid. I have a different proposal that adjusts the
list of indices to point to the new positions after attribute aliasing:
https://github.com/apache/spark/pull/49061
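The following is a minimal sketch of the index bookkeeping in plain Scala. It uses a simplified model in which attributes are strings and `attrToAliases` maps each replaced attribute to its aliases. The object and method names (`UnrequiredIndexRemap`, `aliasedOutput`, `prependedOutput`, `remapUnrequired`) and the `_extract_*` alias names are hypothetical. This is not the code from the linked PR. The rule that an unrequired attribute maps to all of the positions it occupies after aliasing is also an assumption, made only for illustration.

```scala
// Hypothetical, simplified model: attributes are plain Strings and
// attrToAliases maps a child attribute to the aliases that replace it.
// This is NOT Spark's API; it only illustrates the index bookkeeping.
object UnrequiredIndexRemap {

  // Project output when replaced attributes are swapped for their aliases
  // in place (same shape as output.flatMap(a => getOrElse(a, Seq(a)))).
  def aliasedOutput(
      childOutput: Seq[String],
      attrToAliases: Map[String, Seq[String]]): Seq[String] =
    childOutput.flatMap(a => attrToAliases.getOrElse(a, Seq(a)))

  // Project output when the original child output is kept in front and the
  // aliases are appended: old indices stay valid, but replaced attributes
  // are carried along as well.
  def prependedOutput(
      childOutput: Seq[String],
      attrToAliases: Map[String, Seq[String]]): Seq[String] =
    childOutput ++ childOutput.flatMap(a => attrToAliases.getOrElse(a, Nil))

  // Remap indices into childOutput to indices into aliasedOutput.
  // Assumed policy: an unrequired attribute maps to every position it
  // occupies in the new output (itself, or all of its aliases).
  def remapUnrequired(
      childOutput: Seq[String],
      attrToAliases: Map[String, Seq[String]],
      unrequired: Seq[Int]): Seq[Int] = {
    // Number of output slots each original attribute expands to.
    val widths =
      childOutput.map(a => attrToAliases.get(a).map(_.size).getOrElse(1))
    // offsets(i) is the first new position of original attribute i.
    val offsets = widths.scanLeft(0)(_ + _)
    unrequired.flatMap(i => offsets(i) until offsets(i + 1))
  }

  def main(args: Array[String]): Unit = {
    val child = Seq("id", "s", "arr")
    val aliases = Map("s" -> Seq("_extract_a", "_extract_b"))
    val unrequired = Seq(2) // "arr" is only consumed by the generator

    val out = aliasedOutput(child, aliases)
    val remapped = remapUnrequired(child, aliases, unrequired)
    println(out)                        // List(id, _extract_a, _extract_b, arr)
    println(remapped.map(out(_)))       // List(arr)
    println(prependedOutput(child, aliases))
    // List(id, s, arr, _extract_a, _extract_b)
  }
}
```

When the original output is prepended, index 2 still points at `arr`, so the old list happens to fit. The cost is that `s` is kept alongside its aliases. Remapping the indices against the aliased output keeps them correct without retaining that extra column.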
--
This is an automated message from the Apache Git Service.