Re: [PR] [SPARK-50258][SQL] Keep the output column order after AQE optimization [spark]

via GitHub Thu, 21 Nov 2024 05:00:26 -0800


cloud-fan commented on code in PR #48789:
URL: https://github.com/apache/spark/pull/48789#discussion_r1848312985



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantProjects.scala:
##########
@@ -58,7 +58,8 @@ object RemoveRedundantProjects extends Rule[SparkPlan] {
           p.mapChildren(removeProject(_, false))
         }
       case op: TakeOrderedAndProjectExec =>
-        op.mapChildren(removeProject(_, false))
+        // TakeOrderedAndProjectExec requires keep column ordering if AQE is enabled.

Review Comment:
   Let's be more specific here. AQE is not the direct cause. The root cause is 
that the planner turns `Limit` + `Sort` into `TakeOrderedAndProjectExec`, which 
adds a projection that does not exist in the logical plan. We shouldn't use this 
extra projection to optimize out other projections; otherwise, when AQE turns 
the physical plan back into a logical plan, we lose the `Project` and may mess 
up the output column order.
   
   So the condition shouldn't be `conf.adaptiveExecutionEnabled`, but 
`op.projectList == op.child.output`
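
   To illustrate the idea, here is a minimal, self-contained sketch that does 
not use Spark's classes. Columns are modeled as plain strings, and both 
`isRedundant` and `requireChildOrdering` are hypothetical helpers written for 
this example; they only approximate what `RemoveRedundantProjects` does.

```scala
// Toy model only: columns are plain strings, not Spark attributes.
object ProjectOrderingSketch {

  // Hypothetical helper: a Project is treated as redundant when it outputs the
  // same columns as its child. Column order matters only if the parent
  // operator requires it.
  def isRedundant(
      projectOutput: Seq[String],
      childOutput: Seq[String],
      requireOrdering: Boolean): Boolean = {
    if (requireOrdering) projectOutput == childOutput
    else projectOutput.sorted == childOutput.sorted
  }

  // Condition suggested in the review: if the projection carried by the
  // TakeOrderedAndProject-like operator equals its child's output, it changes
  // nothing by itself, so the child's column order must be preserved.
  def requireChildOrdering(
      projectList: Seq[String],
      childOutput: Seq[String]): Boolean = {
    projectList == childOutput
  }

  def main(args: Array[String]): Unit = {
    val scanOutput = Seq("a", "b")        // columns produced by the scan
    val reorderingProject = Seq("b", "a") // a Project that swaps the columns
    val topProjectList = Seq("b", "a")    // equals the output of its child

    val ordering = requireChildOrdering(topProjectList, reorderingProject)
    println(s"require ordering: $ordering")
    println(s"project removable: ${isRedundant(reorderingProject, scanOutput, ordering)}")
  }
}
```

   In this toy setup the reordering `Project` is kept when ordering is required, 
and would be dropped as redundant if `requireOrdering` were always `false`.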



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

