Re: [PR] [SPARK-50258][SQL] Keep the output order after AQE optimization [spark]


wangyum commented on PR #48789:
URL: https://github.com/apache/spark/pull/48789#issuecomment-2463635182


   1. `RemoveRedundantProjects` in `AdaptiveSparkPlanExec` removes the project, 
because `TakeOrderedAndProject` can keep the output order: 
https://github.com/apache/spark/blob/d64b8803e0b74cb827d741e126341c2aa56a4f1e/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L146
   2. After the first query stage finishes, AQE re-optimizes the query through 
the logical plan: 
https://github.com/apache/spark/blob/d64b8803e0b74cb827d741e126341c2aa56a4f1e/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L382
 and `EliminateLimits` in `AQEOptimizer` removes the limit: 
https://github.com/apache/spark/blob/87b20b166c41d4c265ac54eed75707b7726d371f/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala#L46
 As a result, the new physical plan no longer contains `TakeOrderedAndProject`.
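The two steps above can be illustrated with a deliberately simplified toy model. The node classes and the functions `remove_redundant_projects` and `eliminate_limit` are hypothetical stand-ins, not Spark APIs. Step 2 in particular is reduced to "the re-planned tree no longer contains the node whose project list fixed the column order"; real AQE re-planning goes through the logical plan and is more involved.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy physical-plan nodes (NOT Spark classes). Columns are plain names.

@dataclass
class Scan:
    columns: List[str]

    def output(self) -> List[str]:
        return list(self.columns)

@dataclass
class Project:
    columns: List[str]
    child: object

    def output(self) -> List[str]:
        return list(self.columns)

@dataclass
class TakeOrderedAndProject:
    limit: int
    project_list: List[str]
    child: object

    def output(self) -> List[str]:
        # The node's own project list fixes the final column order.
        return list(self.project_list)

def remove_redundant_projects(plan):
    """Step 1 (simplified): directly under TakeOrderedAndProject, a Project that
    only permutes its child's columns is treated as redundant and dropped."""
    if isinstance(plan, TakeOrderedAndProject):
        child = plan.child
        if isinstance(child, Project) and sorted(child.columns) == sorted(child.child.output()):
            child = child.child
        return TakeOrderedAndProject(plan.limit, plan.project_list, child)
    return plan

def eliminate_limit(plan):
    """Step 2 (simplified): once the limit is removed, the new plan has no
    TakeOrderedAndProject; this toy simply keeps its child."""
    if isinstance(plan, TakeOrderedAndProject):
        return plan.child
    return plan

plan = TakeOrderedAndProject(10, ["b", "a"], Project(["b", "a"], Scan(["a", "b"])))
print(plan.output())                                              # ['b', 'a']
print(eliminate_limit(remove_redundant_projects(plan)).output())  # ['a', 'b']
```

In this model, applying either step alone keeps the requested order `['b', 'a']`; the order flips only when the project is removed first and the ordering node disappears afterwards.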
   
   
   Another possible fix is to change `requireOrdering` from false to true when 
AQE is enabled:
   
https://github.com/apache/spark/blob/d64b8803e0b74cb827d741e126341c2aa56a4f1e/sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantProjects.scala#L61
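A rough sketch of why the `requireOrdering` flag matters, assuming (as we understand `RemoveRedundantProjects`) that a project counts as redundant when its output has the same expression IDs as its child's output, compared position by position when ordering is required and as sorted lists otherwise. The function `is_redundant` and the `(name, expr_id)` tuples are hypothetical simplifications; nullability checks and special child cases are omitted.

```python
from typing import List, Tuple

# An attribute is modelled as (name, expr_id); the check compares expr_id values.
Attr = Tuple[str, int]

def is_redundant(project_output: List[Attr], child_output: List[Attr],
                 require_ordering: bool) -> bool:
    project_ids = [expr_id for _, expr_id in project_output]
    child_ids = [expr_id for _, expr_id in child_output]
    if require_ordering:
        # Same columns in the same positions.
        return project_ids == child_ids
    # Same set of columns; a pure reordering still counts as redundant.
    return sorted(project_ids) == sorted(child_ids)

a, b = ("a", 1), ("b", 2)
print(is_redundant([b, a], [a, b], require_ordering=False))  # True
print(is_redundant([b, a], [a, b], require_ordering=True))   # False
```

With the flag set to true, a project that only reorders columns is kept, so the column order no longer depends on `TakeOrderedAndProject` surviving re-optimization.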


