wangyum commented on PR #48789: URL: https://github.com/apache/spark/pull/48789#issuecomment-2463635182
1. `RemoveRedundantProjects` in `AdaptiveSparkPlanExec` removes the project, because `TakeOrderedAndProject` can keep the order: https://github.com/apache/spark/blob/d64b8803e0b74cb827d741e126341c2aa56a4f1e/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L146
2. After the first query stage finishes, AQE re-optimizes the query through the logical plan: https://github.com/apache/spark/blob/d64b8803e0b74cb827d741e126341c2aa56a4f1e/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L382, and `EliminateLimits` in `AQEOptimizer` removes the limit: https://github.com/apache/spark/blob/87b20b166c41d4c265ac54eed75707b7726d371f/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala#L46. As a result, the new physical plan no longer has `TakeOrderedAndProject`.

Another possible fix is to change `requireOrdering` from false to true if AQE is enabled: https://github.com/apache/spark/blob/d64b8803e0b74cb827d741e126341c2aa56a4f1e/sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantProjects.scala#L61
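
To make the interaction between the two steps easier to follow, here is a minimal toy model in Python. It is not Spark code: the classes `Scan`, `Project`, and `TakeOrderedAndProject`, the functions `remove_redundant_projects` and `eliminate_limit`, and the flag `strict_under_top_k` are all hypothetical stand-ins, and the sketch assumes that "order" above refers to the order of the output columns. The flag plays the role of setting `requireOrdering` to true below `TakeOrderedAndProject`.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    output: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Plan"

    @property
    def output(self) -> Tuple[str, ...]:
        return self.columns


@dataclass(frozen=True)
class TakeOrderedAndProject:
    limit: int
    project_list: Tuple[str, ...]
    child: "Plan"

    @property
    def output(self) -> Tuple[str, ...]:
        # The top-k node re-projects, so it fixes the column order itself.
        return self.project_list


Plan = Union[Scan, Project, TakeOrderedAndProject]


def remove_redundant_projects(plan, require_ordering=True, strict_under_top_k=False):
    """Toy version of step 1: drop projects that add nothing to their child."""
    if isinstance(plan, Project):
        child_out = plan.child.output
        if require_ordering:
            redundant = plan.output == child_out
        else:
            # Same columns in any order counts as redundant.
            redundant = sorted(plan.output) == sorted(child_out)
        if redundant:
            return remove_redundant_projects(plan.child, require_ordering, strict_under_top_k)
        return Project(plan.columns,
                       remove_redundant_projects(plan.child, False, strict_under_top_k))
    if isinstance(plan, TakeOrderedAndProject):
        # Below the top-k node the ordering requirement is relaxed,
        # unless the hypothetical fix flag is set.
        child = remove_redundant_projects(plan.child, strict_under_top_k, strict_under_top_k)
        return TakeOrderedAndProject(plan.limit, plan.project_list, child)
    return plan


def eliminate_limit(plan):
    """Crude stand-in for step 2: the limit goes away, and with it the top-k node."""
    if isinstance(plan, TakeOrderedAndProject):
        return plan.child
    return plan


def run(strict_under_top_k):
    scan = Scan(("b", "a"))
    plan = TakeOrderedAndProject(10, ("a", "b"), Project(("a", "b"), scan))
    plan = remove_redundant_projects(plan, strict_under_top_k=strict_under_top_k)
    return eliminate_limit(plan).output


print(run(False))  # relaxed ordering below the top-k node
print(run(True))   # ordering required below the top-k node
```

In this toy model, the relaxed setting removes the project, so once the top-k node disappears nothing restores the original column order; with the strict setting the project survives and the order is kept.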
