baibaichen opened a new pull request, #53620:
URL: https://github.com/apache/spark/pull/53620

   ### What changes were proposed in this pull request?
   
   This PR uses `collectFirst` to find the first `AdaptiveSparkPlanExec` node 
anywhere in the plan tree, instead of assuming the root plan is an 
`AdaptiveSparkPlanExec`.
   
   ### Why are the changes needed?
   
   https://github.com/apache/spark/pull/52157 introduced the 
`extractShuffleIds` method in `SQLExecution` to collect the shuffle IDs of a 
`SparkPlan`. That method implicitly assumed that, when AQE is enabled, 
`AdaptiveSparkPlanExec` is the root of the input plan. Because Spark itself only 
inserts `AdaptiveSparkPlanExec` under a Command node, the assumption held there. 
In Gluten, however, `AdaptiveSparkPlanExec` may not be the root node, because 
Gluten needs to insert a special physical plan node to perform the 
columnar-to-row transition.
   
   With `collectFirst`, the method locates the `AdaptiveSparkPlanExec` 
regardless of its position in the plan tree, which improves compatibility with 
extensions such as Gluten that wrap the adaptive plan in additional nodes.
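
   The difference between matching only the root and searching the whole tree 
can be illustrated with a small, self-contained sketch. This is not the code in 
`SQLExecution`: every `Toy*` type, the `shuffleIds` field, and both helper 
functions are hypothetical stand-ins, and the hand-written `collectFirst` only 
approximates a pre-order "first match" search over a plan tree.

```scala
// A minimal sketch, assuming toy plan nodes; none of these types are Spark's.
object CollectFirstSketch {

  sealed trait ToyPlan {
    def children: Seq[ToyPlan]

    // Pre-order search: try this node first, then each child left to right,
    // stopping at the first node on which the partial function is defined.
    def collectFirst[B](pf: PartialFunction[ToyPlan, B]): Option[B] =
      pf.lift(this).orElse(
        children.iterator.map(_.collectFirst(pf)).find(_.isDefined).flatten)
  }

  // Hypothetical stand-in for an adaptive plan node that knows its shuffle IDs.
  final case class ToyAdaptivePlan(shuffleIds: Seq[Int]) extends ToyPlan {
    def children: Seq[ToyPlan] = Nil
  }

  // Hypothetical stand-in for a transition node an extension puts at the root.
  final case class ToyColumnarToRow(child: ToyPlan) extends ToyPlan {
    def children: Seq[ToyPlan] = Seq(child)
  }

  // Root-only lookup: finds nothing when the adaptive node is wrapped.
  def shuffleIdsRootOnly(plan: ToyPlan): Seq[Int] = plan match {
    case a: ToyAdaptivePlan => a.shuffleIds
    case _                  => Seq.empty
  }

  // Tree-wide lookup: finds the adaptive node wherever it sits.
  def shuffleIdsAnywhere(plan: ToyPlan): Seq[Int] =
    plan.collectFirst { case a: ToyAdaptivePlan => a.shuffleIds }
      .getOrElse(Seq.empty)
}
```

   In this sketch, a plan built as `ToyColumnarToRow(ToyAdaptivePlan(Seq(3, 7)))` 
yields an empty result from `shuffleIdsRootOnly` but the wrapped node's IDs from 
`shuffleIdsAnywhere`, which is the situation described for Gluten above.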
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Passes GHA (GitHub Actions CI).
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

