cloud-fan commented on pull request #31653:
URL: https://github.com/apache/spark/pull/31653#issuecomment-795049269


   This is more complicated than I thought. Let's discuss the expected workflow
first. The current AQE workflow is:
   
   1. input physical plan ->
   2. prepared plan (run preparation rules) ->
   3. physical plan with leaf query stages created ->
   4. physical plan with leaf query stages optimized ->
   5. go back to the logical plan ->
   6. optimized logical plan ->
   7. run the planner to get a new physical plan ->
   8. repeat from step 2 (with a cost model to decide whether to use the
re-optimized plan or not)
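A minimal Python sketch of this loop on a toy plan model may help. Everything in it (`Plan`, `count_shuffles`, `adaptive_loop`, and the operator names) is hypothetical and is not Spark's API. Steps 2-7 are abstracted as one re-optimization callable per completed query stage. The step-8 cost model is assumed to be just the number of shuffles. The sketch shows why a rule that adds shuffles tends to be rejected only at step 8: the re-optimized plan simply looks more expensive.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Plan:
    """Toy physical plan node (hypothetical, not Spark's SparkPlan)."""
    name: str
    children: tuple = ()


def count_shuffles(plan: Plan) -> int:
    """Assumed cost model: the number of ShuffleExchange nodes in the tree."""
    own = 1 if plan.name == "ShuffleExchange" else 0
    return own + sum(count_shuffles(c) for c in plan.children)


def adaptive_loop(plan: Plan,
                  reoptimizations: Sequence[Callable[[Plan], Plan]],
                  cost: Callable[[Plan], int]) -> Plan:
    """Simplified AQE loop: after each completed stage, re-optimize
    (steps 2-7) and keep the new plan only if it is cheaper (step 8)."""
    for reoptimize in reoptimizations:
        candidate = reoptimize(plan)
        if cost(candidate) < cost(plan):
            plan = candidate
    return plan


# Usage: one re-optimization adds a shuffle (rejected by the cost check),
# the next turns the sort-merge join into a broadcast join (accepted).
smj = Plan("SortMergeJoin", (Plan("ShuffleExchange", (Plan("Scan a"),)),
                             Plan("ShuffleExchange", (Plan("Scan b"),))))
bhj = Plan("BroadcastHashJoin", (Plan("Scan a"),
                                 Plan("BroadcastExchange", (Plan("Scan b"),))))


def add_shuffle(p: Plan) -> Plan:
    return Plan("ShuffleExchange", (p,))


def to_broadcast(p: Plan) -> Plan:
    return bhj


final = adaptive_loop(smj, [add_shuffle, to_broadcast], count_shuffles)
```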
   
   The general idea is to move the skew join optimization rule from step 4 to
step 2 and allow it to introduce extra shuffles. I feel it's too late to check
for extra shuffles in step 8. Can we do the check inside the skew join
optimization rule instead? For example, we could run this rule after
`EnsureRequirements`, and then run `EnsureRequirements` again inside the rule to
check whether it added extra shuffles. What do you think?
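The proposed check can be sketched on a hypothetical toy model in Python rather than Spark's Catalyst API. `ensure_requirements` stands in for `EnsureRequirements` and inserts a shuffle wherever an operator needs hash-partitioned input that its child does not provide. `split_skewed_joins` stands in for the skew join rule and is assumed to make the join lose its output partitioning. `allow_extra_shuffles` is an invented flag. The rule takes an already-planned tree, applies the split, runs `ensure_requirements` again, and keeps the result only if no shuffles were added, unless extra shuffles are explicitly allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Toy physical plan node (hypothetical, not Spark's SparkPlan)."""
    name: str
    children: tuple = ()


# Hypothetical toy rules: operators that need hash-partitioned input, and
# operators whose output is hash-partitioned on the key.
NEEDS_PARTITIONED_INPUT = {"SortMergeJoin", "SkewSplitSortMergeJoin", "HashAggregate"}
PROVIDES_PARTITIONING = {"ShuffleExchange", "SortMergeJoin"}


def count_shuffles(plan: Plan) -> int:
    own = 1 if plan.name == "ShuffleExchange" else 0
    return own + sum(count_shuffles(c) for c in plan.children)


def ensure_requirements(plan: Plan) -> Plan:
    """Toy stand-in for EnsureRequirements: insert a shuffle under any operator
    that needs partitioned input but whose child does not provide it."""
    children = tuple(ensure_requirements(c) for c in plan.children)
    if plan.name in NEEDS_PARTITIONED_INPUT:
        children = tuple(
            c if c.name in PROVIDES_PARTITIONING else Plan("ShuffleExchange", (c,))
            for c in children
        )
    return Plan(plan.name, children)


def split_skewed_joins(plan: Plan) -> Plan:
    """Toy skew handling: the split join no longer provides partitioning."""
    name = "SkewSplitSortMergeJoin" if plan.name == "SortMergeJoin" else plan.name
    return Plan(name, tuple(split_skewed_joins(c) for c in plan.children))


def optimize_skewed_join(plan: Plan, allow_extra_shuffles: bool = False) -> Plan:
    """Run after ensure_requirements: apply the split, re-run
    ensure_requirements, and compare shuffle counts."""
    candidate = ensure_requirements(split_skewed_joins(plan))
    extra = count_shuffles(candidate) - count_shuffles(plan)
    if extra > 0 and not allow_extra_shuffles:
        return plan  # the split would add shuffles: keep the original plan
    return candidate


join = Plan("SortMergeJoin", (Plan("ShuffleExchange", (Plan("Scan a"),)),
                              Plan("ShuffleExchange", (Plan("Scan b"),))))
planned = ensure_requirements(Plan("HashAggregate", (join,)))
```

With a `HashAggregate` on top of the join, the split breaks the partitioning that the aggregate relied on, so the second `ensure_requirements` pass adds a shuffle and the rule keeps the original plan. A bare join is split without extra shuffles.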

