cloud-fan commented on pull request #30829:
URL: https://github.com/apache/spark/pull/30829#issuecomment-765497632


   On second thought, I think adding an "unhandled" shuffle is a bit risky, 
but allowing the stage optimization phase to add new shuffles is too 
complicated.
   
   I'd like to revisit the idea of putting the skew join optimization rule in 
the stage preparation phase. Regarding the two points you raised:
   1. I don't think this is true anymore; the comment is stale. The classdoc 
of `OptimizeSkewedJoin` says `when this rule is enabled, it also coalesces 
non-skewed partitions like CoalesceShufflePartitions does.` So I don't think 
`OptimizeSkewedJoin` needs to run after `CoalesceShufflePartitions`.
   2. We can add checks so that `OptimizeSkewedJoin` is only triggered if the 
related shuffle stages are all materialized.
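
   To make point 2 concrete, here is a minimal sketch of such a check. It is 
only an illustration: `PlanNode`, `ShuffleStage`, `Join` and `SkewJoinGuard` 
are hypothetical stand-ins, not Spark's actual plan classes, and the real rule 
would have to work against the AQE query stage nodes.

```scala
// Hypothetical, simplified plan model (assumption: not Spark's real classes).
sealed trait PlanNode { def children: Seq[PlanNode] }

// A shuffle stage that records whether its output is already materialized.
final case class ShuffleStage(id: Int, materialized: Boolean) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}

// A join over two inputs.
final case class Join(left: PlanNode, right: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(left, right)
}

object SkewJoinGuard {
  // Collect every shuffle stage reachable from the given node.
  def shuffleStages(plan: PlanNode): Seq[ShuffleStage] = plan match {
    case s: ShuffleStage => Seq(s)
    case other           => other.children.flatMap(shuffleStages)
  }

  // Allow the skew join optimization only when the join has shuffle stages
  // underneath it and every one of them is materialized.
  def canOptimize(join: Join): Boolean = {
    val stages = shuffleStages(join)
    stages.nonEmpty && stages.forall(_.materialized)
  }
}
```

   With a guard like this, the rule would simply leave the join untouched 
while any related stage is still running, and be re-evaluated at a later 
re-planning step once the stages have finished.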

