c21 commented on pull request #32328:
URL: https://github.com/apache/spark/pull/32328#issuecomment-827072227


   > Yes, it's a side effect for OptimizeSkewedJoin, but SMJ's advantage is that
   > it can spill. IMO, if users specify shuffled hash join for execution, that
   > means they know its benefits and issues. On the other hand, we can easily
   > increase memory, but it is hard to make a skewed join fast. So this
   > optimization can be an extra choice for users.
   
   I agree this gives users an extra choice given the current state of things.
   But in the long term, we would like to work towards enabling shuffled hash
   join by default (i.e. `spark.sql.join.preferSortMergeJoin`=false). To me,
   this seems to add more [risk](https://github.com/apache/spark/pull/32328#issuecomment-826601325)
   to that long-term direction. So I think we should be more cautious with it
   and have more discussion. cc @cloud-fan.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


