Liulietong commented on pull request #34602:
URL: https://github.com/apache/spark/pull/34602#issuecomment-972472862


   > > And it will not only work in cases where just 2 tables are joined; many 
complex combinations need to be considered, such as multiple table joins in the 
same stage.
   > 
   > Why can't it work in such a case? If multiple table joins are in the same 
stage, the plan should be:
   > 
   > ```
   > SHJ2
   >   SHJ1
   >     ShuffleStage
   >     ShuffleStage
   >   ShuffleStage
   > ```
   > 
   > So we can still optimize SHJ1 by applying `transformUp` to this plan if we 
allow introducing an extra shuffle.
   > 
   > It seems to me that the check `if (shuffleStages.length == 2)` is 
unnecessary. Also cc @JkSelf
   
   Yes, it will work when multiple tables are joined in the same stage. But I 
don't think it's the best way to optimize MultipleSkewedJoin, since extra 
shuffles will be introduced. In the worst case, N SHJs will introduce (N-1) 
shuffles.
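
To make the (N-1) count concrete, here is a rough sketch in plain Python. It is a toy model, not Spark code: `Node`, `count_extra_shuffles`, and `left_deep_chain` are hypothetical names, and it assumes that every SHJ whose output feeds directly into another SHJ needs one extra shuffle once its skew has been handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Toy plan node; "SHJ" and "ShuffleStage" are labels only, not Spark classes.
    name: str
    children: List["Node"] = field(default_factory=list)


def count_extra_shuffles(node: Node) -> int:
    """Walk the plan bottom-up and count SHJs that feed another SHJ directly.

    Assumption: each such child SHJ costs one extra shuffle after skew handling.
    """
    total = sum(count_extra_shuffles(child) for child in node.children)
    if node.name == "SHJ":
        total += sum(1 for child in node.children if child.name == "SHJ")
    return total


def left_deep_chain(n: int) -> Node:
    """Build n SHJs in one stage, each joining the previous result with a new ShuffleStage."""
    plan = Node("SHJ", [Node("ShuffleStage"), Node("ShuffleStage")])
    for _ in range(n - 1):
        plan = Node("SHJ", [plan, Node("ShuffleStage")])
    return plan
```

Under that assumption, `count_extra_shuffles(left_deep_chain(2))` covers the SHJ2/SHJ1 plan quoted above, and a chain of N SHJs gives N-1.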


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


