Liulietong commented on pull request #34602: URL: https://github.com/apache/spark/pull/34602#issuecomment-972472862
> > And it will not only work in cases where just 2 tables join; many complex combinations need to be considered, such as multiple table joins in the same stage.
>
> Why can't it work in such a case? If multiple tables are joined in the same stage, the plan should be:
>
> ```
> SHJ2
>   SHJ1
>     ShuffleStage
>     ShuffleStage
>   ShuffleStage
> ```
>
> So we can still optimize SHJ1 by applying transformUp to this plan if we allow introducing an extra shuffle.
>
> It seems to me that the check `if (shuffleStages.length == 2)` is unnecessary. Also cc @JkSelf

Yes, it will work in cases where multiple tables are joined in the same stage. But I don't think it's the best way to optimize MultipleSkewedJoin, since extra shuffles will be introduced. In the worst case, N SHJs will introduce (N-1) shuffles.
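
To make the (N-1) count concrete, here is a rough toy model in Python. It is not Spark code: the `ShuffleStage` and `SHJ` classes, `left_deep_chain`, and `worst_case_extra_shuffles` are hypothetical names made up for illustration, and the model assumes that every skew-optimized join other than the topmost one needs one extra shuffle before feeding its parent.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ShuffleStage:
    """Leaf of the toy plan: a materialized shuffle stage."""
    name: str


@dataclass
class SHJ:
    """Toy shuffled hash join with two children (hypothetical, not Spark's class)."""
    left: "Plan"
    right: "Plan"


Plan = Union[ShuffleStage, SHJ]


def left_deep_chain(n_joins: int) -> Plan:
    """Build n_joins SHJs in one stage over n_joins + 1 shuffle stages."""
    plan: Plan = ShuffleStage("s0")
    for i in range(1, n_joins + 1):
        plan = SHJ(plan, ShuffleStage(f"s{i}"))
    return plan


def worst_case_extra_shuffles(plan: Plan, is_root: bool = True) -> int:
    """Count extra shuffles if every SHJ in the plan is skew-optimized.

    Assumption (not taken from Spark): rewriting a join changes its output
    partitioning, so each non-root join costs one extra shuffle.
    """
    if isinstance(plan, ShuffleStage):
        return 0
    below = (worst_case_extra_shuffles(plan.left, False)
             + worst_case_extra_shuffles(plan.right, False))
    return below + (0 if is_root else 1)


if __name__ == "__main__":
    # The quoted plan (SHJ2 over SHJ1) corresponds to n_joins = 2.
    for n in (1, 2, 5):
        print(n, worst_case_extra_shuffles(left_deep_chain(n)))
```

Under that assumption, the two-join plan quoted above costs one extra shuffle, and the cost grows linearly with the number of joins in the stage.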
