AngersZhuuuu commented on pull request #29692: URL: https://github.com/apache/spark/pull/29692#issuecomment-695464072
> Then we need some estimation work, as the shuffle/scan node may be far away from the join node. We also need to carefully justify whether the extra shuffle cost is worth the skew elimination benefit.

Yes, re-shuffling the data is only worthwhile when the skew is very serious and the threshold is reached. I'd appreciate some advice: in the current code, is there any method to estimate shuffle cost?
