[GitHub] [spark] AngersZhuuuu commented on pull request #29692: [SPARK-32830][SQL] Optimize Skewed BroadcastNestedLoopJoin with AQE

GitBox Sat, 19 Sep 2020 18:43:58 -0700


AngersZhuuuu commented on pull request #29692:
URL: https://github.com/apache/spark/pull/29692#issuecomment-695464072



   > Then we need some estimation work, as the shuffle/scan node may be far
away from the join node. We also need to carefully justify whether the extra
shuffle cost is worth the benefit of eliminating the skew.
   
   Yes, re-shuffling the data is only worthwhile when the skew is very serious
and the threshold is reached.
   I'd appreciate some advice: in the current code, is there any existing way
to estimate the cost of a shuffle?
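To make the trade-off concrete, here is a minimal sketch of the kind of gate being discussed. The skew test mirrors the condition Spark AQE uses for skewed sort-merge joins: a partition is skewed only if it is larger than `spark.sql.adaptive.skewJoin.skewedPartitionFactor` (default 5) times the median partition size *and* larger than `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` (default 256MB). The cost comparison is a hypothetical, simplified model and is not something Spark provides. It assumes the stage time is bounded by the largest partition, that an ideal re-shuffle brings every partition to the mean size, and that `shuffle_cost_per_byte` and `compute_cost_per_byte` are made-up tuning constants.

```python
from statistics import median

# Defaults of Spark AQE's skew-join settings (used for sort-merge joins):
#   spark.sql.adaptive.skewJoin.skewedPartitionFactor          -> 5
#   spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes -> 256MB
SKEW_FACTOR = 5
SKEW_THRESHOLD_BYTES = 256 * 1024 * 1024


def is_skewed(size: int, median_size: float) -> bool:
    """A partition is skewed only if it is much larger than the median
    AND larger than an absolute floor (same shape as AQE's check)."""
    return size > SKEW_FACTOR * median_size and size > SKEW_THRESHOLD_BYTES


def worth_reshuffling(partition_sizes, shuffle_cost_per_byte=1.0,
                      compute_cost_per_byte=1.0) -> bool:
    """Hypothetical cost gate: re-shuffle only if the straggler time saved
    by balancing partitions exceeds the per-task cost of moving the data."""
    med = median(partition_sizes)
    if not any(is_skewed(s, med) for s in partition_sizes):
        return False  # no skew, so an extra shuffle can only add cost
    mean = sum(partition_sizes) / len(partition_sizes)
    # Benefit: the slowest task shrinks from max size to roughly mean size.
    saved = (max(partition_sizes) - mean) * compute_cost_per_byte
    # Cost: every byte is shuffled once; spread over all tasks this is
    # roughly one mean-sized partition's worth of shuffle work per task.
    extra = mean * shuffle_cost_per_byte
    return saved > extra


if __name__ == "__main__":
    MB = 1024 * 1024
    sizes = [100 * MB] * 9 + [2048 * MB]  # one 2GB straggler
    print(worth_reshuffling(sizes))
    print(worth_reshuffling(sizes, shuffle_cost_per_byte=10.0))
```

With cheap shuffles the single 2GB partition justifies re-shuffling, while making shuffles ten times more expensive flips the decision. The open question in the comment is precisely how to obtain realistic values for these costs when the shuffle or scan node sits far from the join.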
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.


