[GitHub] [spark] zhengruifeng commented on a change in pull request #32328: [SPARK-35214][SQL] OptimizeSkewedJoin support ShuffledHashJoinExec

GitBox Tue, 27 Apr 2021 23:08:33 -0700


zhengruifeng commented on a change in pull request #32328:
URL: https://github.com/apache/spark/pull/32328#discussion_r621841712




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##########
@@ -148,7 +148,7 @@ object OptimizeSkewedJoin extends CustomShuffleReaderRule {
   /*
    * This method aim to optimize the skewed join with the following steps:

Review comment:
       Sorry, my comment was misleading; the `fallback to smj` there refers to 
https://github.com/apache/spark/pull/32210
   
   My thought is that we may want to take potential OOM into account here.
   Suppose build side A is inner joined with stream side B. For a build partition A_0:
   
   - if A_0 is skewed but smaller than an OOM threshold, we may not need to split it;
   - if A_0 is not skewed but larger than that threshold, we may still need to split it.
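
   To make the two cases concrete, here is a minimal sketch of one possible reading of the rule. Everything in it is hypothetical: `BuildSideSplitSketch`, `shouldSplit`, and `oomThresholdInBytes` are illustrative names, not existing Spark APIs or configs, and the skew flag is assumed to come from whatever skew check the rule already performs.

```scala
// Hypothetical sketch only: these names do not exist in Spark.
object BuildSideSplitSketch {

  /**
   * Decide whether a build-side partition should be split.
   *
   * @param isSkewed            result of the existing skew check (assumed input)
   * @param sizeInBytes         size of the build partition A_0
   * @param oomThresholdInBytes assumed size above which building the hash map risks OOM
   */
  def shouldSplit(
      isSkewed: Boolean,
      sizeInBytes: Long,
      oomThresholdInBytes: Long): Boolean = {
    val exceedsOomThreshold = sizeInBytes > oomThresholdInBytes
    (isSkewed, exceedsOomThreshold) match {
      case (true, false)  => false // skewed, but small enough: may not need a split
      case (false, true)  => true  // not skewed, but too large: may still need a split
      case (true, true)   => true  // skewed and too large: split
      case (false, false) => false // neither: leave as is
    }
  }
}
```

   Under this reading the decision for the build side follows the size threshold rather than the skew flag alone; whether a skewed but small partition should still be split for parallelism is left open here.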
   
   




