[GitHub] [spark] c21 commented on a change in pull request #32328: [SPARK-35214][SQL] OptimizeSkewedJoin support ShuffledHashJoinExec

GitBox Tue, 27 Apr 2021 23:26:04 -0700


c21 commented on a change in pull request #32328:
URL: https://github.com/apache/spark/pull/32328#discussion_r621850427




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##########
@@ -148,7 +148,7 @@ object OptimizeSkewedJoin extends CustomShuffleReaderRule {
   /*
    * This method aim to optimize the skewed join with the following steps:

Review comment:
       My hunch is that when the build side is large enough to risk an OOM, it 
should already be considered skewed. So after AQE skew handling, some of the 
potential build-side OOMs (inner join only) can be avoided.
   
   However, for queries with other join types, queries that have no shuffle 
before the join, and queries whose run-time hash map is significantly larger 
than the partition size, we should have a run-time fallback mechanism in 
shuffled hash join itself. This PR and #32210 are both good to have and are 
orthogonal to each other.
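
To make the first point concrete, here is a minimal sketch of the kind of size check that AQE skew detection applies to shuffle partitions. It is not the code in `OptimizeSkewedJoin.scala`: the object and method names are hypothetical, the median is simplified, and the two constants are assumed to mirror the defaults of `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`.

```scala
// Hypothetical, self-contained sketch; names are illustrative only.
object SkewCheckSketch {
  // Assumed defaults (factor 5, threshold 256 MB); the real values come from SQLConf.
  val SkewFactor: Double = 5.0
  val SkewThresholdBytes: Long = 256L * 1024 * 1024

  // Simplified median: the middle element of the sorted partition sizes.
  def median(sizes: Seq[Long]): Long = {
    require(sizes.nonEmpty, "need at least one partition size")
    val sorted = sizes.sorted
    sorted(sorted.length / 2)
  }

  // A partition counts as skewed only if it is both much larger than the
  // median and larger than an absolute threshold.
  def isSkewed(size: Long, medianSize: Long): Boolean =
    size > medianSize * SkewFactor && size > SkewThresholdBytes

  // Indices of the partitions that the check above flags as skewed.
  def skewedPartitions(sizes: Seq[Long]): Seq[Int] = {
    val med = median(sizes)
    sizes.zipWithIndex.collect { case (s, i) if isSkewed(s, med) => i }
  }
}
```

Under these assumptions, a build-side partition big enough to threaten an OOM will usually also pass both conditions and get split. The check only sees shuffle partition sizes, though, so it cannot help when there is no shuffle before the join or when the in-memory hash map is much larger than the partition itself.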




