[GitHub] [spark] ulysses-you commented on pull request #32328: [SPARK-35214][SQL] OptimizeSkewedJoin support ShuffledHashJoinExec

GitBox Mon, 26 Apr 2021 01:30:20 -0700


ulysses-you commented on pull request #32328:
URL: https://github.com/apache/spark/pull/32328#issuecomment-826626797



   @c21  thanks for the input.
   
   Yes, in AQE we cannot optimize skew on the build side for every join type. 
But at least we can currently handle skew in inner-like joins on both the 
stream side and the build side.
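
   To make the join-type restriction concrete, here is a minimal sketch in plain Python. It is not Spark's code: the names `LEFT_SPLITTABLE`, `RIGHT_SPLITTABLE`, `splittable_sides` and `can_split_build_side` are hypothetical, and the join-type sets reflect my understanding of which sides `OptimizeSkewedJoin` may split, so please check them against the rule itself.

```python
from typing import Tuple

# Assumption: join types for which each side's skewed partitions may be split.
# Illustrative only; verify against OptimizeSkewedJoin before relying on it.
LEFT_SPLITTABLE = {"inner", "cross", "left_semi", "left_anti", "left_outer"}
RIGHT_SPLITTABLE = {"inner", "cross", "right_outer"}


def splittable_sides(join_type: str) -> Tuple[bool, bool]:
    """Return (can_split_left, can_split_right) for a join type name."""
    jt = join_type.lower()
    return (jt in LEFT_SPLITTABLE, jt in RIGHT_SPLITTABLE)


def can_split_build_side(join_type: str, build_side: str) -> bool:
    """True if the hash join's build side ("left" or "right") may be split."""
    can_left, can_right = splittable_sides(join_type)
    return can_left if build_side == "left" else can_right
```

   Under these assumptions, an inner join can be split on either side, while a left outer join that builds on the right can only have its stream (left) side split.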
   
   > If that's true, it sounds to me that it may potentially introduce more OOM 
on build side, as tasks are sharing executor's off-heap memory to build hash 
maps
   
   Yes, that is a side effect of `OptimizeSkewedJoin`, whereas SMJ's advantage 
is that it can spill. IMO, if users explicitly choose shuffled hash join for 
execution, that means they know its benefits and drawbacks. And on the other 
hand, it is easy to increase memory but hard to make a skewed join fast.


