[GitHub] [spark] cloud-fan commented on pull request #32210: [SPARK-32634][SQL] Introduce sort-based fallback for shuffled hash join (non-code-gen path)

GitBox Sun, 25 Apr 2021 23:38:24 -0700


cloud-fan commented on pull request #32210:
URL: https://github.com/apache/spark/pull/32210#issuecomment-826239123



   After more thought, I'm wondering whether this is the right direction to go. 
Falling back to sort-merge join (SMJ) wastes the partially built hash map.
   
   If a partition is only slightly too large to build the in-memory hash map, 
I feel spilling the hash map might be a better choice. If a partition is far 
too large to build the in-memory hash map, it seems we could use the same 
technique as skew join handling and split the partition into multiple smaller 
ones so that each can fit in memory.
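
To make the two cases concrete, here is a minimal sketch of the decision in plain Python. It is not Spark code and not what the PR implements: `BuildPlan`, `plan_build_side`, and the `spill_factor` threshold are hypothetical names invented for illustration, and the size estimate is assumed to be available before the hash map is built.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildPlan:
    """Hypothetical result type: how to handle one build-side partition."""
    strategy: str        # "in_memory", "spill", or "split"
    num_splits: int = 1  # only greater than 1 for the "split" strategy


def plan_build_side(estimated_bytes: int,
                    memory_budget_bytes: int,
                    spill_factor: float = 2.0) -> BuildPlan:
    """Pick a strategy for one partition (illustrative assumption only).

    spill_factor is an assumed cutoff for "a bit too large": partitions up
    to spill_factor times the budget spill, anything bigger is split.
    """
    if memory_budget_bytes <= 0:
        raise ValueError("memory_budget_bytes must be positive")
    if estimated_bytes <= memory_budget_bytes:
        # Fits: build the hash map in memory as usual.
        return BuildPlan("in_memory")
    if estimated_bytes <= spill_factor * memory_budget_bytes:
        # Slightly too large: keep the hash map and spill part of it.
        return BuildPlan("spill")
    # Far too large: split into sub-partitions that each fit the budget.
    return BuildPlan("split", math.ceil(estimated_bytes / memory_budget_bytes))


if __name__ == "__main__":
    budget = 100
    for size in (80, 150, 1000):
        print(size, plan_build_side(size, budget))
```

Neither branch discards the work already done on the hash map, which is the concern raised about the SMJ fallback. How the threshold would be chosen, and whether sizes are known early enough, are open questions this sketch does not answer.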


