cloud-fan commented on pull request #32210: URL: https://github.com/apache/spark/pull/32210#issuecomment-826239123
After more thought, I'm wondering whether this is the right direction to go. Falling back to SMJ clearly wastes the partially built hash map. If a partition is only slightly too large for the in-memory hash map, I feel spilling the hash map might be a better choice. If a partition is much too large for the in-memory hash map, it seems we can use the same technique as skew join handling: split the partition into multiple smaller ones so that each of them fits in memory.
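To make the two cases concrete, here is a rough sketch of the decision being suggested. It is not Spark code: `plan_build_side`, `Plan`, and the `spill_factor` threshold are hypothetical names and values chosen only for illustration, and partition sizes are assumed to be known in bytes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    strategy: str        # "in_memory", "spill", or "split"
    num_splits: int = 1  # sub-partitions to build hash maps for


def plan_build_side(partition_bytes: int,
                    memory_budget_bytes: int,
                    spill_factor: float = 2.0) -> Plan:
    """Pick a strategy for one build-side partition (hypothetical helper).

    spill_factor is an assumed cutoff between "a bit larger" and
    "much larger"; the comment above does not name a value.
    """
    if memory_budget_bytes <= 0:
        raise ValueError("memory_budget_bytes must be positive")
    if partition_bytes <= memory_budget_bytes:
        # Fits: build the hash map fully in memory.
        return Plan("in_memory")
    if partition_bytes <= spill_factor * memory_budget_bytes:
        # A bit too large: keep the work done so far and spill the overflow.
        return Plan("spill")
    # Much too large: split, as in skew join handling, so each piece fits.
    return Plan("split", math.ceil(partition_bytes / memory_budget_bytes))


if __name__ == "__main__":
    for size in (80, 150, 1000):
        print(size, plan_build_side(size, memory_budget_bytes=100))
```

In neither branch is the partially built hash map thrown away, which is the cost of the SMJ fallback pointed out above.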
