cloud-fan commented on pull request #32550: URL: https://github.com/apache/spark/pull/32550#issuecomment-848424672
That's a good point. I think this is a general issue: if the planner creates shuffle hash join directly, we may also coalesce partitions and make the local hash map bigger. I don't really have a good idea here. If we do have many small partitions, coalesce partitions should be beneficial even for shuffle hash join. One idea is to use the existing config `advisoryPartitionSizeInBytes` in the new rule, then we don't need to worry about the partition size after coalescing becomes much larger than we expect. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
