GitHub user ueshin commented on the pull request:
https://github.com/apache/spark/pull/283#issuecomment-39421176
@mridulm Thank you for your reply.
There are two points I should mention about memory:
- Before shuffle
If the data are already sorted, no extra memory is needed because no sort is
performed. If they are not sorted, merge join needs additional memory to sort
the data in each partition.
- After shuffle
While fetching data, merge join needs at most as much memory as hash join.
Beyond that it needs no extra memory, because it can produce output as soon as
input arrives, whereas hash join needs additional memory to build its hash
table.
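
To make the comparison concrete, here is a minimal sketch in plain Python. It
is not Spark code and not the implementation in this pull request; the
function names `hash_join` and `merge_join` and the (key, value) tuple layout
are assumptions chosen only for illustration. It shows that the hash join
holds one whole side in a table, while the merge join over sorted inputs
buffers only the rows of the current key.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_join(left, right):
    """Inner join of (key, value) pairs; builds a hash table over all of `left`."""
    table = defaultdict(list)
    for key, value in left:            # the entire build side is held in memory
        table[key].append(value)
    for key, rvalue in right:          # the probe side is streamed
        for lvalue in table.get(key, ()):
            yield key, lvalue, rvalue


def merge_join(left, right):
    """Inner join of (key, value) pairs; both inputs must be sorted by key."""
    lgroups = groupby(left, key=itemgetter(0))
    rgroups = groupby(right, key=itemgetter(0))
    lkey, lrows = next(lgroups, (None, None))
    rkey, rrows = next(rgroups, (None, None))
    while lrows is not None and rrows is not None:
        if lkey < rkey:
            lkey, lrows = next(lgroups, (None, None))
        elif lkey > rkey:
            rkey, rrows = next(rgroups, (None, None))
        else:
            lvalues = [v for _, v in lrows]   # only one key's rows are buffered
            for _, rvalue in rrows:
                for lvalue in lvalues:
                    yield lkey, lvalue, rvalue
            lkey, lrows = next(lgroups, (None, None))
            rkey, rrows = next(rgroups, (None, None))


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (3, "y"), (4, "z")]
merged = list(merge_join(iter(left), iter(right)))
hashed = list(hash_join(iter(left), iter(right)))
```

If the inputs are not already sorted, they would have to be passed through
something like `sorted(...)` first, which materializes the partition; that is
the "before shuffle" cost described above.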