GitHub user adrian-wang opened a pull request:
https://github.com/apache/spark/pull/5717
[SPARK-7165] [SQL] use sort merge join for outer join
This is an extended version of #5208.
This patch introduces sort merge join not only for inner joins but also for
left outer, right outer, and full outer joins.
Sort merge join could resolve the OOM errors that are quite common when
joining large tables, where the available memory easily becomes too small.
Test cases are already available in SortMergeCompatibilitySuite, and we need
to add some more in `JoinSuite` to test the join selection.
Also, this patch would benefit quite a lot from #3438.
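
For reviewers less familiar with the technique, here is a minimal,
language-agnostic sketch of how a sort merge join can cover inner and outer
joins. It is not the code in this patch: the function name `sort_merge_join`,
the `how` values, and the (key, value) row layout are all assumptions made for
illustration. It also assumes non-null, comparable keys and holds both inputs
in memory, whereas a real implementation would stream sorted partitions.

```python
from itertools import groupby
from operator import itemgetter


def _sorted_groups(rows):
    """Sort (key, value) rows by key and group values that share a key."""
    key = itemgetter(0)
    return [(k, [v for _, v in g]) for k, g in groupby(sorted(rows, key=key), key=key)]


def sort_merge_join(left, right, how="inner"):
    """Join two lists of (key, value) rows; `how` is a hypothetical parameter.

    Returns (key, left_value, right_value) tuples. The side without a match
    is padded with None, mirroring SQL outer join semantics.
    """
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unsupported join type: {how}")
    keep_left = how in ("left", "full")
    keep_right = how in ("right", "full")

    lgroups, rgroups = _sorted_groups(left), _sorted_groups(right)
    out = []
    i = j = 0
    # Advance two cursors over the sorted groups, always moving the smaller key.
    while i < len(lgroups) and j < len(rgroups):
        lk, lvals = lgroups[i]
        rk, rvals = rgroups[j]
        if lk == rk:
            # Matching keys: emit the cross product of both groups.
            out.extend((lk, lv, rv) for lv in lvals for rv in rvals)
            i += 1
            j += 1
        elif lk < rk:
            # Left key has no partner on the right.
            if keep_left:
                out.extend((lk, lv, None) for lv in lvals)
            i += 1
        else:
            # Right key has no partner on the left.
            if keep_right:
                out.extend((rk, None, rv) for rv in rvals)
            j += 1
    # Drain whichever side still has rows; they cannot match anything.
    if keep_left:
        for lk, lvals in lgroups[i:]:
            out.extend((lk, lv, None) for lv in lvals)
    if keep_right:
        for rk, rvals in rgroups[j:]:
            out.extend((rk, None, rv) for rv in rvals)
    return out
```

The only difference between the four join types in this sketch is whether
unmatched rows on each side are kept and padded, which is why one merge loop
can serve all of them.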
/cc @chenghao-intel
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/adrian-wang/spark outersmj
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5717.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5717
----
commit fc862f421b5cdbac18535fa09a2af668a5fc74d9
Author: Daoyuan Wang <[email protected]>
Date: 2015-04-27T09:40:55Z
use sort merge join for outer join
----