GitHub user shenh062326 opened a pull request:
https://github.com/apache/spark/pull/11386
[SPARK-13450][SQL] External spilling when join a lot of rows with the same key
SortMergeJoin uses an ArrayBuffer[InternalRow] to store bufferedMatches. If
the join has a lot of rows with the same key, this buffer can grow until it
throws an OutOfMemoryError. This change adds an ExternalAppendOnlyArrayBuffer,
which supports external spilling, to store bufferedMatches instead of the
ArrayBuffer.
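To illustrate the general idea (this is not the code in the patch), here is a
minimal Python sketch of an append-only buffer that spills to a temporary file
once it holds too many rows in memory. The class name SpillableBuffer, the
max_in_memory threshold, and the use of pickle for serialization are all
assumptions made for this example; the real ExternalAppendOnlyArrayBuffer is
Scala code inside Spark and may work differently.

```python
import os
import pickle
import tempfile


class SpillableBuffer:
    """Hypothetical append-only buffer that keeps a bounded number of rows in RAM."""

    def __init__(self, max_in_memory=1000):
        self.max_in_memory = max_in_memory  # assumed spill threshold (row count)
        self._rows = []                     # rows currently held in memory
        self._spill_file = None             # temp file with spilled rows, if any
        self._length = 0                    # total rows appended so far

    def append(self, row):
        self._rows.append(row)
        self._length += 1
        if len(self._rows) >= self.max_in_memory:
            self._spill()

    def _spill(self):
        # Move all in-memory rows to the end of the temp file, in order.
        if self._spill_file is None:
            self._spill_file = tempfile.TemporaryFile()
        self._spill_file.seek(0, os.SEEK_END)
        for row in self._rows:
            pickle.dump(row, self._spill_file)
        self._rows = []

    def __len__(self):
        return self._length

    def __iter__(self):
        # Yield spilled rows first, then in-memory rows, preserving append order.
        # Not safe for nested or concurrent iteration: one file handle is shared.
        if self._spill_file is not None:
            self._spill_file.seek(0)
            while True:
                try:
                    yield pickle.load(self._spill_file)
                except EOFError:
                    break
        yield from self._rows

    def clear(self):
        # Drop all rows and release the temp file (e.g. when the join key changes).
        if self._spill_file is not None:
            self._spill_file.close()
            self._spill_file = None
        self._rows = []
        self._length = 0


# Usage: buffer ten rows sharing one join key, with a tiny threshold.
buf = SpillableBuffer(max_in_memory=3)
for i in range(10):
    buf.append(("same_key", i))
matches = list(buf)  # can be re-iterated once per matching streamed row
```

The design choice sketched here is to bound memory by row count and to keep
append order, so the join can replay the buffered matches for each streamed row
with the same key without holding every match in memory at once.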
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shenh062326/spark my_change6
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11386.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11386
----
commit b0c00f42c4a889aabb6a0edd25522c53df9f18ad
Author: hongshen <shenh062326>
Date: 2016-02-26T15:29:45Z
External spilling when join a lot of rows with the same key
----