GitHub user cenyuhai opened a pull request:
https://github.com/apache/spark/pull/15041
[SPARK-17488][CORE] TakeAndOrder will OOM when the data is very large
## What changes were proposed in this pull request?
The function Utils.takeOrdered sorts all of its input in memory, so when the
data is very large it runs out of memory (OOM). This PR adds an external sorter
to the takeOrdered function.
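The general idea of an external sort for a "take the first `num` in order" query can be illustrated with a minimal sketch. This is not the code from this PR and does not use Spark's internal sorter classes. The object `ExternalTakeOrdered` and the parameter `runSize` are hypothetical names, and for simplicity the sketch handles only `Int` values written to temp files as text. The input is cut into sorted runs of at most `runSize` elements that are spilled to disk, and then a k-way merge over the runs stops after `num` elements:

```scala
import java.io.{BufferedReader, BufferedWriter, File, FileReader, FileWriter}
import scala.collection.mutable

// Hypothetical illustration of an external-sort-based takeOrdered;
// not Spark's implementation.
object ExternalTakeOrdered {

  // Phase 1 holds at most one run (runSize elements) in memory while spilling.
  // Phase 2 holds one head element per run plus the `num` results.
  def takeOrdered(items: Iterator[Int], num: Int, runSize: Int): Seq[Int] = {
    require(runSize > 0, "runSize must be positive")
    val runs = mutable.ArrayBuffer.empty[File]
    try {
      // Phase 1: sort each chunk in memory and spill it to its own temp file.
      items.grouped(runSize).foreach { chunk =>
        val file = File.createTempFile("takeOrdered-run", ".txt")
        file.deleteOnExit()
        val out = new BufferedWriter(new FileWriter(file))
        try chunk.sorted.foreach { x => out.write(x.toString); out.newLine() }
        finally out.close()
        runs += file
      }
      // Phase 2: k-way merge. Scala's PriorityQueue is a max-heap, so the
      // ordering on (value, runIndex) is reversed to pop the smallest value.
      val readers = runs.map(f => new BufferedReader(new FileReader(f)))
      try {
        val byValue = Ordering.by[(Int, Int), Int](_._1).reverse
        val heap = mutable.PriorityQueue.empty[(Int, Int)](byValue)
        readers.zipWithIndex.foreach { case (r, i) =>
          val line = r.readLine()
          if (line != null) heap.enqueue((line.toInt, i))
        }
        val result = mutable.ArrayBuffer.empty[Int]
        // Stop as soon as `num` elements are emitted; the rest is never read.
        while (result.size < num && heap.nonEmpty) {
          val (value, i) = heap.dequeue()
          result += value
          val next = readers(i).readLine()
          if (next != null) heap.enqueue((next.toInt, i))
        }
        result.toList
      } finally readers.foreach(_.close())
    } finally runs.foreach(_.delete())
  }

  def main(args: Array[String]): Unit = {
    val data = Iterator(9, 3, 7, 1, 8, 2, 6, 5, 4, 0)
    println(takeOrdered(data, num = 3, runSize = 4)) // prints List(0, 1, 2)
  }
}
```

The trade-off here is disk I/O in exchange for bounded memory. When `num` itself is small, a bounded heap that retains only `num` elements is another way to avoid sorting everything in memory.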
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cenyuhai/spark SPARK-17488
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15041.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15041
----
commit 869eaaf23f79eefbc6a8ff7a7b9efbc4a9f8c6b7
Author: 岑玉海 <[email protected]>
Date: 2016-08-21T03:55:04Z
Merge pull request #8 from apache/master
merge latest code to my fork
commit b6b0d0a41c1aa59bc97a0aa438619d903b78b108
Author: 岑玉海 <[email protected]>
Date: 2016-09-06T03:03:08Z
Merge pull request #9 from apache/master
Merge latest code to my fork
commit abd7924eab25b6dfdfd78c23a78dadcb3b9fbe1e
Author: 岑玉海 <[email protected]>
Date: 2016-09-08T17:10:12Z
Merge pull request #10 from apache/master
Merge latest code to my fork
commit 07ad91b02ad2e644788a7e432472e8c5384a29c6
Author: cenyuhai <[email protected]>
Date: 2016-09-10T05:17:49Z
add external sorter for takeOrdered function
----