GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/13318
[SPARK-15391] [SQL] manage the temporary memory of timsort
## What changes were proposed in this pull request?
Currently, the memory for the temporary buffer used by TimSort is always
allocated on-heap without bookkeeping, which can cause OOM in both on-heap
and off-heap mode.
This PR manages that memory by preallocating the buffer together with the
pointer array, as is already done for RadixSort. This works in both on-heap
and off-heap mode.
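The idea of keeping the sort buffer inside the same preallocated array can be sketched as follows. This is a minimal, hypothetical illustration, not Spark's code: `PreallocatedMergeSort` and `usableCapacity` are made-up names, each record is simplified to a single `long` key (a real sort record carries a pointer and a key prefix), and a plain top-down merge sort stands in for TimSort. Like TimSort's merge step, it needs at most n/2 scratch slots, so reserving one third of the array as scratch (usable capacity = length / 1.5) is enough and the sort itself allocates nothing.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: records live in array[0, n) and the unused tail
 * array[n, length) serves as the sort's temporary buffer, so no extra
 * (unaccounted) memory is allocated while sorting. A top-down merge sort
 * stands in for TimSort; both need at most n/2 scratch slots.
 */
public final class PreallocatedMergeSort {

    /** Largest record count that still leaves room for an n/2 scratch region. */
    public static int usableCapacity(int arrayLength) {
        return (int) (arrayLength / 1.5);
    }

    /** Sorts array[0, n) in ascending order using array[n, length) as scratch. */
    public static void sort(long[] array, int n) {
        if (array.length - n < n / 2) {
            throw new IllegalArgumentException("not enough preallocated scratch space");
        }
        mergeSort(array, 0, n, n);
    }

    private static void mergeSort(long[] a, int lo, int hi, int scratch) {
        if (hi - lo < 2) return;
        int mid = lo + (hi - lo) / 2;
        mergeSort(a, lo, mid, scratch);
        mergeSort(a, mid, hi, scratch);
        // Copy only the left run (at most n/2 records) into the scratch region.
        int leftLen = mid - lo;
        System.arraycopy(a, lo, a, scratch, leftLen);
        int i = scratch, iEnd = scratch + leftLen; // left run, now in scratch
        int j = mid;                               // right run, still in place
        int k = lo;                                // next write position
        // k never overtakes j while left records remain, so no unread
        // right-run record is overwritten. "<=" keeps the sort stable.
        while (i < iEnd && j < hi) {
            a[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i < iEnd) a[k++] = a[i++];
        // Any remaining right-run records are already in their final place.
    }

    public static void main(String[] args) {
        long[] array = new long[9];            // preallocated once, 9 slots
        int n = usableCapacity(array.length);  // 6 records fit, 3 slots of scratch
        long[] input = {42, 7, 19, 3, 88, 7};
        System.arraycopy(input, 0, array, 0, n);
        sort(array, n);
        System.out.println(Arrays.toString(Arrays.copyOf(array, n)));
        // prints [3, 7, 7, 19, 42, 88]
    }
}
```

Because the buffer is part of the array that was already acquired (and accounted for), the same scheme applies whether that array is backed by on-heap or off-heap memory.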
This PR also changes the loadFactor of BytesToBytesMap from 0.75 to 0.5,
which lets us use radix sort and also ensures that there is enough memory
for TimSort.
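Why 0.5 rather than 0.75 can be checked with a back-of-the-envelope calculation. The sketch below is hypothetical (`LoadFactorCheck` is a made-up name) and rests on stated assumptions: the map's pointer array holds 2 longs per slot (`2 * capacity` longs in total), that array is reused for sorting with 2 longs (pointer, key prefix) per record, radix sort needs a buffer as large as the records, and TimSort needs at most half of that.

```java
/**
 * Hypothetical back-of-the-envelope check. Assumptions (not taken from Spark):
 *  - the map's pointer array holds 2 longs per slot, i.e. 2 * capacity longs;
 *  - the array is reused for sorting, with 2 longs per record (pointer, prefix);
 *  - radix sort needs a buffer as large as the records, TimSort at most half.
 */
public final class LoadFactorCheck {

    /** Longs occupied by sort records when the map is full at the given load factor. */
    public static double usedLongs(long capacity, double loadFactor) {
        return 2.0 * capacity * loadFactor;
    }

    /** Longs left over in the array, available as a sort buffer. */
    public static double freeLongs(long capacity, double loadFactor) {
        return 2.0 * capacity - usedLongs(capacity, loadFactor);
    }

    /** Radix sort: buffer must be at least as large as the records. */
    public static boolean fitsRadixSort(long capacity, double loadFactor) {
        return freeLongs(capacity, loadFactor) >= usedLongs(capacity, loadFactor);
    }

    /** TimSort: buffer must hold at least half of the records. */
    public static boolean fitsTimSort(long capacity, double loadFactor) {
        return freeLongs(capacity, loadFactor) >= usedLongs(capacity, loadFactor) / 2;
    }

    public static void main(String[] args) {
        long capacity = 1L << 20;
        for (double lf : new double[] {0.75, 0.5}) {
            System.out.println("loadFactor=" + lf
                + " radix=" + fitsRadixSort(capacity, lf)
                + " timsort=" + fitsTimSort(capacity, lf));
        }
    }
}
```

Under these assumptions, a load factor of 0.75 leaves only 0.5 * capacity free longs against 1.5 * capacity used, which is too little even for TimSort, while 0.5 leaves exactly as much free space as the records occupy, enough for either sort.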
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark fix_timsort
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13318.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13318
----
commit 6d074f6e3ad41f427e6dcb9f5a72674798a40b5e
Author: Davies Liu <[email protected]>
Date: 2016-05-26T06:29:09Z
manage the temporary memory of timsort
----