GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/18251

    [SPARK-21033][SQL] fix the potential OOM in UnsafeExternalSorter

    ## What changes were proposed in this pull request?
    
    In `UnsafeInMemorySorter`, one record may take 32 bytes: 1 `long` for 
pointer, 1 `long` for key-prefix, and another 2 `long`s as the temporary buffer 
for radix sort.
    
    In `UnsafeExternalSorter`, we set 
`DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to `1024 * 1024 * 1024 / 2`, 
expecting the max size of the pointer array to be 8 GB. However, this is wrong: 
`1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow the pointer array 
before reaching this limit, we may hit the max-page-size error.
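    
    The arithmetic above can be checked with a small standalone sketch. The 
class and method names here are hypothetical and are not part of Spark; only the 
32-bytes-per-record figure and the threshold value come from this description.

```java
// Illustrative only: PointerArraySize is a hypothetical helper, not Spark code.
final class PointerArraySize {
    // 1 long for the pointer, 1 long for the key-prefix, and 2 longs of
    // temporary buffer for radix sort: 4 * 8 = 32 bytes per record.
    static final long BYTES_PER_RECORD = 4L * Long.BYTES;

    // Worst-case pointer array size, in bytes, for a given number of records.
    static long pointerArrayBytes(long numRecords) {
        return numRecords * BYTES_PER_RECORD;
    }

    public static void main(String[] args) {
        long threshold = 1024L * 1024 * 1024 / 2; // the current default threshold
        long gb = 1024L * 1024 * 1024;
        System.out.println(pointerArrayBytes(threshold) / gb + " GB"); // prints "16 GB"
    }
}
```

    Note the `long` literals: the same expression in `int` arithmetic would 
overflow.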
    
    This PR fixes this by halving `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` 
and adding a safety check in 
`UnsafeExternalSorter.growPointerArrayIfNecessary` to avoid allocating a page 
larger than the max page size.
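    
    As a rough illustration of the idea (not the code in this patch), a 
growth check could decide between growing and spilling as sketched below. The 
class, the enum, the method name, the doubling growth policy, and the 
`MAX_PAGE_SIZE_BYTES` value are all assumptions made for this example.

```java
// Illustrative only: GrowthCheckSketch is hypothetical and is not the
// implementation in UnsafeExternalSorter.
final class GrowthCheckSketch {
    // 32 bytes per record, as described above.
    static final long BYTES_PER_RECORD = 4L * Long.BYTES;

    // Assumed page-size cap, just under 16 GB; chosen for illustration.
    static final long MAX_PAGE_SIZE_BYTES = ((1L << 31) - 1) * 8L;

    enum Action { GROW, SPILL }

    // Assumes the pointer array doubles when full. If the doubled array
    // would exceed the page-size cap, spill instead of allocating.
    static Action onArrayFull(long currentCapacityRecords) {
        long grownBytes = currentCapacityRecords * 2 * BYTES_PER_RECORD;
        return grownBytes > MAX_PAGE_SIZE_BYTES ? Action.SPILL : Action.GROW;
    }

    public static void main(String[] args) {
        long halvedThreshold = 1024L * 1024 * 1024 / 4; // proposed default
        System.out.println(onArrayFull(halvedThreshold / 2)); // prints "GROW" (8 GB)
        System.out.println(onArrayFull(halvedThreshold));     // prints "SPILL" (16 GB)
    }
}
```

    Under these assumptions, an array sized for the halved threshold takes 
8 GB, and one further doubling would exceed the cap, which is the case the 
check is meant to catch.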
    
    ## How was this patch tested?
    
    TODO

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark sort

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18251.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18251
    
----
commit 93550f44397e79288fc94fb8bad60d271db5da58
Author: Wenchen Fan <[email protected]>
Date:   2017-06-09T08:39:28Z

    fix the potential OOM in UnsafeExternalSorter

----


