GitHub user eyalfa opened a pull request:

    https://github.com/apache/spark/pull/19481

    [SPARK-21907][CORE][BACKPORT 2.2] oom during spill

    Back-port of #19181 to branch-2.2.
    
    1. A test reproducing [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907).
    2. A fix for the root cause of the issue.
    
    `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill` calls `org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset`, whose allocation of a new array may trigger another spill. When this happens, the `array` member has already been de-allocated but is still referenced by the code, so the nested spill fails with an NPE in `org.apache.spark.memory.TaskMemoryManager.getPage`.
    This patch introduces a test case that reproduces the problem, plus a fix. The fix sets the in-memory sorter's `array` member to an empty array before performing the allocation, which prevents the spilling code from touching the de-allocated array.
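
The re-entrancy described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not Spark's code: `NestedSpillSketch`, `Buffer`, `ToySorter`, `allocate`, and the `memoryPressure` flag are all hypothetical names invented for illustration, and an `IllegalStateException` stands in for the NPE reported in `TaskMemoryManager.getPage`.

```java
final class NestedSpillSketch {

    /** Toy buffer: reading it after free() fails, standing in for the reported NPE. */
    static final class Buffer {
        private final long[] data;
        private boolean freed = false;

        Buffer(int capacity) { data = new long[capacity]; }

        void put(int i, long v) { data[i] = v; }

        long get(int i) {
            if (freed) {
                throw new IllegalStateException("read from freed buffer");
            }
            return data[i];
        }

        void free() { freed = true; }
    }

    /** Hypothetical sorter whose allocation may call back into spill(). */
    static final class ToySorter {
        private static final int CAPACITY = 4;
        private final boolean applyFix;
        private Buffer array = new Buffer(CAPACITY);
        private int size = 0;
        private boolean memoryPressure = false;
        private int recordsSpilled = 0;

        ToySorter(boolean applyFix) { this.applyFix = applyFix; }

        void insert(long v) { array.put(size++, v); }

        void spill() {
            // "Write out" every record currently held in the array.
            for (int i = 0; i < size; i++) {
                array.get(i);
                recordsSpilled++;
            }
            reset();
        }

        private void reset() {
            array.free();
            if (applyFix) {
                // Point at an empty buffer BEFORE allocating, so a nested
                // spill sees zero records and never reads the freed buffer.
                array = new Buffer(0);
                size = 0;
            }
            array = allocate(CAPACITY); // may re-enter spill()
            size = 0;
        }

        /** Assumption: under memory pressure, allocating asks the sorter to spill first. */
        private Buffer allocate(int capacity) {
            if (memoryPressure) {
                memoryPressure = false;
                spill(); // nested spill, triggered from inside reset()
            }
            return new Buffer(capacity);
        }
    }

    /** Inserts two records, then spills while the next allocation is under pressure. */
    static int spillUnderPressure(boolean applyFix) {
        ToySorter sorter = new ToySorter(applyFix);
        sorter.insert(1L);
        sorter.insert(2L);
        sorter.memoryPressure = true;
        sorter.spill();
        return sorter.recordsSpilled;
    }

    public static void main(String[] args) {
        System.out.println("with fix: spilled " + spillUnderPressure(true) + " records");
        try {
            spillUnderPressure(false);
        } catch (IllegalStateException e) {
            System.out.println("without fix: " + e.getMessage());
        }
    }
}
```

In this toy, the only difference between the failing and the passing path is that the sorter's state is made consistent (empty buffer, zero records) before the allocation that can re-enter `spill()`.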
    
    Introduced a new test case: `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorterSuite#testOOMDuringSpill`.
    
    Author: Eyal Farago <e...@nrgene.com>
    
    Closes #19181 from eyalfa/SPARK-21907__oom_during_spill.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eyalfa/spark SPARK-21907__oom_during_spill__BACKPORT-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19481.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19481
    
----
commit 9b6c5d2672a628951a6f35802dd569ea91a544a7
Author: Eyal Farago <e...@nrgene.com>
Date:   2017-10-10T20:49:47Z

    [SPARK-21907][CORE] oom during spill
    
    1. a test reproducing [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907)
    2. a fix for the root cause of the issue.
    
    `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill` calls `org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset` which may trigger another spill,
    when this happens the `array` member is already de-allocated but still referenced by the code, this causes the nested spill to fail with an NPE in `org.apache.spark.memory.TaskMemoryManager.getPage`.
    This patch introduces a reproduction in a test case and a fix, the fix simply sets the in-mem sorter's array member to an empty array before actually performing the allocation. This prevents the spilling code from 'touching' the de-allocated array.
    
    introduced a new test case: `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorterSuite#testOOMDuringSpill`.
    
    Author: Eyal Farago <e...@nrgene.com>
    
    Closes #19181 from eyalfa/SPARK-21907__oom_during_spill.

----

