GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/16357

    [SPARK-18928][branch-2.0] Check TaskContext.isInterrupted() in FileScanRDD, JDBCRDD & UnsafeSorter

    This is a branch-2.0 backport of #16340; the original description follows:
    
    ## What changes were proposed in this pull request?
    
    In order to respond to task cancellation, Spark tasks must periodically check `TaskContext.isInterrupted()`, but this check is missing on a few critical read paths used in Spark SQL, including `FileScanRDD`, `JDBCRDD`, and `UnsafeSorter`-based sorts. This can cause interrupted/cancelled tasks to continue running and become zombies (as also described in #16189).
    
    This patch fixes the problem by adding `TaskContext.isInterrupted()` checks to these paths. Note that I could have used `InterruptibleIterator` to simply wrap the affected iterators, but in some cases that would carry a performance penalty or might not be effective due to certain special uses of Iterators in Spark SQL. Instead, I inlined `InterruptibleIterator`-style logic into the existing iterator subclasses.
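
    A minimal sketch of that inlined check (my illustration, not the patch's exact code; the class name is hypothetical): each `hasNext()` call polls the task's kill flag and throws `TaskKilledException`, so a cancelled task stops promptly instead of running its scan to completion.

    ```scala
    import org.apache.spark.{TaskContext, TaskKilledException}

    // Illustrative only: an iterator that inlines the InterruptibleIterator-style
    // interruption check instead of wrapping the delegate in a new object.
    class InterruptionCheckingIterator[T](
        context: TaskContext,
        underlying: Iterator[T]) extends Iterator[T] {

      override def hasNext: Boolean = {
        // Cheap flag check on every call; fail fast if the task was killed.
        if (context.isInterrupted()) {
          throw new TaskKilledException
        }
        underlying.hasNext
      }

      override def next(): T = underlying.next()
    }
    ```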
    
    ## How was this patch tested?
    
    Tested manually in `spark-shell` with two different reproductions of non-cancellable tasks, one involving scans of huge files and another involving sort-merge joins that spill to disk. Both causes of zombie tasks are fixed by the changes added here.
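
    For reference, a hypothetical `spark-shell` reproduction along those lines (path, timing, and group name are my illustration, not the exact repro used): kick off a long scan under a job group on a background thread, cancel the group, then check the Spark UI for tasks that keep running.

    ```scala
    // Run inside spark-shell, where sc and spark are predefined.
    val t = new Thread {
      override def run(): Unit = {
        // interruptOnCancel = true asks executors to interrupt the task thread.
        sc.setJobGroup("cancel-test", "zombie-task repro", interruptOnCancel = true)
        // Assumes a large text file exists at this (hypothetical) path.
        spark.read.textFile("/tmp/huge.txt").count()
      }
    }
    t.start()

    Thread.sleep(5000)                // let the scan make progress
    sc.cancelJobGroup("cancel-test")  // without the fix, the scan task may keep running
    ```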

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark sql-task-interruption-branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16357.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16357
    
----
commit 66a83704d3b5ad8e4f2c078051ce4e635a94b25f
Author: Josh Rosen <[email protected]>
Date:   2016-12-20T00:19:38Z

    [SPARK-18928] Check TaskContext.isInterrupted() in FileScanRDD, JDBCRDD & UnsafeSorter
    
    In order to respond to task cancellation, Spark tasks must periodically check `TaskContext.isInterrupted()`, but this check is missing on a few critical read paths used in Spark SQL, including `FileScanRDD`, `JDBCRDD`, and `UnsafeSorter`-based sorts. This can cause interrupted/cancelled tasks to continue running and become zombies (as also described in #16189).
    
    This patch fixes the problem by adding `TaskContext.isInterrupted()` checks to these paths. Note that I could have used `InterruptibleIterator` to simply wrap the affected iterators, but in some cases that would carry a performance penalty or might not be effective due to certain special uses of Iterators in Spark SQL. Instead, I inlined `InterruptibleIterator`-style logic into the existing iterator subclasses.
    
    Tested manually in `spark-shell` with two different reproductions of non-cancellable tasks, one involving scans of huge files and another involving sort-merge joins that spill to disk. Both causes of zombie tasks are fixed by the changes added here.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #16340 from JoshRosen/sql-task-interruption.

----

