GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/22816

    [SPARK-25822][PySpark]Fix a race condition when releasing a Python worker

    ## What changes were proposed in this pull request?
    
There is a race condition when releasing a Python worker. If 
`ReaderIterator.handleEndOfDataSection` is not running in the task thread, then when 
a task is terminated early (such as by `take(N)`), the task completion listener 
may close the worker while `handleEndOfDataSection` can still put the worker back into 
the worker pool for reuse.
    
    
https://github.com/zsxwing/spark/commit/0e07b483d2e7c68f3b5c3c118d0bf58c501041b7
 is a patch to reproduce this issue.
    
    I also found a user reported this in the mail list: 
http://mail-archives.apache.org/mod_mbox/spark-user/201610.mbox/%3CCAAUq=h+yluepd23nwvq13ms5hostkhx3ao4f4zqv6sgo5zm...@mail.gmail.com%3E
    
    This PR fixes the issue by using `compareAndSet` to make sure we never 
return a closed worker to the worker pool.
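    The fix pattern can be sketched with `java.util.concurrent.atomic.AtomicBoolean` 
(the names `releasedOrClosed`, `tryRelease`, and `tryClose` below are illustrative, 
not Spark's actual identifiers): both the task completion listener and 
`handleEndOfDataSection` race to flip a single flag, and `compareAndSet` guarantees 
exactly one side wins, so a worker the listener has closed can never be returned to 
the pool.

    ```java
    import java.util.concurrent.atomic.AtomicBoolean;

    // Illustrative sketch of the compareAndSet guard; names are
    // hypothetical, not Spark's actual fields or methods.
    class WorkerMonitor {
        // false = worker still in use; true = already released or closed
        private final AtomicBoolean releasedOrClosed = new AtomicBoolean(false);

        // Called from handleEndOfDataSection: return the worker to the
        // pool only if nobody has closed it yet.
        boolean tryRelease() {
            // compareAndSet(false, true) succeeds for exactly one caller
            return releasedOrClosed.compareAndSet(false, true);
        }

        // Called from the task completion listener on early termination:
        // close the worker only if it was not already returned to the pool.
        boolean tryClose() {
            return releasedOrClosed.compareAndSet(false, true);
        }
    }

    public class Demo {
        public static void main(String[] args) {
            WorkerMonitor m = new WorkerMonitor();
            boolean closed = m.tryClose();     // listener fires first
            boolean released = m.tryRelease(); // late handleEndOfDataSection
            // Exactly one of the two operations wins the race.
            System.out.println(closed + " " + released);
        }
    }
    ```

    Whichever side loses the `compareAndSet` race simply does nothing, so the 
worker is either reused or closed, never both.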
    
    ## How was this patch tested?
    
    Jenkins.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark fix-socket-closed

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22816
    
----
commit a22e38917b4f2893ce0d72febed4df1d3eb9fdd5
Author: Shixiong Zhu <zsxwing@...>
Date:   2018-10-24T07:51:26Z

    Fix a race condition when releasing a Python worker

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
