GitHub user brad-kaiser opened a pull request:

    https://github.com/apache/spark/pull/19836

    [SPARK-22618][CORE] Catch exception in removeRDD to stop jobs from dying

    ## What changes were proposed in this pull request?
    
    I propose that BlockManagerMasterEndpoint.removeRdd() should catch and log 
any IOExceptions it receives. As it is now, the exception can bubble up to the 
main thread and kill user applications when called from RDD.unpersist(). I 
think this change is a better experience for the end user.
    
    I chose to catch the exception in BlockManagerMasterEndpoint.removeRdd() 
rather than in RDD.unpersist() so that the blocking option of RDD.unpersist() 
still works correctly. Otherwise, a blocking call would be short-circuited by 
the first error.
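    A minimal sketch of the pattern being proposed, using plain 
scala.concurrent futures instead of Spark's actual RPC plumbing; 
removeRddOnExecutor, the executor ids, and the log message are illustrative 
assumptions, not Spark's real API:

    ```scala
    import java.io.IOException
    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration._
    import scala.concurrent.ExecutionContext.Implicits.global

    // Hypothetical stand-in for asking one executor to drop its blocks for
    // the RDD; returns the number of blocks removed on that executor.
    def removeRddOnExecutor(executorId: String, rddId: Int): Future[Int] =
      Future {
        if (executorId == "lost-executor")
          throw new IOException(s"connection to $executorId failed")
        1
      }

    // Recover each per-executor future individually, so one failing executor
    // is logged and counted as 0 blocks instead of failing the combined
    // future that a blocking unpersist() waits on.
    def removeRdd(rddId: Int, executors: Seq[String]): Future[Seq[Int]] = {
      val futures = executors.map { id =>
        removeRddOnExecutor(id, rddId).recover {
          case e: IOException =>
            println(s"error removing RDD $rddId from $id, ignoring: ${e.getMessage}")
            0
        }
      }
      Future.sequence(futures)
    }

    val counts =
      Await.result(removeRdd(42, Seq("exec-1", "lost-executor", "exec-2")), 10.seconds)
    ```

    Because every element of the sequence completes (the failed one recovers 
to 0), Future.sequence succeeds and a blocking caller still waits for all 
executors rather than dying on the first IOException.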
    
    ## How was this patch tested?
    
    This patch was tested with a job that reproduces the job-killing 
behavior described above. 
    
    @rxin, it looks like you originally wrote this method; I would appreciate 
it if you took a look. Thanks.
    
    This contribution is my original work and is licensed under the project's 
open source license.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brad-kaiser/spark catch-unpersist-exception

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19836.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19836
    
----
commit 889993f67853555afa3f2413ff64d4e253f3c0ce
Author: Brad Kaiser <[email protected]>
Date:   2017-11-27T19:29:52Z

    [SPARK-22618][CORE] Catch exception in removeRDD to stop jobs from dying

----


---
