GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/14931

    [SPARK-17370] Shuffle service files not invalidated when a slave is lost

    ## What changes were proposed in this pull request?
    
    DAGScheduler invalidates shuffle files when an executor loss event occurs, 
but not when the external shuffle service is enabled. This is because, with 
the shuffle service on, shuffle files can outlive the executor that wrote 
them.
    
    However, DAGScheduler also doesn't invalidate shuffle files when the 
shuffle service itself is lost (because the whole slave was lost). This can 
cause long hangs when slaves are lost, since the file loss is not detected 
until a subsequent stage attempts to read the shuffle files.
    
    The proposed fix is to also invalidate shuffle files when an executor is 
lost due to a `SlaveLost` event.
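
The decision described above can be sketched with a toy model. This is not Spark's `DAGScheduler` code: the class `ToyScheduler`, its fields, and the `slave_lost` flag are hypothetical names chosen for illustration, and the sketch only assumes that shuffle outputs are tracked per executor.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyScheduler:
    """Toy model of shuffle-output tracking (hypothetical, not Spark's API)."""

    external_shuffle_service: bool
    # shuffle_id -> ids of executors whose map output is still considered valid
    outputs: Dict[int, Set[str]] = field(default_factory=dict)

    def register_output(self, shuffle_id: int, executor_id: str) -> None:
        self.outputs.setdefault(shuffle_id, set()).add(executor_id)

    def executor_lost(self, executor_id: str, slave_lost: bool) -> bool:
        """Handle an executor loss; return True if outputs were invalidated."""
        # Without the external shuffle service, files die with the executor.
        # With it, files survive the executor unless the whole slave is gone.
        files_lost = slave_lost or not self.external_shuffle_service
        if not files_lost:
            return False
        for executors in self.outputs.values():
            executors.discard(executor_id)
        return True


# Usage: with the shuffle service on, only a slave loss invalidates outputs.
sched = ToyScheduler(external_shuffle_service=True)
sched.register_output(0, "exec-1")
assert sched.executor_lost("exec-1", slave_lost=False) is False
assert sched.outputs[0] == {"exec-1"}
assert sched.executor_lost("exec-1", slave_lost=True) is True
assert sched.outputs[0] == set()
```

In this model, the behavior before the fix corresponds to computing `files_lost` from `external_shuffle_service` alone, so outputs on a lost slave stayed registered until a later stage failed to fetch them.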
    
    ## How was this patch tested?
    
    Unit tests. Also verified on an actual cluster that slave loss 
invalidates shuffle files immediately, as expected.
    
    cc @mateiz 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark sc-4439

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14931.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14931
    
----
commit 17507fa7a389bc442ac687ed8ddaee8453c11b55
Author: Eric Liang <[email protected]>
Date:   2016-09-02T01:13:02Z

    Thu Sep  1 18:13:02 PDT 2016

commit a704376b57b488ce0ce1b0ba8ed13d36e5debfd4
Author: Eric Liang <[email protected]>
Date:   2016-09-02T01:13:12Z

    Merge branch 'master' into sc-4439

----


