GitHub user ericl opened a pull request:
https://github.com/apache/spark/pull/14931
[SPARK-17370] Shuffle service files not invalidated when a slave is lost
## What changes were proposed in this pull request?
DAGScheduler invalidates shuffle files when an executor loss event occurs,
but not when the external shuffle service is enabled. This is because, with
the shuffle service on, shuffle files can outlive the executor that wrote
them.
However, DAGScheduler also fails to invalidate shuffle files when the shuffle
service itself is lost (because the whole slave is lost). This can cause long
hangs after a slave is lost, since the file loss is not detected until a
subsequent stage attempts to read the shuffle files.
The proposed fix is to also invalidate shuffle files when an executor is
lost due to a `SlaveLost` event.
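
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `ExecutorLossReason`, `ExecutorExited`, and `SlaveLost` are simplified local stand-ins defined here, and `ShuffleInvalidation.shouldInvalidateShuffleFiles` is a hypothetical helper name that does not appear in Spark.

```scala
// Simplified local stand-ins for illustration only; these are NOT Spark's real classes.
sealed trait ExecutorLossReason
case object ExecutorExited extends ExecutorLossReason // only the executor process died
case object SlaveLost extends ExecutorLossReason      // the whole slave (and its shuffle service) is gone

object ShuffleInvalidation {
  // Hypothetical helper: decides whether shuffle files registered for a lost
  // executor should be treated as gone.
  def shouldInvalidateShuffleFiles(
      externalShuffleServiceEnabled: Boolean,
      reason: ExecutorLossReason): Boolean = {
    // Without the external shuffle service, files are served by the executor
    // itself, so any executor loss makes them unreachable.
    // With the service enabled, files survive executor loss unless the slave
    // hosting the service is lost as well.
    val shuffleServiceLost = reason == SlaveLost
    !externalShuffleServiceEnabled || shuffleServiceLost
  }
}
```

Under these assumptions, the only case that changes with the proposed fix is "shuffle service enabled, slave lost", which now returns `true` instead of `false`.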
## How was this patch tested?
Unit tests. Also verified on an actual cluster that slave loss invalidates
shuffle files immediately, as expected.
cc @mateiz
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ericl/spark sc-4439
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14931.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14931
----
commit 17507fa7a389bc442ac687ed8ddaee8453c11b55
Author: Eric Liang <[email protected]>
Date: 2016-09-02T01:13:02Z
Thu Sep 1 18:13:02 PDT 2016
commit a704376b57b488ce0ce1b0ba8ed13d36e5debfd4
Author: Eric Liang <[email protected]>
Date: 2016-09-02T01:13:12Z
Merge branch 'master' into sc-4439
----