GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/13473
[SPARK-15736][CORE] Gracefully handle loss of DiskStore files
If an RDD partition is cached on disk and the DiskStore file is lost, then
reads of that cached partition will fail and the missing partition is supposed
to be recomputed by a new task attempt. However, the current behavior is to
repeatedly re-attempt the read on the same machine without performing any
recomputation, which leads to a complete job failure.
In order to fix this problem, the executor with the missing file needs to
properly mark the corresponding block as missing so that it stops advertising
itself as a cache location for that block.
This patch fixes this bug and adds an end-to-end regression test (in
`FailureSuite`) and a set of unit tests (`in BlockManagerSuite`).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark handle-missing-cache-files
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13473.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13473
----
commit 92bfce821c998c0b1a5b7ab3b674d0e59a7c0033
Author: Josh Rosen <[email protected]>
Date: 2016-06-02T20:30:42Z
Add failing regression test.
commit a104ac534be3df927a21aeb42fe6c11e1cd03636
Author: Josh Rosen <[email protected]>
Date: 2016-06-02T21:08:43Z
Add failing unit tests in BlockManagerSuite.
commit e26b2f6aa0571392730338960666ca66f5901f35
Author: Josh Rosen <[email protected]>
Date: 2016-06-02T21:09:02Z
Fix actual bug.
commit 0c51bb63e979549f2fad4fc14767ac6a19952e8c
Author: Josh Rosen <[email protected]>
Date: 2016-06-02T21:21:22Z
Update test names.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]