GitHub user GraceH opened a pull request:
https://github.com/apache/spark/pull/7528
Avoid assigning tasks to "lost" executor(s)
Currently, when dynamic allocation kills some executors, tasks are sometimes
mis-assigned to those lost executors. Such mis-assignment causes task
failures, or even job failure if the same error repeats 4 times.
The root cause is that ***killExecutors*** doesn't remove the executors being
killed right away. It depends on the ***OnDisassociated*** event to refresh
the active working list later. The delay depends on your cluster status
(from several milliseconds to sub-minute). When new tasks are scheduled
during that period, they are assigned to those "active" but "under killing"
executors, and then fail due to "executor lost". A better way is to exclude
the executors under killing in makeOffers(), so that no tasks are allocated
to executors that are about to be lost.
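The idea can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: `ExecutorRegistry`, `WorkerOffer`, `pendingToRemove`, and the method signatures below are simplified, hypothetical stand-ins, assuming the scheduler keeps a map of registered executors and can track which ones have been asked to die.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a scheduler backend's bookkeeping.
// It is NOT Spark's actual implementation.
final case class WorkerOffer(executorId: String, host: String, freeCores: Int)

final class ExecutorRegistry {
  // Executors that registered and have not yet been reported as disconnected.
  private val executors = mutable.LinkedHashMap.empty[String, WorkerOffer]
  // Executors whose kill was requested but whose removal is not yet confirmed.
  private val pendingToRemove = mutable.HashSet.empty[String]

  def register(id: String, host: String, freeCores: Int): Unit =
    executors(id) = WorkerOffer(id, host, freeCores)

  // Record the kill request immediately; unknown ids are ignored.
  def killExecutors(ids: Seq[String]): Unit =
    pendingToRemove ++= ids.filter(executors.contains)

  // Called later, when the disconnect is finally observed.
  def onDisassociated(id: String): Unit = {
    executors -= id
    pendingToRemove -= id
  }

  // Offer resources only from executors that are not being killed.
  def makeOffers(): Seq[WorkerOffer] =
    executors.values.filterNot(o => pendingToRemove.contains(o.executorId)).toSeq
}
```

With two registered executors "1" and "2", calling `killExecutors(Seq("2"))` makes `makeOffers()` return an offer for "1" only, even before the disconnect event for "2" arrives.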
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/GraceH/spark AssignToLostExecutor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7528.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7528
----
commit 30a9ad07a495c937822a1445f0b3488d4e8e6f63
Author: Grace <[email protected]>
Date: 2015-07-20T06:25:47Z
Avoid assigning tasks to lost executors
commit b5546ce45f998ded44513cb066384535e10b47a0
Author: Grace <[email protected]>
Date: 2015-07-20T06:48:19Z
Add comments about the fix
----