GitHub user GraceH opened a pull request:
https://github.com/apache/spark/pull/7888
Add force control for killExecutors to avoid false killing for those busy
executors
With dynamic allocation, busy executors are sometimes killed by mistake.
An executor that has tasks assigned can be killed because it appears to have
been idle for long enough (say, 60 seconds). The root cause is that the
task-launch listener event is delivered asynchronously.
For example, an executor may be in the middle of being assigned tasks while
the listener notification has not been sent yet. Meanwhile, dynamic
allocation's executor idle timeout expires (e.g., after 60 seconds) and
triggers a killExecutor request at the same time.
1. The idle timer expires before the listener event arrives.
2. The task then runs on an executor that has been killed or is being
killed, which eventually leads to task failure.
Here is the proposal to fix it. We can add a force control to killExecutor.
If force is not set (i.e., false), we check whether the executor to be
killed is idle or busy. If the executor has tasks assigned, we do not kill
it and return false (to indicate that the kill failed). Dynamic allocation
should turn off force killing (i.e., force = false), so that an attempt to
kill a busy executor fails. In that case the executor's idle timer remains
in place. When the task assignment event arrives later, the idle timer is
removed accordingly. This avoids falsely killing busy executors under
dynamic allocation.
For other usages, end users can decide for themselves whether to use force
killing. If that option is turned on, killExecutor kills the executor
without any status check.
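
The check described above can be sketched roughly as follows. This is a
simplified, standalone model for illustration only, not the code in this
patch and not Spark's scheduler backend: the ExecutorTracker class, its
runningTasks map, and the register/assignTask helpers are hypothetical
names, and only the idea of a boolean force argument and a boolean return
value comes from the description.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the proposal; not Spark's actual API.
final class ExecutorTracker {
  // executorId -> number of tasks currently assigned (assumed bookkeeping)
  private val runningTasks = mutable.Map.empty[String, Int]

  def register(executorId: String): Unit =
    if (!runningTasks.contains(executorId)) runningTasks(executorId) = 0

  def assignTask(executorId: String): Unit =
    runningTasks(executorId) = runningTasks.getOrElse(executorId, 0) + 1

  def isBusy(executorId: String): Boolean =
    runningTasks.getOrElse(executorId, 0) > 0

  /** Returns true if the executor was killed, false if the kill was refused. */
  def killExecutor(executorId: String, force: Boolean): Boolean = {
    if (!runningTasks.contains(executorId)) {
      false // unknown executor: nothing to kill
    } else if (!force && isBusy(executorId)) {
      false // non-forced kill of a busy executor is refused
    } else {
      runningTasks.remove(executorId) // idle, or force = true
      true
    }
  }
}

object ForceKillDemo extends App {
  val tracker = new ExecutorTracker
  tracker.register("exec-1")
  tracker.assignTask("exec-1")
  println(tracker.killExecutor("exec-1", force = false)) // false: busy
  println(tracker.killExecutor("exec-1", force = true))  // true: forced
}
```

In this model, a caller such as dynamic allocation would pass force = false
and treat a false return value as "keep the idle timer and try again
later", while other callers could pass force = true to skip the check.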
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/GraceH/spark forcekill
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7888.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7888
----
commit 4acbd79a2934126c045ce6c4a8f9133dac4c062a
Author: Grace <[email protected]>
Date: 2015-08-03T06:20:09Z
Add force control for killExecutors
----