GitHub user GraceH opened a pull request:

    https://github.com/apache/spark/pull/7888

    Add force control for killExecutors to avoid falsely killing busy
    executors

    With dynamic allocation, busy executors are sometimes killed by mistake.
    An executor that already has tasks assigned can be killed because it
    appears to have been idle for long enough (say, 60 seconds). The root
    cause is that the task-launch listener event is delivered asynchronously.
    
    For example, an executor is being assigned tasks, but the listener
    notification has not been sent yet. Meanwhile, the dynamic allocation
    idle timeout for that executor (e.g., 60 seconds) expires and triggers
    killExecutor at the same moment.
     1. The idle timer expires before the listener event arrives.
     2. The task then runs on an executor that is being killed or has already
        been killed, and eventually fails.
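
The race described above can be modelled with a small toy sketch. This is not Spark code: `IdleRaceSketch` and all of its members are hypothetical names chosen for illustration, and time is passed in as plain seconds rather than read from a clock. It only shows how the allocation side can still see an executor as idle while the scheduler side has already assigned it a task.

```scala
import scala.collection.mutable

// Toy model of the race; every name here is hypothetical, not a Spark class.
class IdleRaceSketch {
  // Scheduler's view: executors that already have a task assigned.
  private val busyOnScheduler = mutable.Set.empty[String]
  // Task-launch events that were posted but not yet delivered to the listener.
  private val pendingEvents = mutable.Queue.empty[String]
  // Allocation side's view: executorId -> time (seconds) when its idle timer expires.
  private val idleDeadline = mutable.Map.empty[String, Long]

  def markIdle(id: String, deadline: Long): Unit = idleDeadline(id) = deadline

  // The scheduler assigns a task immediately; the event is only queued.
  def assignTask(id: String): Unit = {
    busyOnScheduler += id
    pendingEvents.enqueue(id)
  }

  // Asynchronous delivery: only now does the allocation side clear the idle timer.
  def deliverEvents(): Unit =
    while (pendingEvents.nonEmpty) idleDeadline.remove(pendingEvents.dequeue())

  // Executors the allocation side would ask to kill at time `now`.
  def expired(now: Long): Set[String] =
    idleDeadline.collect { case (id, t) if t <= now => id }.toSet

  // Executors that would be killed although the scheduler considers them busy.
  def falselyKilled(now: Long): Set[String] =
    expired(now).filter(busyOnScheduler.contains)
}
```

In this model, calling `assignTask` shortly before the deadline and checking `falselyKilled` before `deliverEvents` runs reports the executor as a false kill; once the queued event is delivered, the idle timer is gone and nothing is reported.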
    
    The proposed fix is to add a force flag to killExecutor. If force is not
    set (i.e., false), killExecutor checks whether the target executor is
    idle or busy. If the executor has tasks assigned, it is not killed and the
    call returns false to indicate that the kill failed. Dynamic allocation
    should turn force killing off (force = false), so that an attempt to kill
    a busy executor fails. In that case the executor's idle timer is left in
    place; when the task-assignment event arrives later, the idle timer is
    removed as usual. This avoids falsely killing busy executors under
    dynamic allocation.
    
    For all other uses, callers can decide for themselves whether to force
    the kill. With force enabled, killExecutor kills the executor without
    checking its status.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GraceH/spark forcekill

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7888
    
----
commit 4acbd79a2934126c045ce6c4a8f9133dac4c062a
Author: Grace <[email protected]>
Date:   2015-08-03T06:20:09Z

    Add force control for killExecutors

----


