[ 
https://issues.apache.org/jira/browse/SPARK-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-9552.
------------------------------
          Resolution: Fixed
       Fix Version/s: 1.6.0
    Target Version/s: 1.6.0

> Dynamic allocation kills busy executors on race condition
> ---------------------------------------------------------
>
>                 Key: SPARK-9552
>                 URL: https://issues.apache.org/jira/browse/SPARK-9552
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.4.0, 1.4.1
>            Reporter: Jie Huang
>            Assignee: Jie Huang
>             Fix For: 1.6.0
>
>
> With dynamic allocation enabled, busy executors are sometimes killed by 
> mistake. An executor that has tasks assigned can still be killed for having 
> been idle long enough (say 60 seconds). The root cause is that the 
> task-launch listener event is delivered asynchronously.
> For example, an executor may be in the middle of receiving tasks while the 
> listener notification has not been sent yet. Meanwhile, the executor's idle 
> timeout in dynamic allocation (e.g., 60 seconds) expires and triggers a 
> killExecutor request, because the timer expires before the listener event 
> arrives. The task then runs on an executor that is killed or being killed, 
> and eventually fails.
> Here is the proposed fix. We can add a force flag to killExecutor. If the 
> flag is not set (i.e., false), killExecutor should check whether the target 
> executor is idle or busy. If the executor has tasks assigned, it should not 
> be killed, and killExecutor should return false (to indicate that the kill 
> failed). Dynamic allocation should turn off force killing (i.e., force = 
> false), so that an attempt to kill a busy executor fails. In that case the 
> executor's idle timer stays in place. When the task-assignment event arrives 
> later, the idle timer can be removed accordingly. This avoids killing busy 
> executors by mistake in dynamic allocation.
> For other uses, end users can decide for themselves whether to use force 
> killing. If the option is turned on, killExecutor kills the executor without 
> any status check.
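
The proposal above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's scheduler code: the class names ToyScheduler and ToyAllocationManager, their fields, and the method names are all hypothetical, and time is passed in explicitly instead of using a real clock. It shows a kill request with force = false being refused for a busy executor, the idle timer staying in place, and the late task-start event clearing it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the scheduler side; not a Spark class."""
    # executor id -> number of tasks currently assigned to it
    running_tasks: dict = field(default_factory=dict)
    killed: set = field(default_factory=set)

    def is_busy(self, executor_id):
        return self.running_tasks.get(executor_id, 0) > 0

    def kill_executor(self, executor_id, force=False):
        """Return True if the executor was killed, False if the kill was refused."""
        if not force and self.is_busy(executor_id):
            # Non-forced kill of a busy executor: refuse and report failure.
            return False
        self.killed.add(executor_id)
        self.running_tasks.pop(executor_id, None)
        return True


@dataclass
class ToyAllocationManager:
    """Hypothetical stand-in for the dynamic allocation side."""
    scheduler: ToyScheduler
    idle_timeout: float = 60.0
    # executor id -> time at which the idle timer was started
    idle_since: dict = field(default_factory=dict)

    def on_executor_idle(self, executor_id, now):
        self.idle_since.setdefault(executor_id, now)

    def on_task_start(self, executor_id):
        # Listener event; in the scenario described above it may arrive late.
        self.idle_since.pop(executor_id, None)

    def expire_idle(self, now):
        """Try to kill executors whose idle timer has expired; return those killed."""
        removed = []
        for executor_id, since in list(self.idle_since.items()):
            if now - since >= self.idle_timeout:
                if self.scheduler.kill_executor(executor_id, force=False):
                    del self.idle_since[executor_id]
                    removed.append(executor_id)
                # On refusal the timer is kept; a later on_task_start clears it.
        return removed


if __name__ == "__main__":
    sched = ToyScheduler()
    mgr = ToyAllocationManager(sched)
    mgr.on_executor_idle("exec-1", now=0.0)
    # A task is assigned, but the listener event has not been delivered yet.
    sched.running_tasks["exec-1"] = 1
    print(mgr.expire_idle(now=60.0))   # kill is refused, nothing removed
    mgr.on_task_start("exec-1")        # late event clears the idle timer
    print(mgr.idle_since)
```

In this sketch the busy check lives on the scheduler side, since that is the component that knows about task assignments before the listener event is delivered. A caller that passes force=True skips the check entirely, which matches the last paragraph of the description.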



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

