GitHub user lirui-apache opened a pull request:

    https://github.com/apache/spark/pull/21486

    [SPARK-24387][Core] Heartbeat-timeout executor is added back and used again

    ## What changes were proposed in this pull request?
    
    When an executor's heartbeat times out, we call scheduler.executorLost before 
we tell the backend to kill the executor. In executorLost, TaskSchedulerImpl asks 
the backend to revive offers. If this is the only executor, the backend may offer 
it to TaskSchedulerImpl again, and the retried task is then scheduled on the same 
executor that just timed out.
    
    This patch proposes to call scheduler.executorLost after the executor is 
killed. At this point, the executor has been marked as pending-to-remove and 
won't be offered again.
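    
    The effect of the ordering can be illustrated with a small toy model. This is only a sketch and not Spark code: `ToyBackend`, `ToyScheduler`, and `expire_executor` are hypothetical names, and the model is synchronous, so the re-offer that is merely possible in the real, asynchronous system happens deterministically here.

```python
class ToyBackend:
    """Hypothetical stand-in for a scheduler backend; not Spark's API."""

    def __init__(self, executors):
        self.executors = set(executors)
        self.pending_to_remove = set()

    def kill_executor(self, executor_id):
        # Mark the executor as pending-to-remove so it is excluded from offers.
        self.pending_to_remove.add(executor_id)

    def make_offers(self):
        # Offer only executors that are not pending removal.
        return sorted(self.executors - self.pending_to_remove)


class ToyScheduler:
    """Hypothetical stand-in for the task scheduler; not Spark's API."""

    def __init__(self, backend):
        self.backend = backend
        self.scheduled_on = []  # executors that received the retried task

    def executor_lost(self, executor_id):
        # Losing an executor triggers a retry, which revives offers.
        offers = self.backend.make_offers()
        if offers:
            self.scheduled_on.append(offers[0])


def expire_executor(executor_id, kill_first):
    """Expire the only executor, using either ordering of the two calls."""
    backend = ToyBackend([executor_id])
    scheduler = ToyScheduler(backend)
    if kill_first:
        # Ordering proposed by this patch: kill, then report the loss.
        backend.kill_executor(executor_id)
        scheduler.executor_lost(executor_id)
    else:
        # Previous ordering: report the loss, then kill.
        scheduler.executor_lost(executor_id)
        backend.kill_executor(executor_id)
    return scheduler.scheduled_on
```

    In this model, `expire_executor("exec-1", kill_first=False)` places the retried task back on `exec-1`, while `kill_first=True` leaves it unscheduled until another executor becomes available.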
    
    ## How was this patch tested?
    
    Added a new test case in HeartbeatReceiverSuite. Without the fix, this test 
case fails.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lirui-apache/spark SPARK-24387

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21486.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21486
    
----
commit 189f2696dab47a23b3f2a48a313a72dc4ec77c80
Author: Rui Li <lirui@...>
Date:   2018-06-02T08:25:10Z

    Call executorLost after the executor is killed

----


