GitHub user Ngone51 opened a pull request:

    [SPARK-23888][CORE] speculative task should not run on a given host where 
another attempt is already running on

    ## What changes were proposed in this pull request?
    There's a bug in:
    /** Check whether a task is currently running an attempt on a given host */
     private def hasAttemptOnHost(taskIndex: Int, host: String): Boolean = {
       taskAttempts(taskIndex).exists(_.host == host)
     }
    This check matches any attempt on the host, including finished ones, so 
hosts that only have finished attempts are wrongly ignored. We should instead 
check whether an attempt is currently running on the given host. 
    With that change, a speculative task can run on a host where another 
attempt has previously failed.
    Assume we have only two machines: host1 and host2. We first run task0.0 on 
host1. Then, because task0.0 is taking a long time, we launch a speculative 
task0.1 on host2. Then task0.0 fails on host1, but it cannot be re-run since 
there is already a copy running on host2. After another long wait, we launch 
a new speculative task0.2. Now task0.2 can run on host1 again, since there is 
no longer a running attempt on host1.
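
As a rough illustration of the difference between the two checks, here is a minimal, self-contained sketch. It is not the code from this patch: `Attempt`, `SpeculationHostCheck`, and `hasRunningAttemptOnHost` are hypothetical names, and `Attempt` is a simplified stand-in that models only a host and a running flag.

```scala
// Hypothetical, simplified stand-in for a task attempt record; only the
// two fields needed for the check are modeled here.
final case class Attempt(host: String, running: Boolean)

object SpeculationHostCheck {
  // Existing behavior: any attempt on the host counts, running or finished.
  def hasAttemptOnHost(attempts: Seq[Attempt], host: String): Boolean =
    attempts.exists(_.host == host)

  // Proposed behavior: only attempts that are still running count.
  def hasRunningAttemptOnHost(attempts: Seq[Attempt], host: String): Boolean =
    attempts.exists(a => a.running && a.host == host)

  def main(args: Array[String]): Unit = {
    // State from the example above: the attempt on host1 has failed,
    // and the speculative copy on host2 is still running.
    val attempts = Seq(
      Attempt("host1", running = false),
      Attempt("host2", running = true)
    )
    println(hasAttemptOnHost(attempts, "host1"))        // true
    println(hasRunningAttemptOnHost(attempts, "host1")) // false
    println(hasRunningAttemptOnHost(attempts, "host2")) // true
  }
}
```

With the running check, host1 becomes eligible for a new speculative attempt once its earlier attempt has finished, while host2 stays excluded as long as its copy is running.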
    ## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull SPARK-23888

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20998
commit 3584a09410d72207d8a00dcbf385b2e276925fc8
Author: wuyi <ngone_5451@...>
Date:   2018-04-06T15:34:18Z

    add task attempt running check in hasAttemptOnHost


