wuyi created SPARK-23888:
----------------------------

             Summary: speculative task should not run on a host where another 
attempt is already running
                 Key: SPARK-23888
                 URL: https://issues.apache.org/jira/browse/SPARK-23888
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.0
            Reporter: wuyi
             Fix For: 2.3.0


There's a bug in:
{code:scala}
/** Check whether a task is currently running an attempt on a given host */
private def hasAttemptOnHost(taskIndex: Int, host: String): Boolean = {
  taskAttempts(taskIndex).exists(_.host == host)
}
{code}
This check matches any attempt that ever ran on the host, including attempts that 
have already finished, so it does not distinguish finished attempts from running 
ones. We should instead check whether an attempt is currently running on the 
given host.
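
A minimal sketch of the suggested check, not the actual patch: TaskAttempt and 
Scheduler below are simplified stand-ins for Spark's TaskInfo and TaskSetManager, 
and the running flag is assumed for illustration; the only change from the snippet 
above is the extra running-attempt condition.
{code:scala}
// Simplified stand-in for org.apache.spark.scheduler.TaskInfo (fields assumed for illustration).
case class TaskAttempt(host: String, finished: Boolean) {
  def running: Boolean = !finished
}

// taskAttempts(i) holds every attempt of task i, running or finished,
// mirroring the field referenced by hasAttemptOnHost above.
class Scheduler(taskAttempts: Array[List[TaskAttempt]]) {

  /** Check whether a task currently has a *running* attempt on the given host. */
  def hasAttemptOnHost(taskIndex: Int, host: String): Boolean =
    taskAttempts(taskIndex).exists(t => t.running && t.host == host)
}
{code}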

With this change, it also becomes possible for a speculative task to run on a host 
where another attempt of the same task failed before, as the following scenario 
shows.

Assume we have only two machines: host1 and host2. We first run task0.0 on host1. 
Then, after waiting a long time for task0.0, we launch a speculative task0.1 on 
host2. task0.0 finally fails on host1, but it cannot re-run because there is 
already a copy (task0.1) running on host2. After another long wait, we launch a 
new speculative task0.2. Now, with the proposed check, task0.2 can run on host1 
again, since there is no longer a running attempt on host1.
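
For illustration, the two-host scenario above played against that sketch (reusing 
the hypothetical TaskAttempt/Scheduler from the previous block, not Spark's real 
classes):
{code:scala}
object SpeculationScenario extends App {
  // task0.0 failed on host1; task0.1 is still running on host2.
  val attemptsForTask0 = List(
    TaskAttempt(host = "host1", finished = true),  // task0.0 (failed)
    TaskAttempt(host = "host2", finished = false)  // task0.1 (running)
  )
  val scheduler = new Scheduler(Array(attemptsForTask0))

  // Old check (any attempt that ever ran on the host): host1 stays blocked for task0.2.
  val oldCheckBlocksHost1 = attemptsForTask0.exists(_.host == "host1")  // true
  // New check (running attempts only): host1 is available again for speculative task0.2.
  val newCheckBlocksHost1 = scheduler.hasAttemptOnHost(0, "host1")      // false

  println(s"old check blocks host1: $oldCheckBlocksHost1, new check blocks host1: $newCheckBlocksHost1")
}
{code}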


