wuyi created SPARK-23888:
----------------------------

             Summary: Speculative task should not run on a given host where another attempt is already running
                 Key: SPARK-23888
                 URL: https://issues.apache.org/jira/browse/SPARK-23888
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.0
            Reporter: wuyi
             Fix For: 2.3.0
There's a bug in:

{code:java}
/** Check whether a task is currently running an attempt on a given host */
private def hasAttemptOnHost(taskIndex: Int, host: String): Boolean = {
  taskAttempts(taskIndex).exists(_.host == host)
}
{code}

This check also matches hosts whose attempts have already *finished*, so it should instead test whether an attempt is currently *running* on the given host. As a consequence, a speculative task should be allowed to run on a host where an earlier attempt failed.

Assume we have only two machines: host1 and host2. We first run task0.0 on host1. Then, after a long wait for task0.0, we launch a speculative task0.1 on host2. task0.1 eventually fails on host2, but it cannot be re-run there, since a copy of the task is already running on host1. After another long wait, we launch a new speculative task0.2. Now task0.2 should be able to run on host2 again, since there is no longer a running attempt on host2.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
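For reference, the intended behavior can be sketched as below. This is a minimal, self-contained model, not the actual Spark internals: the `TaskInfo` stand-in carries only the two fields the check needs (`host` and `finished`), and the attempt list is hard-coded to mirror the scenario above.

{code:java}
// Simplified stand-in for Spark's TaskInfo (assumption: only the fields
// relevant to this check are modeled).
case class TaskInfo(host: String, finished: Boolean)

object SpeculationCheck {
  // Per-task attempt lists, mirroring the scenario in the description:
  // task0.0 is still running on host1; task0.1 failed on host2.
  val taskAttempts: Array[List[TaskInfo]] = Array(
    List(
      TaskInfo("host1", finished = false),
      TaskInfo("host2", finished = true)
    )
  )

  // Buggy version: a finished (e.g. failed) attempt still makes the host
  // look occupied, blocking a new speculative attempt there.
  def hasAttemptOnHostBuggy(taskIndex: Int, host: String): Boolean =
    taskAttempts(taskIndex).exists(_.host == host)

  // Corrected version: only attempts that are still running block the host.
  def hasAttemptOnHost(taskIndex: Int, host: String): Boolean =
    taskAttempts(taskIndex).exists(t => t.host == host && !t.finished)
}
{code}

With this data, the buggy check reports host2 as occupied (because of the finished task0.1), while the corrected check does not, so a new speculative attempt like task0.2 is free to run on host2.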