Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15986#discussion_r89274309
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -335,31 +337,31 @@ private[spark] class TaskSchedulerImpl(
         var reason: Option[ExecutorLossReason] = None
         synchronized {
           try {
    -        if (state == TaskState.LOST && taskIdToExecutorId.contains(tid)) {
    -          // We lost this entire executor, so remember that it's gone
    -          val execId = taskIdToExecutorId(tid)
    -
    -          if (executorIdToTaskCount.contains(execId)) {
    +        taskIdToTaskSetManager.get(tid) match {
    +          case Some(taskSet) if state == TaskState.LOST =>
    +            // TaskState.LOST is only used by the deprecated Mesos fine-grained scheduling mode,
    +            // where each executor corresponds to a single task, so mark the executor as failed.
    +            val execId = taskIdToExecutorId.getOrElse(tid, throw new IllegalStateException(
    +              "taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"))
                 reason = Some(
                   SlaveLost(s"Task $tid was lost, so marking the executor as lost as well."))
                 removeExecutor(execId, reason.get)
                 failedExecutor = Some(execId)
    -          }
    -        }
    -        taskIdToTaskSetManager.get(tid) match {
    +            taskSet.removeRunningTask(tid)
    --- End diff --
    
    Previously, the `TaskState.LOST` case reached these lines by falling
    through after the `if (state == TaskState.LOST &&
    taskIdToExecutorId.contains(tid))` block finished. That no longer works
    here: my changes in `removeExecutor()` will already have removed this task
    from `taskIdToTaskSetManager`, so the lookup that follows would miss and
    you'd get an incorrect `Ignoring update with state ...` error. Moving the
    logic from the original `if` block into this `case Some(...) if ...`
    branch keeps the shared handling for the "unknown task id" case while
    avoiding hitting that case spuriously.

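    To make the ordering problem concrete, here is a minimal sketch of the two
    control-flow shapes. It is not Spark's code: `StatusUpdateSketch`,
    `Scheduler`, `launch`, `oldStatusUpdate`, `newStatusUpdate`, and the string
    return values are hypothetical stand-ins, and `removeExecutor` is assumed to
    drop the executor's tasks from both maps, as described above.

```scala
import scala.collection.mutable

// Simplified, hypothetical model of the scheduler's bookkeeping; not Spark's real classes.
object StatusUpdateSketch {
  sealed trait State
  case object Lost extends State
  case object Finished extends State

  final class Scheduler {
    val taskIdToTaskSetManager = mutable.HashMap[Long, String]()
    val taskIdToExecutorId = mutable.HashMap[Long, String]()

    def launch(tid: Long, taskSet: String, execId: String): Unit = {
      taskIdToTaskSetManager(tid) = taskSet
      taskIdToExecutorId(tid) = execId
    }

    // Assumption taken from the comment: removing an executor also forgets its tasks.
    def removeExecutor(execId: String): Unit = {
      val tids = taskIdToExecutorId.collect { case (tid, e) if e == execId => tid }.toList
      tids.foreach { tid =>
        taskIdToTaskSetManager.remove(tid)
        taskIdToExecutorId.remove(tid)
      }
    }

    // Old shape: handle LOST in a separate `if`, then fall through to the lookup.
    def oldStatusUpdate(tid: Long, state: State): String = {
      if (state == Lost && taskIdToExecutorId.contains(tid)) {
        removeExecutor(taskIdToExecutorId(tid))
      }
      // By now the task is gone, so a LOST update lands in the None branch.
      taskIdToTaskSetManager.get(tid) match {
        case Some(taskSet) => s"updated $taskSet"
        case None => "ignored: unknown task id"
      }
    }

    // New shape: look the task up first and handle LOST inside the match.
    def newStatusUpdate(tid: Long, state: State): String = {
      taskIdToTaskSetManager.get(tid) match {
        case Some(taskSet) if state == Lost =>
          val execId = taskIdToExecutorId.getOrElse(
            tid, throw new IllegalStateException("maps out of sync"))
          removeExecutor(execId)
          s"updated $taskSet after executor loss"
        case Some(taskSet) => s"updated $taskSet"
        case None => "ignored: unknown task id"
      }
    }
  }
}
```

    In this sketch the old shape reports a known task as unknown when it is
    lost, while the new shape binds `taskSet` before the executor is removed,
    so only a genuinely unknown task id reaches the `None` branch.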

