Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/15986#discussion_r89257822
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -339,7 +341,7 @@ private[spark] class TaskSchedulerImpl(
// We lost this entire executor, so remember that it's gone
val execId = taskIdToExecutorId(tid)
- if (executorIdToTaskCount.contains(execId)) {
+ if (executorIdToRunningTaskIds.contains(execId)) {
reason = Some(
SlaveLost(s"Task $tid was lost, so marking the executor as lost as well."))
removeExecutor(execId, reason.get)
--- End diff --
To look at this another way, note that this code path in `TaskSchedulerImpl` is only reached
from three places:
- LocalSchedulerBackend, which never sends `TaskState.LOST`
- MesosFineGrainedSchedulerBackend, where `TaskState.LOST` means the total
loss of an executor that corresponded to a single task (because in fine-grained
mode each task runs in its own executor)
- CoarseGrainedSchedulerBackend, where this is only called with a state
that comes from a `StatusUpdate` message sent by an executor. That task state
will never be `TaskState.LOST`.
Given all of this, I think that the right course of action here is to
update the comments to clarify that `TaskState.LOST` is only relevant to
fine-grained Mesos mode and to refactor this block to call
```
taskSet.removeRunningTask(tid)
taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
```
after calling `removeExecutor` for the fine-grained task, and then skip the
rest of the logic, which only applies to local mode or coarse-grained schedulers.
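The proposed restructuring can be sketched as follows. This is a simplified, self-contained model of the dispatch, not the actual Spark code: `handleStatusUpdate`, the `TaskState` stand-in, and the logged action names are all illustrative placeholders for the real scheduler calls.

```scala
// Hypothetical sketch of the proposed statusUpdate restructuring.
// The real code lives in TaskSchedulerImpl.statusUpdate; everything
// here is a simplified stand-in to show the control flow only.
object StatusUpdateSketch {
  sealed trait TaskState
  case object LOST extends TaskState
  case object FINISHED extends TaskState
  case object FAILED extends TaskState

  // Records which scheduler actions would be taken for a given update.
  def handleStatusUpdate(tid: Long, state: TaskState, log: StringBuilder): Unit =
    state match {
      case LOST =>
        // TaskState.LOST only arrives from fine-grained Mesos mode, where
        // one task corresponds to one executor, so losing the task means
        // losing the executor. Remove the executor first, then clean up
        // the task and hand it to the result getter as failed.
        log.append(s"removeExecutor(executorFor($tid))\n")
        log.append(s"removeRunningTask($tid)\n")
        log.append(s"enqueueFailedTask($tid)\n")
      case _ =>
        // Local and coarse-grained backends never send LOST, so they
        // always take the ordinary update path.
        log.append(s"normalUpdate($tid)\n")
    }
}
```

With this shape, the `LOST` branch is isolated to the fine-grained Mesos case and the remaining logic stays untouched for the other two backends.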