Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/15986#discussion_r89274309
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -335,31 +337,31 @@ private[spark] class TaskSchedulerImpl(
var reason: Option[ExecutorLossReason] = None
synchronized {
try {
- if (state == TaskState.LOST && taskIdToExecutorId.contains(tid)) {
- // We lost this entire executor, so remember that it's gone
- val execId = taskIdToExecutorId(tid)
-
- if (executorIdToTaskCount.contains(execId)) {
+ taskIdToTaskSetManager.get(tid) match {
+ case Some(taskSet) if state == TaskState.LOST =>
+ // TaskState.LOST is only used by the deprecated Mesos fine-grained scheduling mode,
+ // where each executor corresponds to a single task, so mark the executor as failed.
+ val execId = taskIdToExecutorId.getOrElse(tid, throw new IllegalStateException(
+ "taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"))
reason = Some(
SlaveLost(s"Task $tid was lost, so marking the executor as lost as well."))
removeExecutor(execId, reason.get)
failedExecutor = Some(execId)
- }
- }
- taskIdToTaskSetManager.get(tid) match {
+ taskSet.removeRunningTask(tid)
--- End diff --
Previously, the `TaskState.LOST` case reached these lines by continuing
onwards after the `if (state == TaskState.LOST &&
taskIdToExecutorId.contains(tid))` block finished. That no longer works here:
with my changes, `removeExecutor()` will have already removed this task from
`taskIdToTaskSetManager`, so the lookup would miss and log a spurious
`Ignoring update with state ...` error. Moving the logic from the original
`if` block into this `case Some(...) if ...` case keeps the shared handling
for the "unknown task id" case while making sure a lost task never hits that
case by mistake.
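
To illustrate the ordering issue, here is a minimal, self-contained sketch. It
is not the actual `TaskSchedulerImpl` code: `StatusUpdateSketch`, its `log`
buffer, and the string-valued maps are hypothetical stand-ins, and the
simplified `removeExecutor` is only assumed to behave like the changed one in
that it also forgets every task that ran on the executor.

```scala
import scala.collection.mutable

object StatusUpdateSketch {
  sealed trait TaskState
  case object Running extends TaskState
  case object Lost extends TaskState

  // Hypothetical stand-ins for the scheduler's bookkeeping maps.
  val taskIdToTaskSet = mutable.HashMap[Long, String]()
  val taskIdToExecutorId = mutable.HashMap[Long, String]()
  val log = mutable.ArrayBuffer[String]()

  // Assumed behavior: removing an executor also forgets all of its tasks.
  def removeExecutor(execId: String): Unit = {
    val tids = taskIdToExecutorId.filter(_._2 == execId).keys.toList
    tids.foreach { tid =>
      taskIdToTaskSet.remove(tid)
      taskIdToExecutorId.remove(tid)
    }
    log += s"removed executor $execId"
  }

  def statusUpdate(tid: Long, state: TaskState): Unit = {
    // The lookup happens once, before removeExecutor() can clear the entry,
    // so a lost task is never misreported as an unknown task.
    taskIdToTaskSet.get(tid) match {
      case Some(_) if state == Lost =>
        val execId = taskIdToExecutorId.getOrElse(tid,
          throw new IllegalStateException(s"no executor recorded for task $tid"))
        removeExecutor(execId)
      case Some(taskSet) =>
        log += s"task $tid in $taskSet is now $state"
      case None =>
        // Shared handling for task ids the scheduler does not know about.
        log += s"ignoring update for unknown task $tid"
    }
  }
}
```

Because the guarded case handles the lost task inside the same `match`, only
an update for a task id that is really unknown reaches the `None` branch.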