Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/15986#discussion_r89420426
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -335,31 +337,31 @@ private[spark] class TaskSchedulerImpl(
     var reason: Option[ExecutorLossReason] = None
     synchronized {
       try {
-        if (state == TaskState.LOST && taskIdToExecutorId.contains(tid)) {
-          // We lost this entire executor, so remember that it's gone
-          val execId = taskIdToExecutorId(tid)
-
-          if (executorIdToTaskCount.contains(execId)) {
+        taskIdToTaskSetManager.get(tid) match {
+          case Some(taskSet) if state == TaskState.LOST =>
+            // TaskState.LOST is only used by the deprecated Mesos fine-grained scheduling mode,
+            // where each executor corresponds to a single task, so mark the executor as failed.
+            val execId = taskIdToExecutorId.getOrElse(tid, throw new IllegalStateException(
+              "taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"))
             reason = Some(
               SlaveLost(s"Task $tid was lost, so marking the executor as lost as well."))
             removeExecutor(execId, reason.get)
--- End diff --
One bit of trickiness is the comment after the `try` block:
```
// Update the DAGScheduler without holding a lock on this, since that can deadlock
if (failedExecutor.isDefined) {
...
```
I assume this comment refers to the `synchronized` block surrounding the `try`, so I don't think we'll be able to simplify this much while still preserving that contract. Therefore, ugly as it is, I'd prefer to keep this weird structure for now.
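
For readers following along, here is a minimal sketch of the locking contract that comment protects: record the failed executor while holding the scheduler's lock, then notify the DAGScheduler-style listener only after the lock is released. The names here (`SimpleScheduler`, `Listener`, `registerTask`, `taskLost`) are hypothetical and heavily simplified; this is not the actual `TaskSchedulerImpl` code.

```scala
// Hypothetical sketch, not the real TaskSchedulerImpl: illustrates the
// "mutate state under the lock, notify the listener outside it" contract.
object LockOrderingSketch {

  trait Listener {
    // Stands in for the DAGScheduler. Calling it while holding the
    // scheduler's lock risks deadlock, because it may call back into the
    // scheduler (and try to take the same lock) from another thread.
    def executorLost(execId: String, reason: String): Unit
  }

  class SimpleScheduler(listener: Listener) {
    import scala.collection.mutable
    private val taskIdToExecutorId = mutable.Map.empty[Long, String]

    def registerTask(tid: Long, execId: String): Unit = synchronized {
      taskIdToExecutorId(tid) = execId
    }

    // Rough analogue of statusUpdate handling a lost task.
    def taskLost(tid: Long): Unit = {
      // Phase 1: update internal state while synchronized on `this`.
      var failedExecutor: Option[String] = None
      synchronized {
        failedExecutor = taskIdToExecutorId.remove(tid)
      }
      // Phase 2: update the listener (the DAGScheduler in Spark) without
      // holding a lock on `this`, since that can deadlock.
      failedExecutor.foreach { execId =>
        listener.executorLost(execId, s"Task $tid was lost")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val scheduler = new SimpleScheduler(new Listener {
      def executorLost(execId: String, reason: String): Unit =
        println(s"executor $execId lost: $reason")
    })
    scheduler.registerTask(7L, "exec-1")
    scheduler.taskLost(7L) // prints: executor exec-1 lost: Task 7 was lost
  }
}
```

Inverting the two phases (calling the listener while still inside `synchronized`) is exactly what the comment warns can deadlock, since the DAGScheduler may call back into the scheduler from another thread.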