Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15986#discussion_r89253541

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -525,7 +525,12 @@ private[spark] class TaskSchedulerImpl(
 * of any running tasks, since the loss reason defines whether we'll fail those tasks.
--- End diff --

Based on the ["tasks are not re-scheduled while executor loss reason is pending" test](https://github.com/apache/spark/blob/072f4c518cdc57d705beec6bcc3113d9a6740819/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala#L268) in `TaskSchedulerImplSuite`, it looks like the API contract here is that if `executorLost` is called with `LossReasonPending` then it will eventually be called again with some other reason. That later call will [call](https://github.com/apache/spark/pull/15986/files#diff-d4000438827afe3a185ae75b24987a61R550) `rootPool.executorLost()`, which in turn will call `executorLost` on every `TaskSetManager`. Each `TaskSetManager` uses its own internal executor ID to task ID mapping to mark tasks as failed and inform the `DAGScheduler`.

The `TaskSetManager` doesn't call back into the `TaskScheduler` to access any of the data in these mappings, so I think it's safe to clean them up immediately at the top of `removeExecutor` rather than putting them behind the `reason != LossReasonPending` check.

Note that it's also not as simple as just putting that cleanup behind `reason != LossReasonPending` as a defensive measure, because then we'd be changing the contract on when `runningTasksByExecutors()` is updated: previously, it would set a failed executor's running task count to zero as soon as the executor failed, whereas if we moved this update behind that check it would do so only after the reason was known.

I think that these subtleties / distinctions are only relevant to YARN mode, so I'll loop in @vanzin to comment on them.
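
To make the ordering argument concrete, here is a rough sketch of the behaviour described above. It is a toy model and not the actual `TaskSchedulerImpl` code: `ToyScheduler`, `KnownReason`, `runningTaskCount`, `propagatedLosses`, and the field names are all made up for illustration, and the real loss-reason hierarchy and task set manager plumbing are left out. It only shows the bookkeeping being cleared unconditionally while propagation waits for a non-pending reason.

```scala
import scala.collection.mutable

// Simplified stand-ins for the loss reasons; not Spark's real class hierarchy.
sealed trait ExecutorLossReason
case object LossReasonPending extends ExecutorLossReason
// Hypothetical placeholder for any reason other than "pending".
final case class KnownReason(message: String) extends ExecutorLossReason

// Hypothetical toy scheduler that models only the bookkeeping under discussion.
class ToyScheduler {
  private val executorIdToRunningTaskIds =
    mutable.HashMap.empty[String, mutable.HashSet[Long]]
  private val taskIdToExecutorId = mutable.HashMap.empty[Long, String]

  // Stands in for rootPool.executorLost(): records each propagated loss.
  val propagatedLosses = mutable.ArrayBuffer.empty[(String, ExecutorLossReason)]

  def taskStarted(taskId: Long, executorId: String): Unit = {
    executorIdToRunningTaskIds
      .getOrElseUpdate(executorId, mutable.HashSet.empty[Long]) += taskId
    taskIdToExecutorId(taskId) = executorId
  }

  // An executor with no entry is treated as having zero running tasks.
  def runningTaskCount(executorId: String): Int =
    executorIdToRunningTaskIds.get(executorId).map(_.size).getOrElse(0)

  def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit = {
    // Clear the mappings immediately, whatever the reason is, so the
    // running task count drops to zero as soon as the executor fails.
    executorIdToRunningTaskIds.remove(executorId).foreach { taskIds =>
      taskIds.foreach(id => taskIdToExecutorId.remove(id))
    }
    // Propagate to the task set managers only once the reason is known.
    if (reason != LossReasonPending) {
      propagatedLosses += ((executorId, reason))
    }
  }
}
```

In this model, calling `removeExecutor("exec-1", LossReasonPending)` zeroes `runningTaskCount("exec-1")` right away without propagating anything, and the follow-up call with a known reason does the propagation. Moving the cleanup inside the `if` would delay the count update until that second call, which is the contract change I'm worried about.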