Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15986#discussion_r89253541
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -525,7 +525,12 @@ private[spark] class TaskSchedulerImpl(
        * of any running tasks, since the loss reason defines whether we'll fail those tasks.
    --- End diff --
    
    Based on the
    ["tasks are not re-scheduled while executor loss reason is pending" test](https://github.com/apache/spark/blob/072f4c518cdc57d705beec6bcc3113d9a6740819/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala#L268)
    in `TaskSchedulerImplSuite`, it looks like the API contract here is that if
    `executorLost` is called with `LossReasonPending` then it will eventually be
    called again with some other reason. That later call will
    [call](https://github.com/apache/spark/pull/15986/files#diff-d4000438827afe3a185ae75b24987a61R550)
    `rootPool.executorLost()`, which, in turn, will call `executorLost` on all
    `TaskSetManager`s, which will use their own internal executor ID to task ID
    mappings to mark tasks as failed and inform the `DAGScheduler`. The
    `TaskSetManager` doesn't call back into the `TaskScheduler` to access any of
    the data in these `TaskScheduler` mappings, so I think it's safe to clean
    them up immediately at the top of `removeExecutor` rather than putting them
    behind the `reason != LossReasonPending` check.
    
    Note that it's also not as simple as just putting those behind
    `reason != LossReasonPending` as a defensive measure, because then we'd be
    changing the contract on when `runningTasksByExecutors()` is updated:
    previously, it would set a failed executor's running task count to zero as
    soon as the executor failed, whereas if we moved this update behind that
    check it would do so only after the reason was known.
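
    To make the ordering concrete, here is a rough, simplified sketch of the
    behavior described above. It is not the actual `TaskSchedulerImpl` code:
    `ToyScheduler`, `KnownLoss`, `runningTaskIdsByExecutor`, `taskStarted` and
    `poolNotifications` are hypothetical names made up for illustration, and
    `LossReasonPending` here is a local stand-in rather than Spark's own object.
    The point is only that the scheduler-side mapping is cleared on the first
    call, while the notification that would go to `rootPool.executorLost()` is
    deferred until a non-pending reason arrives.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the loss reasons; not Spark's real classes.
sealed trait LossReason
case object LossReasonPending extends LossReason
final case class KnownLoss(message: String) extends LossReason

// Toy model of the scheduler-side bookkeeping (assumption: greatly simplified).
class ToyScheduler {
  // executorId -> ids of the tasks currently running on that executor
  private val runningTaskIdsByExecutor =
    mutable.HashMap.empty[String, mutable.HashSet[Long]]

  // Records the calls that would be forwarded to rootPool.executorLost().
  val poolNotifications = mutable.ArrayBuffer.empty[(String, LossReason)]

  def taskStarted(execId: String, taskId: Long): Unit =
    runningTaskIdsByExecutor.getOrElseUpdate(execId, mutable.HashSet.empty[Long]) += taskId

  // Running task count per executor, derived from the mapping above.
  def runningTasksByExecutors: Map[String, Int] =
    runningTaskIdsByExecutor.map { case (id, tasks) => id -> tasks.size }.toMap

  def executorLost(execId: String, reason: LossReason): Unit = {
    // Clean up the scheduler-side mapping immediately, whatever the reason is.
    runningTaskIdsByExecutor.remove(execId)
    // Only notify the pool (and thus the task set managers) once the reason is known.
    if (reason != LossReasonPending) {
      poolNotifications += ((execId, reason))
    }
  }
}
```

    With this shape, calling `executorLost("exec-1", LossReasonPending)` drops
    the executor's running task count right away, and a later call with a known
    reason is the one that reaches the pool.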
    
    I think that these subtleties / distinctions are only relevant to YARN 
mode, so I'll loop in @vanzin to comment on them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
