Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15986#discussion_r89415621
  
    --- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -525,7 +525,12 @@ private[spark] class TaskSchedulerImpl(
        * of any running tasks, since the loss reason defines whether we'll 
fail those tasks.
    --- End diff --
    
    Josh, my understanding is consistent with yours -- that if executorLost() 
is called with LossReasonPending, it will eventually be called again with a 
real loss reason.  To elaborate on what Marcelo said, executorIdToHost is used 
at the "gatekeeper" for this: on [this 
line](https://github.com/JoshRosen/spark/blob/078ac0e6321aeb72c670a65ec90b9c20ef0a7788/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L487),
 we use the existence of an entry in executorIdToHost to determine whether we 
need to call removeExecutor (possibly for a 2nd time).
    
    My only concern with the current approach is the following: the task 
doesn't get marked as finished in the task set manager until 
rootPool.executorLost gets called (i.e., until the 2nd removeExecutor call).  
Is it possible for anything to go awry for task updates that happen between the 
first removeExecutor call (at which point the TaskSchedulerImpl's state for the 
task will be removed) and when the task is marked as failed and removed from 
the TSM?  I looked over this and I think things should be fine -- we'll just 
drop the update -- but can you update the error message we log on 369 (since 
now it can happen when an update happens after an executor has been lost, in 
addition to when duplicate updates are received)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to