Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15249#discussion_r81867147
  
    --- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -766,9 +782,11 @@ private[spark] class TaskSetManager(
             logWarning(failureReason)
             None
         }
    -    // always add to failed executors
    -    failedExecutors.getOrElseUpdate(index, new HashMap[String, Long]()).
    -      put(info.executorId, clock.getTimeMillis())
    +
    +    if (reason.countTowardsTaskFailures) {
    --- End diff --
    
    the question about moving inside the if on 801 is a good one.  I guess I 
left the logic here to be more like it was before.  Honestly I find `isZombie` 
to be super-confusing, I always have to think hard about exactly what it means. 
 I guess your point is, if the taskset is a zombie, we're not going to schedule 
anymore tasks from it, so no point in tracking more failures?  And furthermore, 
you can't "un-zombie" a taskset?
    
    The only case I can think of the logic actually being different is if you 
have speculative execution, and you get task failures after the taskset 
completes (not from taskcommitdenied, but real task failures).  It doesn't 
matter at all for this PR, but it might matter in the context of app-level 
blacklisting, if you want to learn about bad executors.  eg., maybe an executor 
first has stragglers, and then eventually fails things.  The current code I 
have for app-level blacklisting wouldn't take advantage of this anyway, since 
it just gets blacklisting info from a taskset when it succeeds and thats it.
    
    anyway, for now I will go ahead and put it inside that block, but just 
wanted to share my thoughts since its a little subtle and wanted to talk it 
through.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to