Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/15249#discussion_r81867147
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -766,9 +782,11 @@ private[spark] class TaskSetManager(
logWarning(failureReason)
None
}
- // always add to failed executors
- failedExecutors.getOrElseUpdate(index, new HashMap[String, Long]()).
- put(info.executorId, clock.getTimeMillis())
+
+ if (reason.countTowardsTaskFailures) {
--- End diff --
the question about moving inside the if on 801 is a good one. I guess I
left the logic here to be more like it was before. Honestly I find `isZombie`
to be super-confusing, I always have to think hard about exactly what it means.
I guess your point is, if the taskset is a zombie, we're not going to schedule
anymore tasks from it, so no point in tracking more failures? And furthermore,
you can't "un-zombie" a taskset?
The only case I can think of the logic actually being different is if you
have speculative execution, and you get task failures after the taskset
completes (not from taskcommitdenied, but real task failures). It doesn't
matter at all for this PR, but it might matter in the context of app-level
blacklisting, if you want to learn about bad executors. eg., maybe an executor
first has stragglers, and then eventually fails things. The current code I
have for app-level blacklisting wouldn't take advantage of this anyway, since
it just gets blacklisting info from a taskset when it succeeds and thats it.
anyway, for now I will go ahead and put it inside that block, but just
wanted to share my thoughts since its a little subtle and wanted to talk it
through.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]