GitHub user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/11167#discussion_r52653078
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -177,13 +177,15 @@ private[spark] class TaskSetManager(
   var emittedTaskSizeWarning = false
-  /** Add a task to all the pending-task lists that it should be on. */
+  /**
+   * Add a task to all the pending-task lists that it should be on.
+   * Note that it's okay if we add a task to the same queue twice because
+   * dequeueTaskFromList will skip already-running tasks.
--- End diff ---
Can you move your new comment to the comment above pendingTasksForExecutor?
I'd change that comment to say something like:
// Set of pending tasks for each executor. These collections are actually
// treated as stacks, in which new tasks are added to the end of the
// ArrayBuffer and removed from the end. This makes it faster to detect
// tasks that repeatedly fail because whenever a task fails, it is put
// back at the head of the stack. These collections may contain duplicates
// for two reasons:
// (1): Tasks are only removed lazily; when a task is launched, it remains
// in all the pending lists except the one that it was launched from.
// (2): Tasks may be re-added to these lists multiple times as a result
// of failures.
// Duplicates are handled in dequeueTaskFromList, which ensures that a
// task hasn't already started running before launching it.
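
For readers following the thread, here is a minimal, self-contained Scala
sketch of the lazy-removal pattern that comment describes. The names
(LazyDequeueSketch, pending, running, addPending, dequeue) are illustrative
stand-ins, not Spark's actual fields or methods:

import scala.collection.mutable.{ArrayBuffer, HashSet}

object LazyDequeueSketch {
  // Pending task indices, treated as a stack: new entries are appended
  // and dequeued from the end, so recently failed tasks come off first.
  private val pending = new ArrayBuffer[Int]
  // Indices of tasks that have already been launched.
  private val running = new HashSet[Int]

  // Adding may create duplicates; that is fine because dequeue() filters them.
  def addPending(taskIndex: Int): Unit = pending += taskIndex

  // Pop from the end of the stack, skipping (and discarding) any entry whose
  // task is already running -- the duplicate handling described above.
  def dequeue(): Option[Int] = {
    while (pending.nonEmpty) {
      val task = pending.remove(pending.size - 1)
      if (!running.contains(task)) {
        running += task
        return Some(task)
      }
    }
    None
  }
}

// Example: the same index is added twice (say, once per candidate list),
// but is only ever launched once.
// LazyDequeueSketch.addPending(7)
// LazyDequeueSketch.addPending(7)
// LazyDequeueSketch.dequeue()  // Some(7): 7 is now running
// LazyDequeueSketch.dequeue()  // None: the stale duplicate is skipped

With this scheme, adding a task to the same queue twice is harmless: once the
task has started running, any stale copy is dropped at dequeue time, which is
why addPendingTask does not need to deduplicate.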