GitHub user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/16376#discussion_r94835320
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -159,7 +159,12 @@ private[spark] class TaskSetManager(
       addPendingTask(i)
     }

-  // Figure out which locality levels we have in our TaskSet, so we can do delay scheduling
+  /**
+   * Track the set of locality levels which are valid, given the tasks' locality preferences and
+   * the set of currently available executors. This is updated as executors are added and removed.
+   * This allows a performance optimization: skipping levels that aren't relevant (e.g., skip
+   * PROCESS_LOCAL if no tasks could be run PROCESS_LOCAL for the current set of executors).
+   */
   var myLocalityLevels = computeValidLocalityLevels()
--- End diff ---
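For context on what this comment describes, here is a minimal, self-contained sketch of the kind of check `computeValidLocalityLevels()` performs. The names, types, and map shapes below are simplified stand-ins for the real `TaskSetManager` state, not the actual implementation:

```scala
// Illustrative sketch only: simplified stand-ins for the real TaskSetManager
// state, which keys pending tasks by executor, host, and rack.
object LocalityLevelSketch {
  object TaskLocality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
  }
  import TaskLocality._

  // A level is "valid" only if some pending task could actually run at it
  // given the executors and hosts that are currently alive.
  def computeValidLevels(
      pendingTasksForExecutor: Map[String, Seq[Int]], // executorId -> task indices
      pendingTasksForHost: Map[String, Seq[Int]],     // host -> task indices
      aliveExecutors: Set[String],
      aliveHosts: Set[String]): Seq[TaskLocality.Value] = {
    val levels = Seq.newBuilder[TaskLocality.Value]
    // Both checks scan state keyed by executor/host: O(numExecutors)-ish work.
    if (pendingTasksForExecutor.keysIterator.exists(aliveExecutors.contains)) {
      levels += PROCESS_LOCAL // some task could run in a live executor's JVM
    }
    if (pendingTasksForHost.keysIterator.exists(aliveHosts.contains)) {
      levels += NODE_LOCAL // some task could run on a currently live host
    }
    levels += ANY // ANY is always valid, so the task set is never starved
    levels.result()
  }
}
```

The payoff is that delay scheduling never waits at a level no task can satisfy: if no pending task prefers a live executor, PROCESS_LOCAL is dropped from the list entirely.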
While figuring out the purpose of this code so I could write the comment, I made a couple of observations:
1) For each executor we add or remove, it's an O(numExecutors) operation to update the locality levels, so adding a bunch of executors is O(numExecutors^2) overall (see the toy cost model after this list). That's minor on small clusters, but I wonder if it becomes an issue when you're using dynamic allocation and scaling up and down to 1000s of executors. It's all happening while holding a lock on the `TaskSchedulerImpl`, too.
2) Though we recompute the valid locality levels as executors come and go, we do *not* recompute them as tasks complete. That's not a problem -- as offers come in, we still go through the right task lists. But it does make me wonder whether this business of updating the locality levels for the current set of executors is useful at all, or whether we should just always use all levels.
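To put numbers on observation 1, here is a toy cost model (all names hypothetical; `recomputeLevels` stands in for `computeValidLocalityLevels()`): if each executor added triggers a recompute that scans every live executor, N additions cost 1 + 2 + ... + N = N(N+1)/2 scans in total.

```scala
// Toy cost model, not Spark code: each executor added triggers a recompute
// that touches every live executor.
object LocalityRecomputeCost extends App {
  var live = Set.empty[String]
  var totalScans = 0L

  // Stand-in for computeValidLocalityLevels(): O(numExecutors) work per call.
  def recomputeLevels(executors: Set[String]): Unit = {
    totalScans += executors.size
  }

  for (i <- 1 to 1000) {
    live += s"exec-$i"    // an executor joins, e.g. under dynamic allocation
    recomputeLevels(live) // levels recomputed while holding the scheduler lock
  }
  println(s"scans across 1000 adds: $totalScans") // 500500, i.e. O(N^2)
}
```

At thousands of executors that's millions of scans, all serialized behind the `TaskSchedulerImpl` lock, which is part of why the alternative in observation 2 (always keep all levels and let the offer loop walk the right task lists) might be worth considering.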