Github user mgummelt commented on a diff in the pull request:
https://github.com/apache/spark/pull/18098#discussion_r118615339
--- Diff:
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
---
@@ -502,6 +521,25 @@ private[spark] class
MesosCoarseGrainedSchedulerBackend(
)
}
+ private def satisfiesLocality(offerHostname: String): Boolean = {
+ if (hostToLocalTaskCount.nonEmpty) {
--- End diff --
Now that I'm thinking more about this, it's a bit tricker than I originally
thought.
The problem with the current PR is that, while it works fine if all tasks
are submitted at once, it won't have any benefit for tasks submitted after the
scheduler starts, which is what happens in multi-stage jobs. Since you're
measuring the delay timeout from the time the scheduling starts, then we'll
accept all offers that come after that delay, regardless of locality. So if
the timeout is 3s, and the scheduler receives a request to schedule a task 5s
after the job starts, then it will launch an executor to run that task on
whatever offer happens to arrive first.
What we *really* want is, for each task submitted to the `TaskScheduler`,
we want to start counting the delay timeout from the moment the task was
submitted. However, the `ExecutorAllocationClient` interface doesn't provide
us with task IDs. It only gives us hostnames via `hostsToLocalTaskCount`.
I think we can approximate the behavior we want by storing a per-host
timestamp representing the last time its entry in `hostsToLocalTaskCount` was
modified. Then we only accept an offer for a different host if enough time has
elapsed past this stored time.
There's code in `YarnAllocator.scala` that does similar things that might
theoretically provide inspiration, but I just looked and found it to be quite
unreadable.
We should also only do this if dynamic allocation is enabled, so we should
probably just condition the entire check on `Utils.isDynamicAllocationEnabled`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]