Github user gpang commented on a diff in the pull request:
https://github.com/apache/spark/pull/18098#discussion_r120231509
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -502,6 +521,25 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
    )
  }
+  private def satisfiesLocality(offerHostname: String): Boolean = {
+    if (hostToLocalTaskCount.nonEmpty) {
--- End diff ---
@mgummelt Thanks for the thoughtful response. Sorry for the delay.
I am not entirely sure how multi-stage jobs would work, but in the current
PR, once all of the executors for a stage have started, the delay timeout
resets for the next "stage". So if Spark needs 3 executors and all 3
eventually start, the delay timeout would start fresh the next time Spark
requests more executors. However, if the next stage is requested before the
previous stage is fully allocated, then the scenario you described would
occur. I had assumed that a stage would be fully allocated before additional
executors were requested for the next stage. Do you have any insight into
how executors are allocated across stages?
I will also look into per-host delay timeouts.
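For illustration only, here is a minimal, self-contained Scala sketch of the
behaviour described above (not the PR's actual code): offers from
non-preferred hosts are rejected until a locality wait expires, and the wait
resets once the current executor request is fully allocated. All names here
(LocalityWaitSketch, localityWaitMs, resetDelay) are hypothetical.

    // Sketch only -- names are hypothetical and not taken from the PR.
    class LocalityWaitSketch(localityWaitMs: Long) {

      // Hosts that have pending tasks with a locality preference, with task counts.
      @volatile var hostToLocalTaskCount: Map[String, Int] = Map.empty

      // Start of the current wait window; reset once a batch of executor
      // requests (a "stage" in the discussion above) is fully allocated.
      private var launchDelayStart: Long = System.currentTimeMillis()

      def satisfiesLocality(offerHostname: String): Boolean = {
        if (hostToLocalTaskCount.isEmpty) {
          // No locality preferences at all, so any host is acceptable.
          true
        } else {
          // Accept a preferred host immediately; otherwise accept only after
          // the locality wait has expired (the delay-scheduling idea).
          hostToLocalTaskCount.contains(offerHostname) ||
            (System.currentTimeMillis() - launchDelayStart) > localityWaitMs
        }
      }

      // Called once all requested executors have launched, so the next request
      // for more executors starts with a fresh timeout.
      def resetDelay(): Unit = {
        launchDelayStart = System.currentTimeMillis()
      }
    }

A per-host variant would keep a separate start time per host instead of the
single launchDelayStart above, which is what the per-host delay timeout idea
would amount to.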