Github user gpang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18098#discussion_r120231509
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
    @@ -502,6 +521,25 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
         )
       }
     
    +  private def satisfiesLocality(offerHostname: String): Boolean = {
    +    if (hostToLocalTaskCount.nonEmpty) {
    --- End diff ---
    
    @mgummelt Thanks for the thoughtful response. Sorry for the delay.
    
    I am not entirely sure how multi-stage jobs would work, but in the current PR the delay timeout resets once all the executors for a stage have started. So if Spark needs 3 executors and all 3 eventually start, the delay timeout starts fresh the next time Spark requests more executors. However, if the next stage is requested before the previous stage is fully allocated, then the scenario you described does happen. I had assumed that a stage would be fully allocated before additional executors are requested for the next stage. Do you have any insight into how executors are allocated across stages?
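    
    To make the reset behavior concrete, here is a rough sketch of the kind of check I am describing (this is not the actual PR code; `launchedExecutors`, `executorLimit`, `localityWaitStartTime`, and `localityWaitMs` are placeholder names, only `hostToLocalTaskCount` comes from the diff above):
    
    ```scala
    // Assumed members of the scheduler backend (placeholder names):
    //   var localityWaitStartTime: Long  // when the current wait window began
    //   val localityWaitMs: Long         // configured locality wait, in ms
    //   def launchedExecutors: Int       // executors started so far
    //   def executorLimit: Int           // current target number of executors
    private def satisfiesLocality(offerHostname: String): Boolean = {
      if (hostToLocalTaskCount.isEmpty) {
        // No pending tasks have locality preferences, so any host is acceptable.
        true
      } else {
        // Once the current batch of executors is fully allocated, reset the
        // delay clock so the next batch of requests starts a fresh wait window.
        if (launchedExecutors >= executorLimit) {
          localityWaitStartTime = System.currentTimeMillis()
        }
        // Accept the offer if the host has locality-preferred tasks, or if we
        // have already waited long enough for a preferred host to show up.
        hostToLocalTaskCount.contains(offerHostname) ||
          System.currentTimeMillis() - localityWaitStartTime >= localityWaitMs
      }
    }
    ```
    
    In other words, once the previous batch is fully up the clock is effectively zeroed, so the next request waits the full delay again; the gap is exactly the case where a new stage's requests arrive while the previous batch is still pending.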
    
    I will also look into per-host delay timeouts.

