Github user lins05 commented on the issue:

    https://github.com/apache/spark/pull/17750
  
    > Yes, It is true that there is an associated overhead in both modes, 
that's why the defaults have not been changed. i.e. Default behavior is not to 
checkpoint.
    
    The overhead in fine-grained mode would be much heavier than in 
coarse-grained mode. For example, each time you run `rdd.collect()` on a 100MB 
RDD, the Mesos agent where the executor runs would write 100MB to disk and 
delete it after the driver acknowledges the message. 
    
    In contrast, in coarse-grained mode the executor would send the 100MB of 
data to the driver directly without going through the Mesos agents. The only 
things that agents write to disk are small task status messages like 
TASK_RUNNING/TASK_KILLED, which are typically several KB each.
    
    
    >Setting failover_timeout is necessary as there has to be a max limit for 
how long an agent can be considered to be given back to a failing task.
    
    > And considering that this is being used in the latest version I guess the 
Spark Driver does support it.
    
https://github.com/apache/spark/blob/v2.1.1/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L315
    
    The code in your link is the Mesos cluster scheduler, which is **a Mesos 
framework that launches Spark drivers for you**, not **the Mesos scheduler 
inside the Spark driver that launches executors.** 
    
    It has `checkpoint` and `failover_timeout` set so that the Spark drivers 
it manages won't be killed even if the scheduler itself is restarted or killed. 
If you look at the code of the `MesosClusterScheduler.registered` method, you 
can see that it calls `driver.reconcileTasks()`, which is how it achieves that. 
In contrast, you can't find a call to `reconcileTasks` in e.g. 
`MesosCoarseGrainedSchedulerBackend`.
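
    To make the point about reconciliation concrete, here is a rough toy model 
in Python. It is not the Mesos API and not Spark code: `ToyMaster`, 
`ToyScheduler` and `restart_scheduler` are hypothetical names made up for this 
illustration, and the model covers only the failover timeout and the 
reconciliation step (agent-side checkpointing is left out). It assumes the 
restarted scheduler has persisted the IDs of the tasks it launched.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical stand-in for a Mesos master (not the real API)."""
    failover_timeout_s: float
    tasks: dict = field(default_factory=dict)  # task_id -> last known state

    def framework_disconnected(self, seconds_away: float) -> None:
        # Once the scheduler has been away longer than the failover timeout,
        # the framework is torn down and its tasks are killed.
        if seconds_away > self.failover_timeout_s:
            self.tasks = {task_id: "TASK_KILLED" for task_id in self.tasks}

    def reconcile(self, task_ids):
        # Answer a reconciliation request with the latest known state.
        return {t: self.tasks.get(t, "TASK_LOST") for t in task_ids}


class ToyScheduler:
    """Hypothetical scheduler that is restarted with persisted task IDs."""

    def __init__(self, known_task_ids, reconcile_on_register=True):
        self.known_task_ids = list(known_task_ids)
        self.reconcile_on_register = reconcile_on_register
        self.states = {}  # what the scheduler knows right after registering

    def registered(self, master: ToyMaster) -> None:
        # Plays the role of the reconcile call made when the scheduler
        # (re-)registers; without it the scheduler starts with no task state.
        if self.reconcile_on_register:
            self.states = master.reconcile(self.known_task_ids)


def restart_scheduler(failover_timeout_s, downtime_s, reconcile_on_register=True):
    """Simulate a scheduler restart and return what it knows afterwards."""
    master = ToyMaster(failover_timeout_s, {"driver-1": "TASK_RUNNING"})
    master.framework_disconnected(downtime_s)
    scheduler = ToyScheduler(["driver-1"], reconcile_on_register)
    scheduler.registered(master)
    return scheduler.states
```

    In this model, `restart_scheduler(60, 10)` comes back knowing that 
`driver-1` is still running, `restart_scheduler(0, 10)` finds it killed because 
the timeout had already expired, and `restart_scheduler(60, 10, False)` comes 
back with no task state at all because it never reconciles.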

