Github user lins05 commented on the issue: https://github.com/apache/spark/pull/17750

> Yes, it is true that there is an associated overhead in both modes, which is why the defaults have not been changed, i.e. the default behavior is not to checkpoint.

The overhead in fine-grained mode would be much heavier than in coarse-grained mode. For example, each time you run `rdd.collect()` on a 100 MB RDD, the Mesos agent where the executor runs would write 100 MB to disk and delete it after the driver acknowledges the message. In contrast, in coarse-grained mode the executor sends the 100 MB of data to the driver directly, without going through the Mesos agents. The only things the agents write to disk are small task status messages like `TASK_RUNNING`/`TASK_KILLED`, which are typically a few kilobytes.

> Setting failover_timeout is necessary, as there has to be a max limit on how long an agent holds on to resources for a framework that has failed, before giving them back.

> And considering that this is being used in the latest version, I guess the Spark driver does support it: https://github.com/apache/spark/blob/v2.1.1/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L315

The code in your link is the Mesos cluster scheduler, which is **a Mesos framework that launches Spark drivers for you**, not **the Mesos scheduler inside the Spark driver that launches executors**. It has `checkpoint` and `failover_timeout` set so that the Spark drivers it manages won't be killed even if the cluster scheduler itself is restarted or killed. If you look at the code of the `MesosClusterScheduler.registered` method, you can see it calls `driver.reconcileTasks()`, which is how it achieves that. In contrast, you won't find a call to `reconcileTasks` in e.g. `MesosCoarseGrainedSchedulerBackend`.
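To make the distinction concrete, here is a minimal sketch of what enabling checkpointing with a failover timeout looks like when a framework builds its `FrameworkInfo` via the Mesos protobuf API. The field names (`checkpoint`, `failover_timeout`) are the real Mesos ones; the framework name and the timeout value are purely illustrative, not taken from the Spark source.

```scala
import org.apache.mesos.Protos.FrameworkInfo

// Sketch only: a framework (like MesosClusterScheduler) opting in to
// checkpointing. With checkpoint = true, the agent persists task status
// updates to disk; with a nonzero failover_timeout, Mesos keeps the
// framework's tasks alive for that many seconds after the framework
// disconnects, waiting for it to re-register.
val frameworkInfo: FrameworkInfo = FrameworkInfo.newBuilder()
  .setUser("")                         // let Mesos fill in the current user
  .setName("example-framework")        // illustrative name
  .setCheckpoint(true)                 // enable agent-side checkpointing
  .setFailoverTimeout(7 * 24 * 3600.0) // illustrative: one week, in seconds
  .build()
```

After re-registering within the failover timeout, the framework is expected to call `driver.reconcileTasks(...)` to learn the current state of its still-running tasks, which is exactly what `MesosClusterScheduler.registered` does for the Spark drivers it manages.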