+1 to A and B
Aurora has enabled checkpointing for years and requires operators to enable
checkpointing on the slaves.
On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere <jo...@mesosphere.io> wrote:
> I'm in favor of A & B. I find it provides a better "first experience."
> In my experience, you usually need an explicit reason not to want
> to checkpoint. Most people assume the semantics provided by checkpointing
> are the default, and it can be frustrating for them to find
> out that this is not the case.
> *Joris Van Remoortere*
> On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway <neil.con...@gmail.com> wrote:
>> Hi folks,
>> I'd like input from individuals who currently use frameworks but do
>> not enable checkpointing.
>> Background: "checkpointing" is a parameter that can be enabled in
>> FrameworkInfo; if enabled, the agent will write the framework PID,
>> executor PIDs, and status updates to disk for any tasks started by
>> that framework. This checkpointed information means that these tasks
>> can survive an agent crash: if the agent exits (whether due to
>> crashing or as part of an upgrade procedure), a restarted agent can
>> use this information to reconnect to executors started by the previous
>> instance of the agent. The downside is that checkpointing requires
>> some additional disk I/O at the agent.
>> Checkpointing is not currently the default, but in my experience it is
>> often enabled for production frameworks. As part of the work on
>> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
>> considering:
>> (a) requiring partition-aware frameworks to also enable
>> checkpointing, and/or
>> (b) enabling checkpointing by default.
>> If you have intentionally decided to disable checkpointing for your
>> Mesos framework, I'd be curious to hear more about your use-case and
>> why you haven't enabled it.
>> Zameer Manji
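For readers unfamiliar with where the flag discussed above lives, here is a minimal Python sketch, assuming the JSON encoding of the Mesos v1 scheduler HTTP API, of a SUBSCRIBE call whose FrameworkInfo sets `checkpoint`. The helper `subscribe_call` and its parameters are hypothetical. The `ValueError` only illustrates proposal (a) on the client side and is not something Mesos itself does. The helper also defaults `checkpoint` to true, as proposal (b) would, whereas FrameworkInfo currently defaults it to false.

```python
import json


def subscribe_call(user: str, name: str, checkpoint: bool = True,
                   partition_aware: bool = False) -> dict:
    """Build a v1 scheduler API SUBSCRIBE call body (hypothetical helper).

    Only a few FrameworkInfo fields are shown; a real framework sets more.
    """
    framework_info = {
        "user": user,
        "name": name,
        # Ask agents to checkpoint this framework's PID, executor PIDs and
        # status updates so its tasks can survive an agent restart.
        "checkpoint": checkpoint,
    }
    if partition_aware:
        # Illustrates proposal (a): a partition-aware framework would also
        # have to enable checkpointing. This check is client-side only.
        if not checkpoint:
            raise ValueError("partition-aware frameworks must enable checkpointing")
        framework_info["capabilities"] = [{"type": "PARTITION_AWARE"}]
    return {"type": "SUBSCRIBE", "subscribe": {"framework_info": framework_info}}


if __name__ == "__main__":
    # Print the JSON body a scheduler would POST to the master's
    # scheduler endpoint when subscribing.
    print(json.dumps(subscribe_call("root", "example-framework",
                                    partition_aware=True), indent=2))
```

Defaulting `checkpoint` to true in the helper means a framework author has to opt out explicitly, which matches the "explicit reason to not want to checkpoint" experience described earlier in the thread.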