> and requires operators to enable checkpointing on the slaves.
Just curious why operators need to enable checkpointing on the slaves (I do
not see an agent flag for that). I think checkpointing should be enabled at
the framework level rather than on the slave.
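To make concrete what I mean by "framework level", here is a rough sketch that builds the body of a scheduler SUBSCRIBE call with the `checkpoint` field of FrameworkInfo set. It assumes the JSON shape of the v1 scheduler HTTP API as I understand it; the helper name `build_subscribe_call` and the user/name values are placeholders of mine, not anything from Aurora or Mesos itself.

```python
import json


def build_subscribe_call(user, name, checkpoint=True):
    """Return a dict shaped like a scheduler SUBSCRIBE call.

    Hypothetical helper for illustration only: the point is that
    `checkpoint` is a field of FrameworkInfo, so it is chosen by the
    framework when it registers, not by a flag on the agent.
    """
    return {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {
                "user": user,              # placeholder value
                "name": name,              # placeholder value
                "checkpoint": checkpoint,  # the setting under discussion
            }
        },
    }


# Serialize the call as it would be sent in the request body.
body = json.dumps(build_subscribe_call("root", "example-framework"), indent=2)
print(body)
```

If that is right, nothing in the request depends on how the agent was started, which is why the requirement on operators surprised me.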
On Sun, Oct 16, 2016 at 10:18 AM, Zameer Manji <zma...@apache.org> wrote:
> +1 to A and B
> Aurora has enabled checkpointing for years and requires operators to enable
> checkpointing on the slaves.
> On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere <
> > I'm in favor of A & B. I find it provides a better "first experience" to
> > users.
> > From my experience you usually have to have an explicit reason to not want
> > to checkpoint. Most people assume the semantics provided by the checkpointing
> > behavior is default and it can be a frustrating experience for them to find
> > out that is not the case.
> > —
> > *Joris Van Remoortere*
> > Mesosphere
> > On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway <neil.con...@gmail.com>
> > wrote:
> >> Hi folks,
> >> I'd like input from individuals who currently use frameworks but do
> >> not enable checkpointing.
> >> Background: "checkpointing" is a parameter that can be enabled in
> >> FrameworkInfo; if enabled, the agent will write the framework PID,
> >> executor PIDs, and status updates to disk for any tasks started by
> >> that framework. This checkpointed information means that these tasks
> >> can survive an agent crash: if the agent exits (whether due to
> >> crashing or as part of an upgrade procedure), a restarted agent can
> >> use this information to reconnect to executors started by the previous
> >> instance of the agent. The downside is that checkpointing requires
> >> some additional disk I/O at the agent.
> >> Checkpointing is not currently the default, but in my experience it is
> >> often enabled for production frameworks. As part of the work on
> >> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
> >> considering:
> >> (a) requiring that partition-aware frameworks must also enable
> >> checkpointing, and/or
> >> (b) enabling checkpointing by default
> >> If you have intentionally decided to disable checkpointing for your
> >> Mesos framework, I'd be curious to hear more about your use-case and
> >> why you haven't enabled it.
> >> Thanks!
> >> Neil
> >> --
> >> Zameer Manji