> > and requires operators to enable checkpointing on the slaves.
Just curious why operators need to enable checkpointing on the slaves (I do
not see an agent flag for that). I think checkpointing should be enabled at
the framework level rather than on the slave.

Thanks,
Qian Zhang

On Sun, Oct 16, 2016 at 10:18 AM, Zameer Manji <zma...@apache.org> wrote:

> +1 to A and B
>
> Aurora has enabled checkpointing for years and requires operators to enable
> checkpointing on the slaves.
>
> On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere <jo...@mesosphere.io>
> wrote:
>
> > I'm in favor of A & B. I find it provides a better "first experience" to
> > users.
> > From my experience you usually have to have an explicit reason to not want
> > to checkpoint. Most people assume the semantics provided by the checkpoint
> > behavior is default and it can be a frustrating experience for them to find
> > out that is not the case.
> >
> > --
> > *Joris Van Remoortere*
> > Mesosphere
> >
> > On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway <neil.con...@gmail.com>
> > wrote:
> >
> >> Hi folks,
> >>
> >> I'd like input from individuals who currently use frameworks but do
> >> not enable checkpointing.
> >>
> >> Background: "checkpointing" is a parameter that can be enabled in
> >> FrameworkInfo; if enabled, the agent will write the framework pid,
> >> executor PIDs, and status updates to disk for any tasks started by
> >> that framework. This checkpointed information means that these tasks
> >> can survive an agent crash: if the agent exits (whether due to
> >> crashing or as part of an upgrade procedure), a restarted agent can
> >> use this information to reconnect to executors started by the previous
> >> instance of the agent. The downside is that checkpointing requires
> >> some additional disk I/O at the agent.
> >>
> >> Checkpointing is not currently the default, but in my experience it is
> >> often enabled for production frameworks. As part of the work on
> >> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
> >> considering:
> >>
> >> (a) requiring that partition-aware frameworks must also enable
> >> checkpointing, and/or
> >> (b) enabling checkpointing by default
> >>
> >> If you have intentionally decided to disable checkpointing for your
> >> Mesos framework, I'd be curious to hear more about your use-case and
> >> why you haven't enabled it.
> >>
> >> Thanks!
> >>
> >> Neil
> >>
> >> --
> >> Zameer Manji
> >>
> >
>