+1 to A and B

Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150

On 17 October 2016 at 00:38, Qian Zhang <zhq527...@gmail.com> wrote:

> and requires operators to enable checkpointing on the slaves.
>
>
> Just curious why operator needs to enable checkpointing on the slaves (I
> do not see an agent flag for that), I think checkpointing should be enabled
> in framework level rather than slave.
>
>
> Thanks,
> Qian Zhang
>
> On Sun, Oct 16, 2016 at 10:18 AM, Zameer Manji <zma...@apache.org> wrote:
>
>> +1 to A and B
>>
>> Aurora has enabled checkpointing for years and requires operators to
>> enable checkpointing on the slaves.
>>
>> On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere
>> <jo...@mesosphere.io> wrote:
>>
>> > I'm in favor of A & B. I find it provides a better "first experience" to
>> > users.
>> > From my experience you usually have to have an explicit reason to not
>> > want to checkpoint. Most people assume the semantics provided by the
>> > checkpoint behavior is default and it can be a frustrating experience
>> > for them to find out that is not the case.
>> >
>> > --
>> > *Joris Van Remoortere*
>> > Mesosphere
>> >
>> > On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway <neil.con...@gmail.com>
>> > wrote:
>> >
>> >> Hi folks,
>> >>
>> >> I'd like input from individuals who currently use frameworks but do
>> >> not enable checkpointing.
>> >>
>> >> Background: "checkpointing" is a parameter that can be enabled in
>> >> FrameworkInfo; if enabled, the agent will write the framework pid,
>> >> executor PIDs, and status updates to disk for any tasks started by
>> >> that framework. This checkpointed information means that these tasks
>> >> can survive an agent crash: if the agent exits (whether due to
>> >> crashing or as part of an upgrade procedure), a restarted agent can
>> >> use this information to reconnect to executors started by the previous
>> >> instance of the agent. The downside is that checkpointing requires
>> >> some additional disk I/O at the agent.
>> >>
>> >> Checkpointing is not currently the default, but in my experience it is
>> >> often enabled for production frameworks. As part of the work on
>> >> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
>> >> considering:
>> >>
>> >> (a) requiring that partition-aware frameworks must also enable
>> >> checkpointing, and/or
>> >> (b) enabling checkpointing by default
>> >>
>> >> If you have intentionally decided to disable checkpointing for your
>> >> Mesos framework, I'd be curious to hear more about your use-case and
>> >> why you haven't enabled it.
>> >>
>> >> Thanks!
>> >>
>> >> Neil
>> >>
>>
>> --
>> Zameer Manji
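
For anyone new to the background in the quoted thread, the recovery behavior can be illustrated with a toy model. This is only a minimal sketch and not Mesos code: `ToyAgent`, `launch`, `recover`, and the `checkpoint.json` file are hypothetical names made up for illustration, and the real agent's on-disk layout and recovery logic are different. It just shows the idea that executor state written to disk for a checkpointing framework is visible to a restarted agent, while state for a non-checkpointing framework is lost.

```python
import json
import os
import tempfile


class ToyAgent:
    """Hypothetical stand-in for an agent; not the Mesos implementation."""

    def __init__(self, work_dir):
        self.work_dir = work_dir
        self.executors = {}  # executor id -> pid, held in memory only

    def _path(self):
        # Hypothetical checkpoint file; the real agent uses its own layout.
        return os.path.join(self.work_dir, "checkpoint.json")

    def _load(self):
        # Read whatever was checkpointed earlier, or nothing at all.
        if not os.path.exists(self._path()):
            return {}
        with open(self._path()) as f:
            return json.load(f)

    def launch(self, executor_id, pid, checkpoint):
        # Every executor is tracked in memory; only checkpointing
        # frameworks pay the extra disk write.
        self.executors[executor_id] = pid
        if checkpoint:
            state = self._load()
            state[executor_id] = pid
            with open(self._path(), "w") as f:
                json.dump(state, f)

    def recover(self):
        # A restarted agent only knows what was written to disk.
        self.executors = self._load()
        return self.executors


work_dir = tempfile.mkdtemp()

agent = ToyAgent(work_dir)
agent.launch("exec-a", 1001, checkpoint=True)   # framework with checkpointing
agent.launch("exec-b", 1002, checkpoint=False)  # framework without it

# Simulate an agent restart: a new instance over the same work directory.
restarted = ToyAgent(work_dir)
recovered = restarted.recover()
print(recovered)
```

In this toy model only `exec-a` is found after the restart, which mirrors the trade-off described in the thread: an extra disk write per update in exchange for being able to reconnect to running executors.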