Re: Non-checkpointing frameworks

Zhitao Li Mon, 17 Oct 2016 09:45:13 -0700

+1 to both A and B.

Do we plan to eventually drop support for non-checkpointed frameworks (possibly
in v2) and declare that all frameworks have to operate under this assumption?


On Mon, Oct 17, 2016 at 1:36 AM, Aaron Carey <[email protected]> wrote:

> +1 to A and B
>
> Aaron Carey
> Production Engineer - Cloud Pipeline
> Industrial Light & Magic
> London
> 020 3751 9150
>
>
> On 17 October 2016 at 00:38, Qian Zhang <[email protected]> wrote:
>
>> and requires operators to enable checkpointing on the slaves.
>>
>>
>> Just curious: why does the operator need to enable checkpointing on the slaves
>> (I do not see an agent flag for that)? I think checkpointing should be enabled
>> at the framework level rather than on the slave.
>>
>>
>> Thanks,
>> Qian Zhang
>>
>> On Sun, Oct 16, 2016 at 10:18 AM, Zameer Manji <[email protected]> wrote:
>>
>>> +1 to A and B
>>>
>>> Aurora has enabled checkpointing for years and requires operators to enable
>>> checkpointing on the slaves.
>>>
>>> On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere <
>>> [email protected]>
>>> wrote:
>>>
>>> > I'm in favor of A & B. I find it provides a better "first experience" to
>>> > users.
>>> > From my experience you usually have to have an explicit reason to not want
>>> > to checkpoint. Most people assume the semantics provided by the checkpoint
>>> > behavior are the default, and it can be a frustrating experience for them to
>>> > find out that is not the case.
>>> >
>>> > —
>>> > *Joris Van Remoortere*
>>>
>>> > Mesosphere
>>> >
>>> > On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway <[email protected]>
>>> > wrote:
>>> >
>>> >> Hi folks,
>>> >>
>>> >> I'd like input from individuals who currently use frameworks but do
>>> >> not enable checkpointing.
>>> >>
>>> >> Background: "checkpointing" is a parameter that can be enabled in
>>> >> FrameworkInfo; if enabled, the agent will write the framework PID,
>>> >> executor PIDs, and status updates to disk for any tasks started by
>>> >> that framework. This checkpointed information means that these tasks
>>> >> can survive an agent crash: if the agent exits (whether due to
>>> >> crashing or as part of an upgrade procedure), a restarted agent can
>>> >> use this information to reconnect to executors started by the previous
>>> >> instance of the agent. The downside is that checkpointing requires
>>> >> some additional disk I/O at the agent.
>>> >>
>>> >> Checkpointing is not currently the default, but in my experience it is
>>> >> often enabled for production frameworks. As part of the work on
>>> >> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
>>> >> considering:
>>> >>
>>> >> (a) requiring that partition-aware frameworks must also enable
>>> >> checkpointing, and/or
>>> >> (b) enabling checkpointing by default
>>> >>
>>> >> If you have intentionally decided to disable checkpointing for your
>>> >> Mesos framework, I'd be curious to hear more about your use-case and
>>> >> why you haven't enabled it.
>>> >>
>>> >> Thanks!
>>> >>
>>> >> Neil
>>> >>
>>> >> --
>>> >> Zameer Manji
>>> >>
>>> >
>>>
>>
>>
>


-- 
Cheers,

Zhitao Li
