I would like to point out that using a new FrameworkID is not a solution to
this problem: it forces a cluster operator either to drain the entire
cluster to enable checkpointing or to lose all previous tasks. Neither
scenario is desirable.

Fortunately, it is possible to do this without changing the FrameworkID. I
have CC'd Steve from TellApart, who has enabled checkpointing without
changing the FrameworkID on a production cluster. I hope he can share his
process here.
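
In the meantime, here is a rough sketch of what the scheduler side of that
looks like with the old-style Python bindings: re-register with the
*existing* FrameworkID while setting checkpoint to true. The framework
name, ID value, and master URI below are placeholders, not real values:

    # Sketch only: re-register with the existing FrameworkID, checkpoint on.
    import mesos.interface
    from mesos.interface import mesos_pb2
    import mesos.native

    class MyScheduler(mesos.interface.Scheduler):
        def registered(self, driver, frameworkId, masterInfo):
            # On success the master hands back the FrameworkID we supplied.
            print("Registered with framework ID %s" % frameworkId.value)

    framework = mesos_pb2.FrameworkInfo()
    framework.user = ""                 # empty lets Mesos pick the current user
    framework.name = "my-framework"     # placeholder
    framework.checkpoint = True         # the setting we want to roll out
    framework.failover_timeout = 7 * 24 * 3600.0  # keep the existing timeout
    framework.id.value = "20150224-000000-00000-0000-0001"  # placeholder: old ID

    driver = mesos.native.MesosSchedulerDriver(
        MyScheduler(), framework, "zk://localhost:2181/mesos")  # placeholder
    driver.run()

Note this is only the re-registration half: as Thomas observed below, slaves
keep serving the FrameworkInfo that running executors were launched with, so
the change won't show up there immediately.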

On Tue, Feb 24, 2015 at 3:51 PM, Tim Chen <t...@mesosphere.io> wrote:

> Mesos checkpoints the FrameworkInfo to disk and recovers it on relaunch.
>
> I don't think we expose any API to remove the framework manually, though,
> if you really want to keep the FrameworkID. Once you hit the failover
> timeout, the framework will be removed from the master and slaves.
>
> I think for now the best way is just to use a new FrameworkID when you
> want to change the FrameworkInfo.
>
> Tim
>
>
>
> On Tue, Feb 24, 2015 at 3:32 PM, Thomas Petr <tp...@hubspot.com> wrote:
>
>> Hey folks,
>>
>> Is there a best practice for rolling out FrameworkInfo changes? We need
>> to set checkpoint to true, so I redeployed our framework with the new
>> settings (with tasks still running), but when I hit a slave's stats.json
>> endpoint, it appears that the old FrameworkInfo data is still there (which
>> makes sense since there are active executors running). I then tried
>> draining the tasks and completely restarting a Mesos slave, but still no
>> luck.
>>
>> Is there anything additional or special I need to do here? Is some part
>> of Mesos caching FrameworkInfo based on the FrameworkID?
>>
>> Another wrinkle with our setup is that we have a rather large
>> failover_timeout set for the framework -- maybe that's affecting things
>> too?
>>
>> Thanks,
>> Tom
>>
>
>
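
As an aside: the slave's state.json endpoint (default port 5051) shows the
FrameworkInfo the slave is actually holding for each framework, which may be
a more direct way to verify the rollout than stats.json. A quick sketch,
assuming the field names from that endpoint's output:

    # Sketch: dump each framework's checkpoint flag as the slave sees it.
    import json, urllib2
    state = json.load(urllib2.urlopen("http://localhost:5051/state.json"))
    for fw in state.get("frameworks", []):
        print("%s checkpoint=%s" % (fw.get("id"), fw.get("checkpoint")))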


-- 
Zameer Manji
