Re: Verifying Zero Downtime Upgrade Process For Existing Mesos Cluster

Alex Rukletsov Mon, 07 Dec 2015 11:40:16 -0800

Hi Abishek,

I would strongly advise against skipping 6 versions. It's hard to say whether
any changes between those releases would prevent 0.25 masters from talking to
0.19 slaves (my intuition says there were some breaking changes to protobufs).
We do *not* support upgrades that skip versions, so please upgrade to 0.20,
wait for stabilization, and repeat the procedure 5 more times.
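
To make the sequence explicit, here is a minimal sketch in Python that lists
the releases to step through. The helper name upgrade_path is hypothetical
(it is not part of Mesos), and it assumes plain "major.minor" version strings
within a single major series:

```python
def upgrade_path(current, target):
    """Return each minor release to install, in order, one version at a time.

    Hypothetical helper for illustration only; not part of Mesos.
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles one major series")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than the current version")
    # Every intermediate minor release is visited; none is skipped.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


for release in upgrade_path("0.19", "0.25"):
    print(f"upgrade the cluster to {release}, then wait for stabilization")
```

For 0.19 to 0.25 this yields six upgrades in total: 0.20 first, then the
five that follow it.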


In the future we may move to another deprecation cycle, but currently we
have a 2-version one.

Mind reporting your experience to the list once you're done? Thanks!

On Thu, Dec 3, 2015 at 10:28 PM, Abishek Ravi <abishekr...@gmail.com> wrote:

> Would the following process enable zero downtime upgrade of Mesos (0.19 to
> 0.25) in an existing Mesos cluster?
>
> 0. From [1] it doesn't seem like there are any incompatible changes
> introduced between 0.19 and 0.25.
> 1. Deploy Mesos(0.25) binaries to unelected master nodes
> 2. Deploy Mesos(0.25) binaries to leading master. This should potentially
> result in master re-election and elect a master which already has
> Mesos(0.25) installed from Step (1).
> 3. Deploy Mesos(0.25) binaries to mesos slave nodes. Existing tasks should
> continue to execute and report to the master after mesos process launch
> (with 0.25 binaries) on the slave node.
>
> Known gotchas:
> 1. Any monitoring built around state.json and stats.json should be updated
> accordingly as endpoints have changed [1].
> 2. Checkpointing should be enabled (it is not automatically enabled in
> 0.19) [2].
> 3. recovery_timeout for slave nodes should be set to an appropriate value
> depending on how long it takes to install Mesos(0.25) on the slave nodes.
>
> Is any step missing in the upgrade process? Are there other gotchas that
> one needs to be aware of?
>
> [1] http://mesos.apache.org/documentation/latest/upgrades/
> [2] http://mesos.apache.org/documentation/latest/slave-recovery/
>
> Thanks,
> Abishek
>
