Hi Abishek,

I would strongly advise not to jump 6 versions at once. It's hard to say whether there were any changes that will prevent 0.25 masters from talking to 0.19 slaves (my intuition says there were some breaking changes to protobufs). We do *not* support upgrading by skipping versions, so please upgrade to 0.20, wait for stabilization, and repeat the procedure 5 more times.
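To make the hop-by-hop path concrete, here is a minimal Python sketch that lists the releases to install in order. The function name `upgrade_path` is made up for illustration, and it assumes every minor release between the two versions exists and that both versions are given as "major.minor" strings.

```python
from typing import List


def upgrade_path(current: str, target: str) -> List[str]:
    """List each release to install, in order, when upgrading one minor
    version at a time from `current` (exclusive) to `target` (inclusive).

    Hypothetical helper: assumes "major.minor" strings and that every
    intermediate minor release exists.
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles hops within one major series")
    if tgt_minor < cur_minor:
        raise ValueError("target version is older than the current version")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    for version in upgrade_path("0.19", "0.25"):
        print(f"upgrade to {version}, then wait for the cluster to stabilize")
```

For 0.19 to 0.25 this gives six hops: 0.20 first, then five more.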
In the future we may move to another deprecation cycle, but currently we have a 2-version one.

Mind reporting your experience to the list once you're done? Thanks!

On Thu, Dec 3, 2015 at 10:28 PM, Abishek Ravi <abishekr...@gmail.com> wrote:

> Would the following process enable zero downtime upgrade of Mesos (0.19 to
> 0.25) in an existing Mesos cluster?
>
> 0. From [1] it doesn't seem like there are any incompatible changes
> introduced between 0.19 and 0.25.
> 1. Deploy Mesos(0.25) binaries to unelected master nodes
> 2. Deploy Mesos(0.25) binaries to leading master. This should potentially
> result in master re-election and elect a master which already has
> Mesos(0.25) installed from Step (1).
> 3. Deploy Mesos(0.25) binaries to mesos slave nodes. Existing tasks should
> continue to execute and report to the master after mesos process launch
> (with 0.25 binaries) on the slave node.
>
> Known gotchas:
> 1. Any monitoring built around state.json and stats.json should be updated
> accordingly as endpoints have changed [1].
> 2. Checkpointing should be enabled (It is not automatically enabled in
> 0.19) [2].
> 3. recovery_timeout for slave nodes should be set to an appropriate value
> depending on how long it takes to install Mesos(0.25) on the slave nodes.
>
> Is any step missing in the upgrade process? Are there other gotchas that
> one needs to be aware of?
>
> [1] http://mesos.apache.org/documentation/latest/upgrades/
> [2] http://mesos.apache.org/documentation/latest/slave-recovery/
>
> Thanks,
> Abishek
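For what it's worth, the node ordering in the quoted steps 1-3 (non-leading masters, then the leading master, then slaves) can be written down as a small Python sketch. The host names and the `rollout_order` function are hypothetical; the code only computes an order and does not deploy anything. It would be applied once per single-version hop.

```python
from typing import List


def rollout_order(masters: List[str], leader: str, slaves: List[str]) -> List[str]:
    """Return hosts in the order they would receive new binaries:
    non-leading masters first, the leading master next, slaves last.

    Hypothetical helper for illustration; host names are placeholders.
    """
    if leader not in masters:
        raise ValueError("leader must be one of the masters")
    standby_masters = [host for host in masters if host != leader]
    return standby_masters + [leader] + list(slaves)


if __name__ == "__main__":
    order = rollout_order(["m1", "m2", "m3"], leader="m2", slaves=["s1", "s2"])
    for host in order:
        print(host)
```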