> Largely because of a requirement to bring everything back up in a
> certain order
I don't think they need to be brought back up in a certain order. You
just need to restart all of them. The only requirement is that all
masters should be running 0.19.0.

> I'd also be very interested in a zookeeper implementation

I think there is an issue with the ZK impl. Ben Mahler can probably
expand here.

- Jie

On Fri, Jun 13, 2014 at 12:32 AM, Tom Arnfeld <t...@duedil.com> wrote:

> Hey Dave (and the group),
>
> I have to say for me it was a little fiddly to upgrade a 0.18.2
> cluster to 0.19.0. Largely because of a requirement to bring
> everything back up in a certain order (I had to lower the quorum count
> to 1) otherwise mesos failed to get a majority vote to initialise the
> log (I had 3 masters).
>
> I'd also be very interested in a zookeeper implementation - and
> perhaps some improved documentation around the log.
>
> Cheers,
>
> Tom.
>
>
> On 13 Jun 2014, at 08:17, Dick Davies <d...@hellooperator.net> wrote:
> >
> > I thought I read that there was going to be a registry
> > implementation backed by zookeeper; does anyone know why that was
> > dropped?
> >
> > Really excited to see the containerizer features rolling in, but the
> > quorum looks at first glance to make Mesos a little harder to operate
> > ("This means adding or removing masters must be done carefully! ") -
> > I understand the benefits but was hoping we could get by with the
> > zookeeper registry.
> >
> >
> >> On 13 June 2014 03:49, Dave Lester <daveles...@gmail.com> wrote:
> >> Hi All,
> >>
> >> Below is a blog post that Ben Mahler wrote as release manager for
> >> Mesos 0.19.0; it was published on the Mesos site today.
> >>
> >> I know that not everyone follows @ApacheMesos Twitter (even though
> >> you should!), so I wanted to make sure was also shared on the user@
> >> list.
> >>
> >> Cheers,
> >> Dave
> >>
> >>
> >> Apache Mesos 0.19.0 Released
> >>
> >> The latest Mesos release, 0.19.0 is now available for download.
> >> This new version includes the following features and improvements:
> >>
> >> - The master now persists the list of registered slaves in a durable
> >>   replicated manner using the Registrar and the replicated log.
> >> - Alpha support for custom container technologies has been added
> >>   with the ExternalContainerizer.
> >> - Metrics reporting has been overhauled and is now exposed on
> >>   <ip:port>/metrics/snapshot.
> >> - Slave Authentication: optionally, only authenticated slaves can
> >>   register with the master.
> >> - Numerous bug fixes and stability improvements.
> >>
> >> Full release notes are available on JIRA.
> >>
> >> Registrar
> >>
> >> Mesos 0.19.0 introduces the “Registrar”: the master now persists the
> >> list of registered slaves in a durable replicated manner. The
> >> previous lack of durable state was an intentional design decision
> >> that simplified failover and allowed masters to be run and migrated
> >> with ease. However, the stateless design had issues:
> >>
> >> - In the event of a dual failure (slave fails while master is down),
> >>   no lost task notifications are sent. This leads to a task running
> >>   according to the framework but unknown to Mesos.
> >> - When a new master is elected, we may allow rogue slaves to
> >>   re-register with the master. This leads to tasks running on the
> >>   slave that are not known to the framework.
> >>
> >> Persisting the list of registered slaves allows failed over masters
> >> to detect slaves that do not re-register, and notify frameworks
> >> accordingly. It also allows us to prevent rogue slaves from
> >> re-registering; terminating the rogue tasks in the process.
> >>
> >> The state is persisted using the replicated log (available since
> >> 0.9.0).
> >>
> >> External Containerization
> >>
> >> As alluded to during the containerization / isolation refactor in
> >> 0.18.0, the ExternalContainerizer has landed in this release.
> >> This provides alpha level support for custom containerization.
> >>
> >> Developers can implement their own external containerizers to
> >> provide support for custom container technologies. Initial Docker
> >> support is now available through some community driven external
> >> containerizers: Docker Containerizer for Mesos by Tom Arnfeld and
> >> Deimos by Jason Dusek. Please reach out on the mailing lists with
> >> questions!
> >>
> >> Metrics
> >>
> >> Previously, Mesos components had to use custom metrics code and
> >> custom HTTP endpoints for exposing metrics. This made it difficult
> >> to expose additional system metrics and often required having an
> >> endpoint for each libprocess Process (Actor) for which metrics were
> >> desired. Having metrics spread across endpoints was operationally
> >> complex.
> >>
> >> We needed a consistent, simple, and global way to expose metrics,
> >> which led to the creation of a metrics library within libprocess.
> >> All metrics are now exposed via /metrics/snapshot. The /stats.json
> >> endpoint remains for backwards compatibility.
> >>
> >> Upgrading
> >>
> >> For backwards compatibility, the “Registrar” will be enabled in a
> >> phased manner. By default, the “Registrar” is write-only in 0.19.0
> >> and will be read/write in 0.20.0.
> >>
> >> If running in high-availability mode with ZooKeeper, operators must
> >> now specify the --work_dir for the master, along with the --quorum
> >> size of the ensemble of masters. This means adding or removing
> >> masters must be done carefully! The best practice is to only ever
> >> add or remove a single master at a time and to allow a small amount
> >> of time for the replicated log to catch up on the new master.
> >> Maintenance documentation will be added to reflect this.
> >>
> >> Please refer to the upgrades document, which details how to perform
> >> an upgrade from 0.18.x.
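
A side note on the --quorum flag mentioned in the Upgrading notes above: as far as I understand it, the value should be a strict majority of the masters, so a 3-master cluster would run with --quorum=2. A minimal Python sketch of that arithmetic follows. The function name quorum_size is made up for illustration and is not part of Mesos.

```python
def quorum_size(num_masters: int) -> int:
    """Smallest strict majority of num_masters.

    Hypothetical helper for illustration only; not a Mesos API.
    """
    if num_masters < 1:
        raise ValueError("need at least one master")
    # A strict majority is more than half of the masters.
    return num_masters // 2 + 1


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} masters -> --quorum={quorum_size(n)}")
```

For 3 masters this gives 2, which would fit with Tom's report that the log could not initialise until a majority of the masters was available.
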
> >>
> >> Future Work
> >>
> >> Thanks to the Registrar, reconciliation primitives can now be
> >> provided to ensure that the state of tasks between Mesos and
> >> frameworks is kept consistent. This will remove the need for
> >> frameworks to implement out-of-band task reconciliation to inspect
> >> the state of slaves. Reconciliation work is being tracked at
> >> MESOS-1407.
> >>
> >> The addition of state through the Registrar opens up a rich set of
> >> possible features that were previously not possible due to the lack
> >> of persistent state in the master. These include:
> >>
> >> - Cluster maintenance primitives (MESOS-1474)
> >> - Repair automation (MESOS-695)
> >> - Global resource reservations
> >>
> >> Getting Involved
> >>
> >> We encourage you to try out this release, and let us know what you
> >> think and if you hit any issues on the user mailing list. You can
> >> also get in touch with us via @ApacheMesos or via mailing lists and
> >> IRC.
> >>
> >> Thanks
> >>
> >> Thanks to the 32 contributors who made 0.19.0 possible:
> >>
> >> Ashutosh Jain, Adam B, Alexandra Sava, Anton Lindström, Archana
> >> kumari, Benjamin Hindman, Benjamin Mahler, Bernardo Gomez Palacio,
> >> Bernd Mathiske, Charlie Carson, Chengwei Yang, Chi Zhang, Dave
> >> Lester, Dominic Hamon, Ian Downes, Isabel Jimenez, Jake Farrell,
> >> Jameel Al-Aziz, Jiang Yan Xu, Jie Yu, Nikita Vetoshkin, Niklas Q.
> >> Nielsen, Ritwik Yadav, Sam Taha, Steven Phung, Till Toenshoff,
> >> Timothy St. Clair, Tobi Knaup, Tom Arnfeld, Tom Galloway, Vinod
> >> Kone, Vinson Lee