Re: Apache Mesos 0.19.0 Released

Tom Arnfeld Fri, 13 Jun 2014 12:50:41 -0700

No worries at all. I think once there’s a more solid base for Mesos 
documentation in general, it’ll be easier for committers to add new docs for 
new features. Fair enough about the launch ordering – I was probably just a 
little surprised to see a bunch of warnings about an uninitialised log and 
didn’t think about booting them all up (some upgrade notes would have been 
useful here).


Regarding ZooKeeper, those are some interesting points. Personally, it doesn’t 
bother me that Mesos has its own mechanism for this (and I think Aurora uses 
the Mesos log too?). I think the documentation could go a long way in exposing 
that the log exists, and why it’s used for the registry. Am I correct in saying 
frameworks are able to interact with the log to store state too?

Tom.

On 13 Jun 2014, at 20:23, Benjamin Mahler <[email protected]> wrote:

> Tom: Agreed that there needs to be replicated log documentation, I've chatted 
> with Jie and we'll be working to create some. We'll also work to create some 
> maintenance related documentation for the masters as it pertains to the log 
> replicas.
> 
> As Jie mentioned, there is no requirement to bring the masters back up in a 
> certain order. There is a safety mechanism built into the replicated log 
> that ensures that if the majority of your replica state is lost, writes are 
> prevented. This is why, when you first upgrade to the replicated log, all of 
> the masters in your ensemble need to be up and running 0.19.0 for the 
> replicas to initialize.
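A simplified model of the safety rule described above may make the upgrade behaviour easier to reason about. The sketch below is hypothetical Python, not the Mesos source: it assumes that a brand-new log only auto-initializes when no replica already holds state and at least `2 * quorum - 1` replicas report that they are empty. For 3 masters with a quorum of 2 that means every master must be up, and it would also explain why lowering the quorum to 1 let a single master initialize on its own. The names `ReplicaStatus` and `can_auto_initialize` are invented for illustration.

```python
from enum import Enum


class ReplicaStatus(Enum):
    """Hypothetical replica states used only for this illustration."""
    EMPTY = "empty"              # fresh replica with no log state yet
    INITIALIZED = "initialized"  # replica that already holds log state
    UNREACHABLE = "unreachable"  # master not running (e.g. not yet upgraded)


def can_auto_initialize(statuses, quorum):
    """Assumed rule: initialize a brand-new log only if no replica already has
    state and at least 2 * quorum - 1 replicas report EMPTY."""
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    if any(s is ReplicaStatus.INITIALIZED for s in statuses):
        return False  # existing state must be recovered, not overwritten
    empty = sum(1 for s in statuses if s is ReplicaStatus.EMPTY)
    return empty >= 2 * quorum - 1


E, D = ReplicaStatus.EMPTY, ReplicaStatus.UNREACHABLE
print(can_auto_initialize([E, E, D], quorum=2))  # one master still down
print(can_auto_initialize([E, E, E], quorum=2))  # all three masters up
print(can_auto_initialize([E, D, D], quorum=1))  # quorum lowered to 1
```

Under this model, the first call returns False and the other two return True, matching the experience reported in this thread.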
> 
> I apologize for all of the tribal knowledge here, we will get some 
> documentation out there.
> 
> 
> On Fri, Jun 13, 2014 at 12:15 PM, Benjamin Mahler <[email protected]> 
> wrote:
> Dick: Excellent question. The ZooKeeper-backed registry was dropped for a few 
> reasons:
> 
> (1) Znodes have a default size limit of 1MB. This means that if your cluster 
> grows organically and the set of slaves surpasses 1MB, all subsequent storage 
> operations will fail. You would not be able to add slaves to your cluster 
> past this point. Compression helps, but does not solve it.
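To see why a single-znode registry eventually stops working, the sketch below serializes a growing list of slaves and compares it with the znode limit. Everything here is an assumption for illustration: the limit is taken as 1024 * 1024 bytes, and the JSON records are made-up stand-ins (the real Registrar stores protobufs). Because the synthetic records are highly repetitive they compress far better than real data would; either way, compression only raises the ceiling, since the blob still grows linearly with the number of slaves.

```python
import json
import zlib

# Assumption: treat the "1MB" default znode limit as 1024 * 1024 bytes.
ZNODE_LIMIT_BYTES = 1024 * 1024


def registry_blob(num_slaves):
    """Serialize a hypothetical registry. Real Registrar entries are protobufs;
    these JSON records are made-up stand-ins of roughly similar shape."""
    slaves = [
        {
            "id": f"20140613-000000-{i:06d}",
            "hostname": f"slave-{i}.example.com",
            "resources": "cpus:16;mem:65536;disk:1000000;ports:[31000-32000]",
        }
        for i in range(num_slaves)
    ]
    return json.dumps({"slaves": slaves}).encode("utf-8")


def fits_in_znode(num_slaves, compress=False):
    """Return True if the (optionally zlib-compressed) blob fits in one znode."""
    blob = registry_blob(num_slaves)
    if compress:
        blob = zlib.compress(blob)
    return len(blob) <= ZNODE_LIMIT_BYTES


print(fits_in_znode(100))      # small cluster
print(fits_in_znode(100_000))  # large cluster, uncompressed
```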
> 
> (2) To implement a scalable ZooKeeper-backed storage layer, we need to be 
> able to partition our data across znodes and perform atomic writes.
>   (a) Partitioning is non-trivial and we don't know of any C++ libraries that 
> do this already.
>   (b) To my knowledge, before 3.4.x, transactional support was missing and 
> applications had to implement two-phase commit [1]. Complex! Even in 3.4.x, 
> the transactional support seems to limit total transaction data to 1MB, per 
> the NOTE in [2].
> 
> (3) Alternatively, one can live with the simple but operationally unfortunate 
> implementation outlined in (1). But that means we would at least need to 
> provide some tooling to make moving between state backends simple. Doable, 
> but it implies more work and support.
> 
> (4) ZooKeeper is currently the largest source of disruptions to our system 
> availability, so becoming more reliant on it as a permanent storage backend 
> was a bit worrisome. At Twitter we have had a lot more operational experience 
> with, and confidence in, the replicated log as a permanent storage backend.
> 
> To be clear, there's nothing stopping anyone from wiring up the existing 
> ZooKeeper storage implementation in Mesos and providing it as an alternative 
> to the replicated log. As soon as we provide two, we should have tooling to 
> allow people to move between them.
> 
> I hope this clarifies things!
> 
> [1] 
> http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_twoPhasedCommit
> 
> [2] 
> http://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi%28java.lang.Iterable%29
> 
> 
> On Fri, Jun 13, 2014 at 11:04 AM, Jie Yu <[email protected]> wrote:
> > Largely because of a requirement to bring everything back up in a certain 
> > order
> 
> I don't think they need to be brought back up in a certain order. You just 
> need to restart all of them. The only requirement is that all masters should 
> be running 0.19.0.
> 
> > I'd also be very interested in a zookeeper implementation
> 
> I think there is an issue with a ZK implementation. Ben Mahler can probably 
> expand here.
> 
> - Jie
> 
> 
> On Fri, Jun 13, 2014 at 12:32 AM, Tom Arnfeld <[email protected]> wrote:
> Hey Dave (and the group),
> 
> I have to say for me it was a little fiddly to upgrade a 0.18.2
> cluster to 0.19.0. Largely because of a requirement to bring
> everything back up in a certain order (I had to lower the quorum count
> to 1); otherwise, Mesos failed to get a majority vote to initialise the
> log (I had 3 masters).
> 
> I'd also be very interested in a ZooKeeper implementation, and
> perhaps some improved documentation around the log.
> 
> Cheers,
> 
> Tom.
> 
> > On 13 Jun 2014, at 08:17, Dick Davies <[email protected]> wrote:
> >
> > I thought I read that there was going to be a registry implementation
> > backed by ZooKeeper; does anyone know why that was dropped?
> >
> > Really excited to see the containerizer features rolling in, but at first
> > glance the quorum looks like it makes Mesos a little harder to operate
> > ("This means adding or removing masters must be done carefully!"). I
> > understand the benefits but was hoping we could get by with the ZooKeeper
> > registry.
> >
> >
> >> On 13 June 2014 03:49, Dave Lester <[email protected]> wrote:
> >> Hi All,
> >>
> >> Below is a blog post that Ben Mahler wrote as release manager for Mesos
> >> 0.19.0; it was published on the Mesos site today.
> >>
> >> I know that not everyone follows @ApacheMesos on Twitter (even though you
> >> should!), so I wanted to make sure it was also shared on the user@ list.
> >>
> >> Cheers,
> >> Dave
> >>
> >>
> >> Apache Mesos 0.19.0 Released
> >>
> >> The latest Mesos release, 0.19.0, is now available for download. This new
> >> version includes the following features and improvements:
> >>
> >> - The master now persists the list of registered slaves in a durable,
> >>   replicated manner using the Registrar and the replicated log.
> >> - Alpha support for custom container technologies has been added with the
> >>   ExternalContainerizer.
> >> - Metrics reporting has been overhauled and is now exposed on
> >>   <ip:port>/metrics/snapshot.
> >> - Slave Authentication: optionally, only authenticated slaves can register
> >>   with the master.
> >> - Numerous bug fixes and stability improvements.
> >>
> >> Full release notes are available on JIRA.
> >>
> >> Registrar
> >>
> >> Mesos 0.19.0 introduces the “Registrar”: the master now persists the list
> >> of registered slaves in a durable, replicated manner. The previous lack of
> >> durable state was an intentional design decision that simplified failover
> >> and allowed masters to be run and migrated with ease. However, the
> >> stateless design had issues:
> >>
> >> - In the event of a dual failure (a slave fails while the master is down),
> >>   no lost task notifications are sent. This leads to a task running
> >>   according to the framework but unknown to Mesos.
> >> - When a new master is elected, we may allow rogue slaves to re-register
> >>   with the master. This leads to tasks running on the slave that are not
> >>   known to the framework.
> >>
> >> Persisting the list of registered slaves allows failed-over masters to
> >> detect slaves that do not re-register, and notify frameworks accordingly.
> >> It also allows us to prevent rogue slaves from re-registering, terminating
> >> the rogue tasks in the process.
> >>
> >> The state is persisted using the replicated log (available since 0.9.0).
> >>
> >> External Containerization
> >>
> >> As alluded to during the containerization / isolation refactor in 0.18.0,
> >> the ExternalContainerizer has landed in this release. This provides alpha
> >> level support for custom containerization.
> >>
> >> Developers can implement their own external containerizers to provide
> >> support for custom container technologies. Initial Docker support is now
> >> available through some community-driven external containerizers: Docker
> >> Containerizer for Mesos by Tom Arnfeld and Deimos by Jason Dusek. Please
> >> reach out on the mailing lists with questions!
> >>
> >> Metrics
> >>
> >> Previously, Mesos components had to use custom metrics code and custom HTTP
> >> endpoints for exposing metrics. This made it difficult to expose additional
> >> system metrics and often required having an endpoint for each libprocess
> >> Process (Actor) for which metrics were desired. Having metrics spread
> >> across endpoints was operationally complex.
> >>
> >> We needed a consistent, simple, and global way to expose metrics, which led
> >> to the creation of a metrics library within libprocess. All metrics are now
> >> exposed via /metrics/snapshot. The /stats.json endpoint remains for
> >> backwards compatibility.
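As a rough illustration of consuming the single endpoint, here is a minimal Python client sketch using only the standard library. It assumes the snapshot is a flat JSON object mapping metric names to numbers; the metric names in the offline example are hypothetical placeholders, and the sketch does no authentication or TLS.

```python
import json
import urllib.request


def snapshot_url(host, port):
    """Build the URL of the metrics snapshot endpoint on a Mesos process."""
    return f"http://{host}:{port}/metrics/snapshot"


def parse_snapshot(body):
    """Assumption: the snapshot is a flat JSON object of metric name -> number."""
    return {name: float(value) for name, value in json.loads(body).items()}


def fetch_snapshot(host, port, timeout=5.0):
    """Fetch and parse the snapshot over plain HTTP (no auth, no TLS)."""
    with urllib.request.urlopen(snapshot_url(host, port), timeout=timeout) as resp:
        return parse_snapshot(resp.read())


# Offline example with a made-up payload (metric names are hypothetical):
sample = b'{"master/uptime_secs": 12.5, "master/slaves_active": 3}'
print(parse_snapshot(sample))
```

Against a live master you would call something like `fetch_snapshot("master.example.com", 5050)`, where the hostname is a placeholder and 5050 is the default master port.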
> >>
> >> Upgrading
> >>
> >> For backwards compatibility, the “Registrar” will be enabled in a phased
> >> manner. By default, the “Registrar” is write-only in 0.19.0 and will be
> >> read/write in 0.20.0.
> >>
> >> If running in high-availability mode with ZooKeeper, operators must now
> >> specify the --work_dir for the master, along with the --quorum size of the
> >> ensemble of masters. This means adding or removing masters must be done
> >> carefully! The best practice is to only ever add or remove a single master
> >> at a time and to allow a small amount of time for the replicated log to
> >> catch up on the new master. Maintenance documentation will be added to
> >> reflect this.
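The `--quorum` value is conventionally a strict majority of the masters (for example, 2 for an ensemble of 3). The Python sketch below shows that arithmetic and lists the intermediate ensemble sizes when growing or shrinking one master at a time, as recommended above. `resize_plan` is a hypothetical helper: it only enumerates sizes and matching quorums, and it does not encode the exact order of restarts and flag changes, which the promised maintenance documentation would need to cover.

```python
def recommended_quorum(num_masters):
    """Strict majority of the masters: 1 -> 1, 3 -> 2, 5 -> 3."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def failures_tolerated(num_masters):
    """Masters that can be down while a majority is still available."""
    return num_masters - recommended_quorum(num_masters)


def resize_plan(current, target):
    """Hypothetical helper: list (masters, quorum) pairs when changing the
    ensemble one master at a time."""
    step = 1 if target > current else -1
    return [(n, recommended_quorum(n))
            for n in range(current + step, target + step, step)]


print(resize_plan(3, 5))  # grow 3 -> 4 -> 5
print(resize_plan(5, 3))  # shrink 5 -> 4 -> 3
```

Note that a 4-master ensemble tolerates no more failures than a 3-master one, which is one reason odd ensemble sizes are usually preferred.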
> >>
> >> Please refer to the upgrades document, which details how to perform an
> >> upgrade from 0.18.x.
> >>
> >> Future Work
> >>
> >> Thanks to the Registrar, reconciliation primitives can now be provided to
> >> ensure that the state of tasks between Mesos and frameworks is kept
> >> consistent. This will remove the need for frameworks to implement
> >> out-of-band task reconciliation to inspect the state of slaves.
> >> Reconciliation work is being tracked at MESOS-1407.
> >>
> >> The addition of state through the Registrar opens up a rich set of
> >> features that were previously impossible due to the lack of persistent
> >> state in the master. These include:
> >>
> >> - Cluster maintenance primitives (MESOS-1474)
> >> - Repair automation (MESOS-695)
> >> - Global resource reservations
> >>
> >> Getting Involved
> >>
> >> We encourage you to try out this release and let us know on the user
> >> mailing list what you think and whether you hit any issues. You can also
> >> get in touch with us via @ApacheMesos or via the mailing lists and IRC.
> >>
> >> Thanks
> >>
> >> Thanks to the 32 contributors who made 0.19.0 possible:
> >>
> >> Ashutosh Jain, Adam B, Alexandra Sava, Anton Lindström, Archana kumari,
> >> Benjamin Hindman, Benjamin Mahler, Bernardo Gomez Palacio, Bernd Mathiske,
> >> Charlie Carson, Chengwei Yang, Chi Zhang, Dave Lester, Dominic Hamon, Ian
> >> Downes, Isabel Jimenez, Jake Farrell, Jameel Al-Aziz, Jiang Yan Xu, Jie Yu,
> >> Nikita Vetoshkin, Niklas Q. Nielsen, Ritwik Yadav, Sam Taha, Steven Phung,
> >> Till Toenshoff, Timothy St. Clair, Tobi Knaup, Tom Arnfeld, Tom Galloway,
> >> Vinod Kone, Vinson Lee
> 
> 
>
