Dick: Excellent question! The ZooKeeper-backed registry was dropped for a
few reasons:

(1) Znodes by default have a size limit of 1MB. This means if your
cluster grows organically and the serialized set of slaves surpasses 1MB,
all subsequent storage operations will fail. You would not be able to add
slaves to your cluster past this point. Compression helps, but does not
solve it.

(2) To implement a scalable ZooKeeper backed storage layer, we need to be
able to partition our data across znodes and perform atomic writes.
  (a) Partitioning is non-trivial and we don't know of any C++ libraries
that do this already.
  (b) To my knowledge, before 3.4.x, transactional support was missing and
applications had to implement two-phase commit [1]. Complex! Even in 3.4.x
the transactional support seems to limit total transaction data to 1MB,
judging by the NOTE in [2].

(3) Alternatively, one can live with a simple, but operationally
unfortunate implementation outlined in (1). But that means we would at
least need to provide some tooling to make moving between state backends
simple. Doable, but implies more work and support.

(4) ZooKeeper is currently the largest source of disruptions to our system
availability, so becoming more reliant on it as a permanent storage backend
was a bit worrisome. At Twitter we have had a lot more operational
experience and confidence with the replicated log as a *permanent* storage
backend.
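
To make (1) and (2a) a bit more concrete, here is a rough Python sketch of
what partitioning would have to do at a minimum: pack serialized slave
records into chunks that each fit under the znode limit. Everything in it is
an assumption for illustration (the `partition` helper, the 100-byte record
size, the exact 1MB constant); it is not code from Mesos or ZooKeeper, and
it ignores the harder part, which is writing the chunks atomically.

```python
# Assumed znode size limit from (1); the real default may differ slightly.
ZNODE_LIMIT = 1024 * 1024


def partition(records, limit=ZNODE_LIMIT):
    """Greedily pack serialized records into chunks of at most `limit` bytes.

    Hypothetical helper: each returned chunk is a list of records that
    could be stored together in one znode (framing overhead is ignored).
    """
    chunks, current, size = [], [], 0
    for record in records:
        if len(record) > limit:
            raise ValueError("a single record exceeds the znode limit")
        if size + len(record) > limit:
            # Current chunk is full; start a new one.
            chunks.append(current)
            current, size = [], 0
        current.append(record)
        size += len(record)
    if current:
        chunks.append(current)
    return chunks


# Made-up workload: 50,000 slaves at 100 bytes each (about 5MB in total).
records = [b"x" * 100 for _ in range(50_000)]
chunks = partition(records)
print(len(chunks), "znodes needed")
```

Even with the data split like this, every registry update still has to
change several znodes together, which is where the transactional concerns
in (2b) come in.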

To be clear, there's nothing stopping anyone from wiring up the existing
ZooKeeper storage implementation in Mesos and providing it as an
alternative to the replicated log. As soon as we provide two we should have
tooling to allow people to move between them.

I hope this clarifies things!

[1]
http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_twoPhasedCommit

[2]
http://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi%28java.lang.Iterable%29

On Fri, Jun 13, 2014 at 11:04 AM, Jie Yu <yujie....@gmail.com> wrote:

> Largely because of a requirement to bring everything back up in a certain
> order
>
>
> I don't think they need to be brought back up in a certain order. You just
> need to restart all of them. The only requirement is that all masters
> should be running at 0.19.0.
>
> I'd also be very interested in a zookeeper implementation
>
>
> I think there is an issue with ZK impl. Ben Mahler probably can expand
> here.
>
> - Jie
>
>
> On Fri, Jun 13, 2014 at 12:32 AM, Tom Arnfeld <t...@duedil.com> wrote:
>
>> Hey Dave (and the group),
>>
>> I have to say for me it was a little fiddly to upgrade a 0.18.2
>> cluster to 0.19.0. Largely because of a requirement to bring
>> everything back up in a certain order (I had to lower the quorum count
>> to 1) otherwise mesos failed to get a majority vote to initialise the
>> log (I had 3 masters).
>>
>> I'd also be very interested in a zookeeper implementation - and
>> perhaps some improved documentation around the log.
>>
>> Cheers,
>>
>> Tom.
>>
>> > On 13 Jun 2014, at 08:17, Dick Davies <d...@hellooperator.net> wrote:
>> >
>> > I thought I read that there was going to be a registry implementation
>> > backed by zookeeper;
>> > does anyone know why that was dropped?
>> >
>> > Really excited to see the containerizer features rolling in, but the
>> > quorum looks at first glance
>> > to make Mesos a little harder to operate
>> > ("This means adding or removing masters must be done carefully! ") - I
>> > understand the
>> > benefits but was hoping we could get by with the zookeeper registry.
>> >
>> >
>> >> On 13 June 2014 03:49, Dave Lester <daveles...@gmail.com> wrote:
>> >> Hi All,
>> >>
>> >> Below is a blog post that Ben Mahler wrote as release manager for Mesos
>> >> 0.19.0; it was published on the Mesos site today.
>> >>
>> >> I know that not everyone follows @ApacheMesos Twitter (even though you
>> >> should!), so I wanted to make sure it was also shared on the user@ list.
>> >>
>> >> Cheers,
>> >> Dave
>> >>
>> >>
>> >> Apache Mesos 0.19.0 Released
>> >>
>> >> The latest Mesos release, 0.19.0 is now available for download. This new
>> >> version includes the following features and improvements:
>> >>
>> >> The master now persists the list of registered slaves in a durable
>> >> replicated manner using the Registrar and the replicated log.
>> >> Alpha support for custom container technologies has been added with the
>> >> ExternalContainerizer.
>> >> Metrics reporting has been overhauled and is now exposed on
>> >> <ip:port>/metrics/snapshot.
>> >> Slave Authentication: optionally, only authenticated slaves can register
>> >> with the master.
>> >> Numerous bug fixes and stability improvements.
>> >>
>> >> Full release notes are available on JIRA.
>> >>
>> >> Registrar
>> >>
>> >> Mesos 0.19.0 introduces the “Registrar”: the master now persists the list of
>> >> registered slaves in a durable replicated manner. The previous lack of
>> >> durable state was an intentional design decision that simplified failover
>> >> and allowed masters to be run and migrated with ease. However, the stateless
>> >> design had issues:
>> >>
>> >> In the event of a dual failure (slave fails while master is down), no lost
>> >> task notifications are sent. This leads to a task running according to the
>> >> framework but unknown to Mesos.
>> >> When a new master is elected, we may allow rogue slaves to re-register with
>> >> the master. This leads to tasks running on the slave that are not known to
>> >> the framework.
>> >>
>> >> Persisting the list of registered slaves allows failed over masters to
>> >> detect slaves that do not re-register, and notify frameworks accordingly. It
>> >> also allows us to prevent rogue slaves from re-registering; terminating the
>> >> rogue tasks in the process.
>> >>
>> >> The state is persisted using the replicated log (available since 0.9.0).
>> >>
>> >> External Containerization
>> >>
>> >> As alluded to during the containerization / isolation refactor in 0.18.0,
>> >> the ExternalContainerizer has landed in this release. This provides alpha
>> >> level support for custom containerization.
>> >>
>> >> Developers can implement their own external containerizers to provide
>> >> support for custom container technologies. Initial Docker support is now
>> >> available through some community driven external containerizers: Docker
>> >> Containerizer for Mesos by Tom Arnfeld and Deimos by Jason Dusek. Please
>> >> reach out on the mailing lists with questions!
>> >>
>> >> Metrics
>> >>
>> >> Previously, Mesos components had to use custom metrics code and custom HTTP
>> >> endpoints for exposing metrics. This made it difficult to expose additional
>> >> system metrics and often required having an endpoint for each libprocess
>> >> Process (Actor) for which metrics were desired. Having metrics spread across
>> >> endpoints was operationally complex.
>> >>
>> >> We needed a consistent, simple, and global way to expose metrics, which led
>> >> to the creation of a metrics library within libprocess. All metrics are now
>> >> exposed via /metrics/snapshot. The /stats.json endpoint remains for
>> >> backwards compatibility.
>> >>
>> >> Upgrading
>> >>
>> >> For backwards compatibility, the “Registrar” will be enabled in a phased
>> >> manner. By default, the “Registrar” is write-only in 0.19.0 and will be
>> >> read/write in 0.20.0.
>> >>
>> >> If running in high-availability mode with ZooKeeper, operators must now
>> >> specify the --work_dir for the master, along with the --quorum size of the
>> >> ensemble of masters. This means adding or removing masters must be done
>> >> carefully! The best practice is to only ever add or remove a single master
>> >> at a time and to allow a small amount of time for the replicated log to
>> >> catch up on the new master. Maintenance documentation will be added to
>> >> reflect this.
>> >>
>> >> Please refer to the upgrades document, which details how to perform an
>> >> upgrade from 0.18.x.
>> >>
>> >> Future Work
>> >>
>> >> Thanks to the Registrar, reconciliation primitives can now be provided to
>> >> ensure that the state of tasks between Mesos and frameworks is kept
>> >> consistent. This will remove the need for frameworks to implement
>> >> out-of-band task reconciliation to inspect the state of slaves.
>> >> Reconciliation work is being tracked at MESOS-1407.
>> >>
>> >> The addition of state through the Registrar opens up a rich set of possible
>> >> features that were previously not possible due to the lack of persistent
>> >> state in the master. These include:
>> >>
>> >> Cluster maintenance primitives (MESOS-1474)
>> >> Repair automation (MESOS-695)
>> >> Global resource reservations
>> >>
>> >> Getting Involved
>> >>
>> >> We encourage you to try out this release, and let us know what you think and
>> >> if you hit any issues on the user mailing list. You can also get in touch
>> >> with us via @ApacheMesos or via mailing lists and IRC.
>> >>
>> >> Thanks
>> >>
>> >> Thanks to the 32 contributors who made 0.19.0 possible:
>> >>
>> >> Ashutosh Jain, Adam B, Alexandra Sava, Anton Lindström, Archana kumari,
>> >> Benjamin Hindman, Benjamin Mahler, Bernardo Gomez Palacio, Bernd Mathiske,
>> >> Charlie Carson, Chengwei Yang, Chi Zhang, Dave Lester, Dominic Hamon, Ian
>> >> Downes, Isabel Jimenez, Jake Farrell, Jameel Al-Aziz, Jiang Yan Xu, Jie Yu,
>> >> Nikita Vetoshkin, Niklas Q. Nielsen, Ritwik Yadav, Sam Taha, Steven Phung,
>> >> Till Toenshoff, Timothy St. Clair, Tobi Knaup, Tom Arnfeld, Tom Galloway,
>> >> Vinod Kone, Vinson Lee
>>
>
>
