Distributed systems are hard, but more importantly, they all differ in various ways.

> I feel the zookeeper is almost unstable for a cluster.

This is too general and vague a statement to be either true or false (or to provide any guidance): it all depends on how you deploy your ensemble, what hardware it runs on, what virtualization layer you use, and how you manage failover and recovery.

But, far more importantly, it all depends on *your* requirements: a configuration that works perfectly well for a few hundred nodes, distributed across 2-3 DCs in a geographically "contained" region (e.g., North America), would be woefully inadequate for a system running across 6 global DCs, covering several thousand nodes, with tight latency requirements.

Outside of Google (where we would use our "own stuff": Borg, Chubby & friends) I've never really had any trouble with ZK. Then again, maybe the stuff I worked on was nowhere near as complex as what you're trying to achieve.

My suggestion would be to try it out in a staging environment, run some performance and stress tests, and find out whether the performance, stability and availability of the ZK ensemble (and, consequently, of the Mesos cluster) meet your requirements.

Hope this helps.

*Marco Massenzio*
*Distributed Systems Engineer*

On Sun, Aug 2, 2015 at 10:15 AM, tommy xiao <[email protected]> wrote:

> today i reading ZooKeeper Resilience at Pinterest (
> https://engineering.pinterest.com/blog/zookeeper-resilience-pinterest?route=/post/%3Aid/%3Asummary),
> I feel the zookeeper is almost unstable for a cluster.
>
> Does anyone have some experience with the zookeeper usage?
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>

