Todd has put it much more eloquently. Comments below:
On 7/20/09 11:50 AM, "Todd Greenwood" <to...@audiencescience.com> wrote:
Flavio, Ted, Henry, Scott, this would work perfectly well for my use case:
GROUP A : ZK Servers w/ read/write AND Leader Elections
GROUP B : ZK Servers w/ read/write W/O Leader Elections
So, we can craft this via Observers and Hierarchical Quorum groups?
Great. Problem solved.
When will this be production ready? :o)
Scott brought up a multi-part feature request that is very interesting to me:
1. Offline ZK servers that sync & merge on reconnect
The offline-server idea seems conceptually simple; it's kind of like a
messaging system. However, the merge-and-resolve step when the two sides
reconnect might be challenging. Cool idea though.
Yes, this is very useful for WAN use cases. I've already done something
like it with a hack:
Ensemble A "Master/Central"
"Remote Proxy" N: embeds its own ZK server and runs two clients. One
client connects to Ensemble A, watches a partial sub-graph, and propagates
its changes into the local embedded ZK server. This subgraph is read-only
for clients that access the Proxy. A second client connects to the local
ZK, monitors a different subgraph, and propagates it to the Master. That
subgraph is writable by clients accessing the Proxy, and on the Master it
is only written to by this Proxy.
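To make the two one-way flows above concrete, here is a minimal in-memory sketch of the Proxy pattern. Everything in it is hypothetical: plain dicts (znode path to data) stand in for the ensembles, and `Store`, `mirror`, and `RemoteProxy` are illustrative names, not ZooKeeper APIs. A real proxy would be driven by ZK watches rather than an explicit `sync()` call, and the read-only rules are enforced by the application, as described above.

```python
# Hypothetical in-memory model of the "Remote Proxy" hack. Plain dicts
# (znode path -> data) stand in for ZooKeeper; no real ZK API is used.


def under(path, prefix):
    """True if path is prefix itself or one of its descendants."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


class Store:
    """Stand-in for one ensemble: a flat map of znode path -> data."""

    def __init__(self, name):
        self.name = name
        self.nodes = {}
        self.read_only_prefixes = []  # subtrees ordinary clients may not write

    def client_write(self, path, data):
        # Application-enforced rule: mirrored subtrees reject client writes.
        if any(under(path, p) for p in self.read_only_prefixes):
            raise PermissionError(f"{path} is read-only on {self.name}")
        self.nodes[path] = data


def mirror(src, dst, prefix):
    """One-way sync of the subtree at prefix from src into dst (copies and deletes)."""
    wanted = {p: d for p, d in src.nodes.items() if under(p, prefix)}
    stale = [p for p in dst.nodes if under(p, prefix) and p not in wanted]
    for p in stale:
        del dst.nodes[p]
    dst.nodes.update(wanted)


class RemoteProxy:
    """Two one-way mirrors: master->local for one subtree, local->master for another."""

    def __init__(self, master, local, down_prefix, up_prefix):
        self.master, self.local = master, local
        self.down_prefix, self.up_prefix = down_prefix, up_prefix
        local.read_only_prefixes.append(down_prefix)  # read-only for Proxy clients
        master.read_only_prefixes.append(up_prefix)   # only this Proxy writes it

    def sync(self):
        # A real proxy would react to watch events; here the caller triggers a pass.
        mirror(self.master, self.local, self.down_prefix)
        mirror(self.local, self.master, self.up_prefix)


if __name__ == "__main__":
    master, local = Store("ensemble-A"), Store("proxy-1")
    proxy = RemoteProxy(master, local, "/shared", "/proxies/proxy-1")
    master.client_write("/shared/flag", b"on")
    local.client_write("/proxies/proxy-1/status", b"up")
    proxy.sync()
    print(local.nodes["/shared/flag"], master.nodes["/proxies/proxy-1/status"])
```

The design choice worth noting is that each subtree has exactly one writer side, so neither mirror ever has to resolve conflicting updates.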
The above is all application-enforced. There are constraints on what
can be built with this, but for the subset of WAN use cases I have,
it's more than enough.
2. Partial memory graph subscriptions
The second idea is partial memory graph subscriptions. This would allow
virtual ensembles to interact on the same physical ensemble. In my
case, it would prevent unnecessary cross talk between nodes on a WAN by
allowing me to define which subsets of the memory graph need to be
replicated, and to whom. This would be a huge scalability win.
Yes, a more general partial graph subscription/ownership model would
allow not just better WAN scalability but also (and more importantly)
higher reliability. Often, a large subset of application functionality
is local to one network, and only a minority is global and needs WAN
communication. In that case, when the WAN breaks, one wants the local
functionality to keep working, and only the parts that truly depend
on external events to be interrupted.
Currently, one needs separate ensembles to partition data, plus
'bridge' code to communicate between them.
It would certainly be more natural if two ZK ensembles could subscribe to
each other in a 'partial sub-graph publish/subscribe' framework. It would
almost be like file system mounting:
subscribe otherEnsemble:port/path/to/otherstuff /localpath/to/
Publishing is the same thing: think of it as a request for another
cluster to subscribe to the local ZK's data.
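The mount analogy can be sketched as a small path-translation layer. This is purely illustrative: ZooKeeper has no such `subscribe` command, and `Subscription`, `parse_subscription`, and `publish_request` are hypothetical names. It assumes the `host:port/remote/path /local/path` shape from the example above, and models publishing as building the subscribe request the other cluster would run against us.

```python
from dataclasses import dataclass


def norm(path):
    """Drop a trailing slash, except for the root path itself."""
    return path.rstrip("/") or "/"


def relative(path, root):
    """Suffix of path relative to root ('' for root itself), or None if outside root."""
    path, root = norm(path), norm(root)
    if path == root:
        return ""
    base = root if root == "/" else root + "/"
    return path[len(base):] if path.startswith(base) else None


def join(root, rel):
    root = norm(root)
    if not rel:
        return root
    return root + rel if root == "/" else root + "/" + rel


@dataclass(frozen=True)
class Subscription:
    """Hypothetical 'mount': a subtree on another ensemble shown under a local path."""
    host: str
    port: int
    remote_root: str
    local_root: str

    def to_local(self, remote_path):
        """Map a remote znode path to its mounted local path (None if not covered)."""
        rel = relative(remote_path, self.remote_root)
        return None if rel is None else join(self.local_root, rel)


def parse_subscription(line):
    """Parse 'subscribe host:port/remote/path /local/path'."""
    verb, source, local_root = line.split()
    if verb != "subscribe":
        raise ValueError(f"unknown verb: {verb}")
    hostport, _, remote = source.partition("/")
    host, _, port = hostport.rpartition(":")
    return Subscription(host, int(port), norm("/" + remote), norm(local_root))


def publish_request(my_host, my_port, my_root, their_mount):
    """Publishing = asking the other cluster to run a subscribe against us."""
    return f"subscribe {my_host}:{my_port}{norm(my_root)} {norm(their_mount)}"
```

For example, `parse_subscription("subscribe otherEnsemble:2181/path/to/otherstuff /mnt/other")` yields a mount whose `to_local("/path/to/otherstuff/a")` is `"/mnt/other/a"`; the port and local path here are made-up values.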
From: Scott Carey [mailto:sc...@richrelevance.com]
Sent: Monday, July 20, 2009 11:00 AM
Subject: Re: Leader Elections
Observers would be awesome, especially with a couple of enhancements:
An option for the observers to enter a special state if the WAN link
to the "master" cluster goes down. A read-only option would be valuable.
However, allowing certain types of writes to continue on a limited basis
would be highly valuable as well. An observer could "own" a special
node and its subnodes. Only these subnodes would be writable by the
observer when there was a session break to the master cluster, and the
master cluster would take all the changes when the link is
reestablished. Essentially, it is a portion of the hierarchy that is
writable only by a specific observer, and read-only for others.
The purpose of this would be to keep certain types of use cases working
when the WAN link to the "master" ZKs goes down: status updates, or
changes local to the observer that are strictly read-only outside it.
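A minimal sketch of the proposed "owned subtree" behavior, assuming a simplified model where a dict stands in for the master ensemble's data. `Master`, `Observer`, `disconnect`, and `reconnect` are hypothetical names; this is not how ZooKeeper observers actually work. It shows why the ownership rule keeps the merge step simple: since only the owning observer may write its subtree, replaying its buffered writes on reconnect cannot conflict with anything the master accepted in the meantime.

```python
# Hypothetical model of an observer that "owns" a subtree and keeps
# writing it while the WAN link to the master cluster is down.


def norm(path):
    return path.rstrip("/") or "/"


def under(path, root):
    """True if path is root itself or one of its descendants."""
    root = norm(root)
    return path == root or path.startswith(root if root == "/" else root + "/")


class Master:
    def __init__(self):
        self.nodes = {}   # znode path -> data
        self.owners = {}  # owned subtree root -> observer id

    def write(self, writer, path, data):
        # Owned subtrees are writable only by their owner, read-only for others.
        for root, owner in self.owners.items():
            if under(path, root) and owner != writer:
                raise PermissionError(f"{path} is owned by {owner}")
        self.nodes[path] = data


class Observer:
    def __init__(self, obs_id, master, owned_root):
        self.id, self.master = obs_id, master
        self.owned_root = norm(owned_root)
        master.owners[self.owned_root] = obs_id
        self.view = dict(master.nodes)  # local copy of committed state
        self.connected = True
        self.pending = []  # writes buffered during a WAN partition

    def read(self, path):
        return self.view.get(path)

    def write(self, path, data):
        if self.connected:
            self.master.write(self.id, path, data)  # forwarded to the master
            self.view = dict(self.master.nodes)
        elif under(path, self.owned_root):
            self.view[path] = data  # visible locally right away
            self.pending.append((path, data))
        else:
            raise PermissionError(f"{path} is read-only while the WAN link is down")

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        # No one else can write the owned subtree, so replay cannot conflict.
        for path, data in self.pending:
            self.master.write(self.id, path, data)
        self.pending.clear()
        self.connected = True
        self.view = dict(self.master.nodes)  # pick up changes made during the outage
```

Typical flow: `obs.disconnect()`, then `obs.write("/status/eu/host1", b"up")` succeeds locally while writes outside `/status/eu` raise `PermissionError`; `obs.reconnect()` pushes the buffered status to the master.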
On 7/19/09 12:16 PM, "Henry Robinson" <he...@cloudera.com> wrote:
You can. See ZOOKEEPER-368; at first glance, it sounds like it would
be a good fit for your requirements.
Do bear in mind that the patch on the JIRA is only for discussion
purposes; I would not consider it fit for production use yet. I hope to
post a much better patch this week.
On Sat, Jul 18, 2009 at 7:38 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
Can you submit updates via an observer?
On Sat, Jul 18, 2009 at 6:38 AM, Flavio Junqueira <f...@yahoo-inc.com> wrote:
2- Observers: you could have one computing center containing an ensemble,
and observers around the edge just learning committed values.
Ted Dunning, CTO