Zookeeper WAN Configuration
Like most folks' WANs, ours is composed of various zones: some central processing, some edge, some corp, and some in between (DMZs). In this model, a given Zookeeper server will not have direct connectivity to all of its peers in the ensemble due to various security constraints. Is this a problem? Are there special configurations for this model?

Given 3 zones:

A -- B
B -- C

A cannot see C, and vice versa. B can see A and C.

1. Will zookeeper servers function properly even if a given set of servers can only see some of the servers in the ensemble? For example, the shared config lists all zk servers in A, B, and C, but A can only see B, C can only see B, and B can see both A and C.

2. Will zookeeper servers flood the log with error messages if only a subset of the ensemble members are visible?

3. Will the zk ensemble function properly if the config used by each server only lists the servers in the ensemble that are visible? Suppose that A has a config that only lists servers in A and B, C has a config that only lists servers in C and B, and B has a config that lists servers in A, B, and C. Is this the recommended approach?

http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html
Re: Zookeeper WAN Configuration
Each member needs a connection to a quorum. The quorum is ceiling((N+1) / 2) members of the cluster. This guarantees that a network partition does not allow two leaders to go on stamping out revisions independent of each other.

On Fri, Jul 24, 2009 at 4:23 PM, Todd Greenwood to...@audiencescience.com wrote:

Ted, could you elaborate a bit more on this? I was under the (mis)impression that each ZK server in an ensemble only needed connectivity to another member in the ensemble, not to each member in the ensemble. It sounds like you are saying the latter is true.

--
Ted Dunning, CTO DeepDyve
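The quorum formula above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not ZooKeeper code; the function name `quorum_size` is made up for illustration.

```python
import math


def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of an ensemble of N servers: ceiling((N + 1) / 2)."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return math.ceil((ensemble_size + 1) / 2)


def two_disjoint_quorums_possible(ensemble_size: int) -> bool:
    """True only if two non-overlapping groups could each hold a quorum."""
    return 2 * quorum_size(ensemble_size) <= ensemble_size


print([quorum_size(n) for n in (3, 4, 5, 6, 7)])  # [2, 3, 3, 4, 4]
print(any(two_disjoint_quorums_possible(n) for n in range(1, 100)))  # False
```

Because two quorums always add up to more than N servers, they must share at least one member, which is why the two sides of a partition cannot both elect a leader.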
Re: Zookeeper WAN Configuration
Servers in a quorum need to be able to talk to each other to elect a leader. Once a leader is elected, followers only talk to the leader. Of course, if the leader fails, servers in some quorum will need to talk to each other again. If no quorum can be formed, the system is stalled.

-Flavio

On Jul 24, 2009, at 4:37 PM, Ted Dunning wrote:

Each member needs a connection to a quorum. The quorum is ceiling((N+1) / 2) members of the cluster. This guarantees that a network partition does not allow two leaders to go on stamping out revisions independent of each other.

On Fri, Jul 24, 2009 at 4:23 PM, Todd Greenwood to...@audiencescience.com wrote:

Ted, could you elaborate a bit more on this? I was under the (mis)impression that each ZK server in an ensemble only needed connectivity to another member in the ensemble, not to each member in the ensemble. It sounds like you are saying the latter is true.

--
Ted Dunning, CTO DeepDyve
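To make the connectivity requirement concrete for the A -- B -- C layout from the original question, here is a rough sketch that lists which minimum-size quorums consist of servers that can all talk to each other. It is a toy model, not ZooKeeper's election algorithm. The server names, the two-servers-per-zone layout, and the helper names (`electable_quorums`, `can_talk`) are all assumptions made up for this example.

```python
import math
from itertools import combinations


def quorum_size(ensemble_size):
    # Majority of the ensemble: ceiling((N + 1) / 2).
    return math.ceil((ensemble_size + 1) / 2)


def electable_quorums(servers, can_talk):
    """Every minimum-size quorum whose members can all reach one another."""
    need = quorum_size(len(servers))
    return [
        group
        for group in combinations(servers, need)
        if all(can_talk(a, b) for a, b in combinations(group, 2))
    ]


# Hypothetical layout: two servers per zone; zones A and C cannot see each other.
ZONE = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
BLOCKED = {frozenset({"A", "C"})}


def can_talk(x, y):
    # Servers in the same zone, or in zones with no blocked link, can talk.
    return frozenset({ZONE[x], ZONE[y]}) not in BLOCKED


quorums = electable_quorums(sorted(ZONE), can_talk)
for group in quorums:
    print(group)
```

Under these assumptions the only fully connected quorums are the A+B servers and the B+C servers, so in this toy model every workable quorum depends on both zone-B servers being up.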
RE: Zookeeper WAN Configuration
Flavio, Ted, thank you for your comments.

So it sounds like the only way to currently deploy to the WAN is to deploy ZK servers to the central DC and open up client connections to these ZK servers from the edge nodes. True?

In the future, once the Observers feature is implemented, we should be able to deploy zk servers to both the DC and to the pods...with all the goodness that Flavio mentions below.

Flavio - do you have a doc that describes exactly what happens in the transaction of a write operation? For instance, I'd like to know at exactly what stage a write has been committed to the ensemble, and not just to the zk server the client is connected to. I figure it must be something like:

clientA.write(path, value)
- serverA writes to memory
- serverA writes to transacted disk every n seconds or m bytes
- serverA sends write to Leader
- Leader stamps with transaction id
- Leader responds to ensemble with update + transaction id

-Todd

-----Original Message-----
From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
Sent: Friday, July 24, 2009 4:50 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Zookeeper WAN Configuration

Just a few quick observations:

On Jul 24, 2009, at 4:40 PM, Ted Dunning wrote:

On Fri, Jul 24, 2009 at 4:23 PM, Todd Greenwood to...@audiencescience.com wrote:

Could you explain the idea behind the Observers feature, what this concept is supposed to address, and how it applies to the WAN configuration problem in particular?

Not really. I am just echoing comments on observers from them that know.

Without observers, increasing the number of servers in an ensemble enables higher read throughput, but causes write throughput to drop because the number of votes needed to order each write operation increases. Essentially, observers are zookeeper servers that don't vote when ordering updates to the zookeeper state. Adding observers enables higher read throughput while only minimally affecting write throughput (the leader still has to send commits to everyone, at least in the version we have been working on).

The ideas for federating ZK or allowing observers would likely do what you want. I can imagine that an observer would only care that it can see its local peers and one of the observers would be elected to get updates (and thus would care about the central service).

This certainly sounds like exactly what I want...Was this introduced in 3.2 in full, or only partially?

I don't think it is even in trunk yet. Look on Jira or at the recent logs of this mailing list.

It is not on trunk yet.

-Flavio
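The voting trade-off Flavio describes can be put into numbers with a small sketch. This is a simplified model that assumes a write is ordered once a majority of the voting servers agree; it is not taken from the ZooKeeper code base. The `Ensemble` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ensemble:
    voters: int          # servers that vote when ordering writes
    observers: int = 0   # non-voting servers (the proposed Observers feature)

    @property
    def read_servers(self) -> int:
        # Every server, voting or not, can serve reads.
        return self.voters + self.observers

    @property
    def votes_per_write(self) -> int:
        # Assumption: a write needs a majority of the voters only.
        return self.voters // 2 + 1

    @property
    def commit_targets(self) -> int:
        # The leader still sends commits to everyone else.
        return self.voters + self.observers - 1


all_voting = Ensemble(voters=9)
with_observers = Ensemble(voters=3, observers=6)

for e in (all_voting, with_observers):
    print(e, e.read_servers, e.votes_per_write, e.commit_targets)
```

Both hypothetical ensembles offer nine servers for reads and eight commit targets, but the one with observers needs two votes per write instead of five, which is the effect described above.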