Flavio, thank you for the suggestion.

I have looked at the documentation (relevant snippets pasted in below) and 
at the presentations 
(http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations),
but I still have some questions about WAN configuration:

---------------------------------------------------------------
WAN
----
A <-> B
A <-> C
A <-> D

A is a central processing hub (DC).
B-D are remote colo edge nodes (PODS).
Each POD contains (m) ZK Servers with (q) client connections.
---------------------------------------------------------------
  
What are the advantages and disadvantages of co-locating ZK Servers across a 
WAN? Could you correct my admittedly naïve assumptions here?

1. ZK Servers within a POD would significantly improve read/write performance 
for clients in that POD, vs. those clients opening connections to the DC.

2. ZK Servers within a POD would provide local file transacted storage of 
writes, obviating the need to write that code ourselves.

3. ZK Servers within the POD would be resilient to network connectivity failure 
between the POD and the DC. Once connectivity is re-established, the ZK Servers 
in the POD would sync with the ZK Servers in the DC, and, from the perspective 
of a client within the POD, everything would just work, as if there had been no 
network failure.

4. A WAN topology of co-located ZK servers in both the DC and (n) PODs would 
not significantly degrade the performance of the ensemble, provided large blobs 
of traffic were not being sent across the network.

--------------------
Doc references below
--------------------

http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html

"""
group.x=nnnnn[:nnnnn]

    (No Java system property)

    Enables a hierarchical quorum construction. "x" is a group identifier and 
the numbers following the "=" sign correspond to server identifiers. The 
left-hand side of the assignment is a colon-separated list of server 
identifiers. Note that groups must be disjoint and the union of all groups must 
be the ZooKeeper ensemble.

weight.x=nnnnn

    (No Java system property)

    Used along with "group", it assigns a weight to a server when forming 
quorums. Such a value corresponds to the weight of a server when voting. There 
are a few parts of ZooKeeper that require voting such as leader election and 
the atomic broadcast protocol. By default the weight of server is 1. If the 
configuration defines groups, but not weights, then a value of 1 will be 
assigned to all servers.
"""
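
To make sure I am reading the group/weight options correctly, here is a rough 
Python sketch of how I understand them. The helper names (parse_quorum_config, 
check_groups) and the nine-server layout are my own assumptions for 
illustration, not anything shipped with ZooKeeper. It only checks the two rules 
quoted above: groups must be disjoint, and their union must be the ensemble.

```python
# Hypothetical helpers, not part of ZooKeeper: they parse group.x / weight.x
# lines and check the rules stated in the quoted documentation.

def parse_quorum_config(text):
    """Return (groups, weights) parsed from zoo.cfg-style lines."""
    groups, weights = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("group."):
            # Value is a colon-separated list of server identifiers.
            groups[key[len("group."):]] = [int(s) for s in value.split(":")]
        elif key.startswith("weight."):
            # Key suffix is the server identifier, value is its weight.
            weights[int(key[len("weight."):])] = int(value)
    return groups, weights


def check_groups(groups, ensemble):
    """Raise ValueError unless groups are disjoint and cover the ensemble."""
    seen = set()
    for name, members in groups.items():
        overlap = seen.intersection(members)
        if overlap:
            raise ValueError(f"group {name} reuses servers {sorted(overlap)}")
        seen.update(members)
    if seen != set(ensemble):
        raise ValueError("union of groups differs from the ensemble")


# Assumed layout: nine servers in three groups, only two explicit weights.
EXAMPLE = """
group.1=1:2:3
group.2=4:5:6
group.3=7:8:9
weight.1=1
weight.2=1
"""

groups, weights = parse_quorum_config(EXAMPLE)
check_groups(groups, ensemble=range(1, 10))

# Servers without an explicit weight default to 1, per the quoted docs.
effective = {sid: weights.get(sid, 1)
             for members in groups.values() for sid in members}
```

If I have this right, each POD (and the DC) would map to one group in such a 
configuration.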

http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperInternals.html

"""
A different construction that uses weights and is useful in wide-area 
deployments (co-locations) is a hierarchical one. With this construction, we 
split the servers into disjoint groups and assign weights to processes. To form 
a quorum, we have to get a hold of enough servers from a majority of groups G, 
such that for each group g in G, the sum of votes from g is larger than half of 
the sum of weights in g. Interestingly, this construction enables smaller 
quorums. If we have, for example, 9 servers, we split them into 3 groups, and 
assign a weight of 1 to each server, then we are able to form quorums of size 
4. Note that two subsets of processes composed each of a majority of servers 
from each of a majority of groups necessarily have a non-empty intersection. It 
is reasonable to expect that a majority of co-locations will have a majority of 
servers available with high probability.

With ZooKeeper, we provide a user with the ability of configuring servers to 
use majority quorums, weights, or a hierarchy of groups.
"""
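
As a sanity check on the "quorums of size 4" claim, here is a small Python 
sketch of the quorum rule as I read it from the passage above. The function 
name is_quorum and the group names are my own, and this is only my 
interpretation of the text, not ZooKeeper's actual implementation.

```python
def is_quorum(acks, groups, weights=None):
    """Hierarchical quorum test as described in the quoted passage.

    acks    -- set of server ids that have voted/acknowledged
    groups  -- dict mapping group name to a list of server ids
    weights -- optional dict of server id to weight (default weight is 1)
    """
    weights = weights or {}
    acks = set(acks)
    satisfied = 0
    for members in groups.values():
        total = sum(weights.get(s, 1) for s in members)
        got = sum(weights.get(s, 1) for s in members if s in acks)
        # Votes from this group must exceed half of the group's total weight.
        if 2 * got > total:
            satisfied += 1
    # A majority of the groups must be satisfied.
    return 2 * satisfied > len(groups)


# Nine servers split into three groups, all with weight 1.
GROUPS = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}

print(is_quorum({1, 2, 4, 5}, GROUPS))     # True: 2 of 3 in each of 2 groups
print(is_quorum({1, 2, 3, 4}, GROUPS))     # False: only group A has a majority
print(is_quorum({1, 2, 3, 4, 7}, GROUPS))  # False: 5 servers, but 1 group
```

So four servers suffice when they are spread as two-of-three across two 
groups, whereas a plain majority of nine servers would need five.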

-----Original Message-----
From: Flavio Junqueira [mailto:f...@yahoo-inc.com] 
Sent: Saturday, July 25, 2009 7:55 AM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Zookeeper WAN Configuration

Todd, you can try using flexible quorums to implement what you're  
requesting. You can simulate the behavior I described of observers by  
setting the weight of the server to zero. Please check the  
documentation at:

        http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html

Under "Cluster Options", check options like group and weight.

-Flavio


On Jul 24, 2009, at 5:03 PM, Todd Greenwood wrote:

>
> In the future, once the Observers feature is implemented, then we  
> should
> be able to deploy zk servers to both the DC and to the pods...with all
> the goodness that Flavio mentions below.
>
>
> -----Original Message-----
> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> Sent: Friday, July 24, 2009 4:50 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: Zookeeper WAN Configuration
>
> Just a few quick observations:
>
> On Jul 24, 2009, at 4:40 PM, Ted Dunning wrote:
>
>> On Fri, Jul 24, 2009 at 4:23 PM, Todd Greenwood
>> <to...@audiencescience.com>wrote:
>>
>>> Could you explain the idea behind the Observers feature, what this
>>> concept is supposed to address, and how it applies to the WAN
>>> configuration problem in particular?
>>>
>>
>> Not really.  I am just echoing comments on observers from them that
>> know.
>>
>
> Without observers, increasing the number of servers in an ensemble
> enables higher read throughput, but causes write throughput to drop
> because the number of votes to order each write operation increases.
> Essentially, observers are zookeeper servers that don't vote when
> ordering updates to the zookeeper state. Adding observers enables
> higher read throughput while minimally affecting write throughput (leader
> still has to send commits to everyone, at least in the version we have
> been working on).
>
>>
>>> """
>>> The ideas for federating ZK or allowing observers would likely do
>>> what
>>> you
>>> want.  I can imagine that an observer would only care that it can  
>>> see
>>> its
>>> local peers and one of the observers would be elected to get updates
>>> (and
>>> thus would care about the central service).
>>> """
>>> This certainly sounds like exactly what I want...Was this
>>> introduced in
>>> 3.2 in full, or only partially?
>>>
>>
>> I don't think it is even in trunk yet.  Look on Jira or at the
>> recent logs
>> of this mailing list.
>
> It is not on trunk yet.
>
> -Flavio
>
