Thanks for your input.
I am planning on having 3 zk servers per data centre, with perhaps only 2 in
the tie-breaker site.
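A quick sketch of the quorum arithmetic for that 3-3-2 layout, assuming ZooKeeper's default majority quorum. The site names are made up for illustration.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(servers_per_site: dict, reachable_sites: set) -> bool:
    """True if the servers in the reachable sites can still form a majority."""
    total = sum(servers_per_site.values())
    alive = sum(n for site, n in servers_per_site.items() if site in reachable_sites)
    return alive >= majority(total)


# Hypothetical site names; server counts are the ones proposed above.
layout = {"dc1": 3, "dc2": 3, "tiebreaker": 2}

print(majority(sum(layout.values())))              # 8 servers need 5 for a quorum
print(has_quorum(layout, {"dc1", "dc2"}))          # tie-breaker site lost
print(has_quorum(layout, {"dc1", "tiebreaker"}))   # one data centre lost
print(has_quorum(layout, {"dc1"}))                 # a data centre cut off on its own
```

Note that 8 servers tolerate 3 failures, the same as 7 would, so the eighth server adds capacity for local reads rather than extra fault tolerance.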
The traffic between zk and the applications will be lots of local reads -
"who is the primary database ?". Changes to the config will be rare (server
rebuilds, etc - ie. planned changes) or caused by server / network / site failures.
The interesting thing in my mind is how zookeeper will cope with inter-site
link failure - how quickly the remote sites will notice, and how quickly
normality can be resumed when the link reappears.
I need to get this running in the lab and start pulling out wires.
On 8 March 2010 17:39, Patrick Hunt <ph...@apache.org> wrote:
> IMO latency is the primary issue you will face, but also keep in mind
> reliability w/in a colo.
> Say you have 3 colos (obv can't be 2), if you only have 3 servers, one in
> each colo, you will be reliable but clients w/in each colo will have to
> connect to a remote colo if the local fails. You will want to prioritize the
> local colo given that reads can be serviced entirely local that way. If you
> have 7 servers (2-2-3) that would be better - if a local server fails you
> have a redundant, if both fail then you go remote.
> You want to keep your writes as few as possible and as small as possible.
> Why? Say you have 100ms latency btw colos, let's go through a scenario for a
> client in a colo where the local servers are not the leader (zk cluster
> Read:
> 1) client reads a znode from local server
> 2) local server (usually < 1ms if "in colo" comm) responds in 1ms
> Write:
> 1) client writes a znode to local server A
> 2) A proposes change to the ZK Leader (L) in remote colo
> 3) L gets the proposal in 100ms
> 4) L proposes the change to all followers
> 5) all followers (not exactly, but hopefully) get the proposal in 100ms
> 6) followers ack the change
> 7) L gets the acks in 100ms
> 8) L commits the change (message to all followers)
> 9) A gets the commit in 100ms
> 10) A responds to client (< 1ms)
> write latency: 100 + 100 + 100 + 100 = 400ms
> Obviously keeping these writes small is also critical.
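The write path above can be put into a small calculation. This is only a sketch of the arithmetic in the walk-through, not a model of ZooKeeper itself; the function and parameter names are my own, and it assumes the same one-way latency on every inter-colo hop.

```python
def write_latency_ms(inter_colo_ms: float, local_ms: float = 0.0) -> float:
    """Estimated write latency for a client whose local server is not the leader.

    Four inter-colo hops are counted, matching the walk-through:
    forward to leader, proposal to followers, acks to leader, commit to follower.
    local_ms covers the two in-colo hops (client to server and back).
    """
    inter_colo_hops = 4
    local_hops = 2
    return inter_colo_hops * inter_colo_ms + local_hops * local_ms


def read_latency_ms(local_ms: float) -> float:
    """Reads are served by the local server, so only in-colo latency applies."""
    return local_ms


print(write_latency_ms(100))       # ignoring in-colo hops, as in the total above
print(write_latency_ms(100, 1))    # including 1ms for each in-colo hop
print(read_latency_ms(1))
```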
> Martin Waite wrote:
>> Hi Ted,
>> If the links do not work for us for zk, then they are unlikely to work for
>> any other solution - such as trying to stretch Pacemaker or Red Hat
>> with their multicast protocols across the links.
>> If the links are not good enough, we might have to spend some more money to
>> fix this.
>> On 8 March 2010 02:14, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> If you can stand the latency for updates then zk should work well for you.
>>> It is unlikely that you will be able to do better than zk does and still
>>> maintain correctness.
>>> Do note that you can probably bias the client to use a local server. That
>>> should make things more efficient.
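One way to get that bias might be to hand each client a connect string that lists only the servers in its own site, and have the application fall back to the full list if none of those answer. As far as I know the stock clients shuffle the host list, so ordering alone would not be enough. The sketch below only builds the comma-separated host:port strings; the host names and the helper are hypothetical, and the fallback logic is left to the application.

```python
def connect_string(servers_by_site: dict, local_site: str, include_remote: bool = False) -> str:
    """Build a ZooKeeper-style "host:port,host:port" connect string.

    By default only the local site's servers are listed; with
    include_remote=True the servers of the other sites are appended.
    """
    hosts = list(servers_by_site[local_site])
    if include_remote:
        for site, servers in servers_by_site.items():
            if site != local_site:
                hosts.extend(servers)
    return ",".join(hosts)


# Hypothetical host names for a two-site example.
servers = {
    "dc1": ["zk1.dc1.example:2181", "zk2.dc1.example:2181"],
    "dc2": ["zk1.dc2.example:2181", "zk2.dc2.example:2181"],
}

print(connect_string(servers, "dc1"))
print(connect_string(servers, "dc1", include_remote=True))
```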
>>> Sent from my iPhone
>>> On Mar 7, 2010, at 3:00 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:
>>> The inter-site links are a nuisance. We have two data-centres with
>>>> links which I hope would be good enough for most uses, but we need a 3rd
>>>>> site - and currently that only has 2Mb links to the other sites. This
>>>>> could be a problem.