Hi Chang, thanks for the insights. If you have a few minutes, would you
mind updating the FAQ with some of this detail?
http://wiki.apache.org/hadoop/ZooKeeper/FAQ

Thanks!

Patrick

On Thu, Nov 4, 2010 at 6:27 AM, Chang Song <tru64...@me.com> wrote:
>
> Sorry, I made a mistake about the retry timeout in the load balancer section
> of my answer.
> The same timeout applies to the load balancer case as well (it depends on the
> recv timeout).
>
> Thank you
>
> Chang
>
>
> On Nov 4, 2010, at 10:22 PM, Chang Song wrote:
>
>>
>> I would like to add some info on this.
>>
>> This may not be very important, but there are subtle differences.
>>
>> Two cases:
>>   1. server hardware failure or kernel panic
>>   2. ZooKeeper Java daemon process down
>>
>> In the former case, the timeout is based on the timeout argument passed to
>> zookeeper_init(), and partially on the ZK heartbeat algorithm. The client
>> recognizes that the server is down after 2/3 of the timeout, then retries
>> once per timeout. For example, if the timeout is 9000 msec, it first times
>> out after 6 seconds, then retries every 9 seconds.
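
The timing described above can be written out as simple arithmetic. This is
only a sketch of the behavior as stated in this thread (detection at 2/3 of the
timeout, then one retry per full timeout); the function name and the `retries`
parameter are made up for illustration and are not part of any ZooKeeper API.

```python
def failure_timeline(session_timeout_ms, retries=3):
    """Return times (ms after last server contact) at which the client acts.

    Assumption taken from the thread: the server is considered down after
    2/3 of the session timeout, and each later retry comes one full
    timeout after the previous event.
    """
    detect_ms = session_timeout_ms * 2 // 3
    # First entry is the detection time; the rest are the retry times.
    return [detect_ms + i * session_timeout_ms for i in range(retries + 1)]


print(failure_timeline(9000))  # [6000, 15000, 24000, 33000]
```

With a 9000 msec timeout this gives detection at 6 seconds and retries spaced
9 seconds apart, matching the example in the message.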
>>
>> In the latter case (Java process down), the socket connect immediately returns
>> connection refused, so the client can retry immediately.
>>
>> On top of that,
>>
>> - Hardware load balancer:
>> If the ensemble is served through a hardware load balancer, the
>> ZooKeeper client will retry every 2 seconds since it has only one IP to try.
>>
>> - DNS RR:
>> Make sure that "nscd" on your Linux box is off, since the DNS cache will
>> most likely return the same IP many times.
>> This is actually worse than the above, since the ZK client will retry the same
>> dead server every 2 seconds for some time.
>>
>>
>> I think it is best not to use a load balancer for ZK clients, since ZK clients
>> will try the next server immediately if the previous one fails for some reason
>> (based on the timeout above). This is especially true if your cluster runs in
>> a pseudo-realtime environment where tickTime is set very low.
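
To illustrate the point about clients moving on to the next server, here is a
small simulation, not the real client code. It assumes the client is given the
full server list, shuffles it once, and walks through it until a server
answers. The zk1/zk2/zk3 host names and both function names are hypothetical.

```python
import random


def connect_order(hosts, seed=None):
    """Return the hosts in a shuffled order (stand-in for the client's
    random connection load balancing)."""
    rng = random.Random(seed)
    order = list(hosts)
    rng.shuffle(order)
    return order


def first_live(order, is_up):
    """Walk the list and return (host, attempts) for the first live host,
    or (None, attempts) if every host in the list is down."""
    for attempts, host in enumerate(order, start=1):
        if is_up(host):
            return host, attempts
    return None, len(order)


# Hypothetical three-node ensemble with one dead server.
hosts = ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"]
dead = "zk1.example.com:2181"
host, attempts = first_live(connect_order(hosts), lambda h: h != dead)
print(host, attempts)
```

With the full list, the client needs at most two attempts here. With a single
address in the list, it has nothing else to try and can only wait and retry
that same address.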
>>
>>
>> Chang
>>
>>
>> On Nov 4, 2010, at 9:17 AM, Ted Dunning wrote:
>>
>>> DNS round-robin works as well.
>>>
>>> On Wed, Nov 3, 2010 at 3:45 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:
>>>
>>>> It would have to be a TCP-based load balancer to work with ZooKeeper
>>>> clients, but other than that it should work really well. The clients will
>>>> be doing heartbeats, so the TCP connections will be long-lived. The client
>>>> library does random connection load balancing anyway.
>>>>
>>>> ben
>>>>
>>>> On 11/03/2010 12:19 PM, Luka Stojanovic wrote:
>>>>
>>>>> What would be the expected behavior if a three-node cluster is put behind a
>>>>> load balancer? It would ease deployment because all clients would be
>>>>> configured to target zookeeper.example.com regardless of the actual cluster
>>>>> configuration, but I have the impression that the client-server connection is
>>>>> stateful and that jumping randomly from server to server could cause strange
>>>>> behavior.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> --
>>>>> Luka Stojanovic
>>>>> lu...@vast.com
>>>>> Platform Engineering
>>>>>
>>>>
>>>>
>>
>
>
