Hi Chang, thanks for the insights. If you have a few minutes, would you mind updating the FAQ with some of this detail? http://wiki.apache.org/hadoop/ZooKeeper/FAQ

Thanks!

Patrick

On Thu, Nov 4, 2010 at 6:27 AM, Chang Song <tru64...@me.com> wrote:
>
> Sorry. I made a mistake on the retry timeout in the load balancer section
> of my answer.
> The same timeout applies to the load balancer case as well (it depends on
> the recv timeout).
>
> Thank you
>
> Chang
>
>
> On Nov 4, 2010, at 10:22 PM, Chang Song wrote:
>
>>
>> I would like to add some info on this.
>>
>> This may not be very important, but there are subtle differences.
>>
>> Two cases: 1. server hardware failure or kernel panic
>>            2. zookeeper Java daemon process down
>>
>> In the former case, the timeout is based on the timeout argument to
>> zookeeper_init(), and partially on the ZK heartbeat algorithm. The client
>> recognizes that the server is down after 2/3 of the timeout, then retries
>> at every timeout. For example, if the timeout is 9000 msec, it first
>> times out in 6 seconds, and retries every 9 seconds.
>>
>> In the latter case (Java process down), since the socket connect
>> immediately returns connection refused, the client can retry immediately.
>>
>> On top of that,
>>
>> - Hardware load balancer:
>> If an ensemble cluster is served through a hardware load balancer, the
>> zookeeper client will retry every 2 seconds since we only have one IP to
>> try.
>>
>> - DNS RR:
>> Make sure that "nscd" on your Linux box is off, since it is most likely
>> that the DNS cache returns the same IP many times.
>> This is actually worse than the above, since the ZK client will retry the
>> same dead server every 2 seconds for some time.
>>
>>
>> I think it is best not to use a load balancer for ZK clients, since ZK
>> clients will try the next server immediately if the previous one fails
>> for some reason (based on the timeout above). This is especially true if
>> your cluster works in a pseudo-realtime environment where tickTime is set
>> very low.
>>
>>
>> Chang
>>
>>
>> On Nov 4, 2010, at 9:17 AM, Ted Dunning wrote:
>>
>>> DNS round-robin works as well.
>>>
>>> On Wed, Nov 3, 2010 at 3:45 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:
>>>
>>>> it would have to be a TCP based load balancer to work with ZooKeeper
>>>> clients, but other than that it should work really well. The clients
>>>> will be doing heart beats so the TCP connections will be long lived.
>>>> The client library does random connection load balancing anyway.
>>>>
>>>> ben
>>>>
>>>> On 11/03/2010 12:19 PM, Luka Stojanovic wrote:
>>>>
>>>>> What would be expected behavior if a three node cluster is put behind
>>>>> a load balancer? It would ease deployment because all clients would be
>>>>> configured to target zookeeper.example.com regardless of actual
>>>>> cluster configuration, but I have impression that client-server
>>>>> connection is stateful and that jumping randomly from server to server
>>>>> could bring strange behavior.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> --
>>>>> Luka Stojanovic
>>>>> lu...@vast.com
>>>>> Platform Engineering
>>>>>
>>>>
>>>>
>>
>
>
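
For the hardware-failure case in Chang's message, the timing can be sketched in a few lines of Python. This only models the description given in this thread (detection at 2/3 of the timeout, then a retry at every full timeout). It is not taken from the ZooKeeper client source, and the function name failure_timeline_ms is made up for illustration.

```python
def failure_timeline_ms(session_timeout_ms, retries):
    """Approximate times (msec after the last contact with the server) at
    which a client notices a dead server and then retries.

    Assumption from the thread, not from the client source: the server is
    considered down after 2/3 of the session timeout, and each later retry
    happens one full timeout after the previous event.
    """
    detect_ms = session_timeout_ms * 2 // 3
    return [detect_ms + i * session_timeout_ms for i in range(retries + 1)]


if __name__ == "__main__":
    # The 9000 msec example from the thread: detection, then three retries.
    for t in failure_timeline_ms(9000, 3):
        print(t, "msec")
```

With the 9000 msec example, failure_timeline_ms(9000, 3) gives 6000, 15000, 24000 and 33000 msec. In the process-down case the connect is refused immediately, so this waiting does not apply there.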