Hey Ted,

The client library already handles all of this for you. It's not very clear, but the `host` parameter to:

> ZooKeeper(String host, int sessionTimeout, Watcher watcher)

... takes a comma-separated list of server:port pairs, which should be the full list of servers in your quorum.
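In case it helps, here is a minimal sketch of building that comma-separated string from a list of servers. The `ConnectString` helper and the `zk*.example.com` host names are made up for illustration and are not part of the client library; 2181 is only the conventional client port, so substitute whatever your servers actually listen on.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the ZooKeeper client library.
final class ConnectString {
    private ConnectString() {}

    // Turns each host into "host:port" and joins the pairs with commas.
    static String build(List<String> hosts, int port) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        return hosts.stream()
                .map(h -> h.trim() + ":" + port)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Placeholder host names; list every server in the quorum here.
        String connectString = build(
                List.of("zk1.example.com", "zk2.example.com", "zk3.example.com"), 2181);
        System.out.println(connectString);
        // The result would then be passed as the first constructor argument:
        //   new ZooKeeper(connectString, sessionTimeout, watcher);
    }
}
```

Listing every server, rather than a single address, is what gives the client somewhere else to go when one server crashes.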
Assuming that the crashed server doesn't bring you below the minimum number of servers required for quorum, the client library will connect to another server in the list, and all of your Watchers will receive a 'connected' event, indicating that the client has reconnected, possibly to a different server. See:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_gotchas

Also, the new Apache ZooKeeper mailing list is here:
http://hadoop.apache.org/zookeeper/mailing_lists.html

Thanks,
Stu

-----Original Message-----
From: "Ted Dunning" <[EMAIL PROTECTED]>
Sent: Wednesday, October 15, 2008 5:01am
To: [EMAIL PROTECTED]
Subject: [Zookeeper-user] how to handle zookeeper server crash from client
-------------------------------------------------------------------------

What is the received wisdom about how to handle the crash of a ZK server from the client? Clearly, reconnecting is necessary. Should that be done by the client by just using a watcher that probes a list of servers until it finds a live one?

For that matter, what is best practice relative to initial connection? Are people using a load balancer to abstract away how many servers are in the zookeeper cluster? Or are they writing application code to probe the cluster until a live server is found?

--
ted