Hi,

First of all, I must say I appreciate ZooKeeper: I was able to get going with it quickly and implemented clustering (coordination of the nodes in the cluster) for our product (UltraESB) just by going through the documentation and searching the mailing list a few times.
Now I am trying to run a sample setup with a ZooKeeper quorum of 3 nodes. I have set up the quorum locally on localhost, giving each node different election and client ports, and the quorum seems to be working fine. I then started 3 UltraESB server nodes pointing to the quorum, and noticed that each UltraESB node connected to a particular ZooKeeper node.

To test reliability, I stopped one ZooKeeper instance, so that 2 of the 3 ZK nodes were still alive and the quorum should keep working. What I noticed is that whenever I stop a ZK node, the ESB server attached to that node gets a watched event with the Disconnected keeper state. (I have registered a handler that stops the ESB cluster manager on receiving this event, since it means the ZK connection was lost.) After that, I do not see the ZK client trying to re-establish the session with another node in the quorum. Could this be due to a problem in the way I have implemented the watched event processing? Or do we need to reconnect to the quorum manually once we receive a Disconnected event?

Further, I am using ephemeral nodes and want to keep the same session, so I tried to re-create the ZK session by creating a new ZK instance on the ESB (client) side, passing the previous session id and session password. This caused the other 2 ESB servers to receive Disconnected events too, although the ZK quorum itself kept running fine with the 2 nodes that were still up. Those 2 nodes got into an infinite loop of disconnects followed by my attempts to re-create the ZK session, and soon the system received a "Too many open files" error, probably because it ran out of file descriptors for open sockets (I am on Unix).

Any help in understanding this quorum reconnection would be really appreciated. Is there any documentation for this? If there is, please bear with me and point me to it.

--
Thanks,
Sampath
http://adroitlogic.org
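For reference, the event handling discussed above can be reduced to a small decision table. The sketch below is only an illustration, not UltraESB or ZooKeeper code: the nested enum `KeeperState` is a stand-in for ZooKeeper's `Watcher.Event.KeeperState` (whose real constants are `SyncConnected`, `Disconnected` and `Expired`), and `Action`, `decide` and `newSessionsNeeded` are hypothetical names. It assumes the client was given all three servers in its connect string, in which case the ZooKeeper client library retries the other servers by itself after a Disconnected event, and the session (with its ephemeral nodes) stays valid until the session timeout passes. Under that assumption, only an Expired event calls for a new ZooKeeper handle.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a session-event policy. Nothing here talks to a real ZooKeeper
// server; the enums are stand-ins so the decision logic can run on its own.
final class SessionEventPolicy {

    // Stand-in for Watcher.Event.KeeperState (assumption: only these three
    // states matter for the scenario described in the post).
    enum KeeperState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    // Hypothetical actions the cluster manager could take.
    enum Action { RESUME, PAUSE_AND_WAIT, OPEN_NEW_SESSION }

    // Maps a keeper state to an action.
    static Action decide(KeeperState state) {
        switch (state) {
            case SYNC_CONNECTED:
                // (Re)connected to some server in the quorum: same session.
                return Action.RESUME;
            case DISCONNECTED:
                // Connection lost: the client library keeps retrying the
                // servers in its connect string, so do not create a new handle.
                return Action.PAUSE_AND_WAIT;
            case EXPIRED:
                // Session is gone on the server side, ephemeral nodes included:
                // this is the one case that needs a new handle.
                return Action.OPEN_NEW_SESSION;
            default:
                throw new IllegalArgumentException("unhandled state: " + state);
        }
    }

    // Counts how many new handles a sequence of events would require.
    static int newSessionsNeeded(List<KeeperState> events) {
        int count = 0;
        for (KeeperState state : events) {
            if (decide(state) == Action.OPEN_NEW_SESSION) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One ZK server is stopped and the client fails over to another one.
        List<KeeperState> failover =
                Arrays.asList(KeeperState.DISCONNECTED, KeeperState.SYNC_CONNECTED);
        for (KeeperState state : failover) {
            System.out.println(state + " -> " + decide(state));
        }
        System.out.println("new sessions needed: " + newSessionsNeeded(failover));
    }
}
```

With this policy a plain failover never opens a second handle, which would also avoid piling up sockets the way the retry loop described above did.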
