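Are you actually remaking the connection yourself? Reconnection should happen automatically for you: the client library retries the other servers in its connect string, and the session (including its ephemeral nodes) survives as long as the client reconnects before the session timeout expires.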
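To make the distinction concrete, here is a minimal, stdlib-only Java sketch of how a watcher might react to connection-state events. The `KeeperState` enum mirrors the names of ZooKeeper's `Watcher.Event.KeeperState` values (`SyncConnected`, `Disconnected`, `Expired`). The class, the `Action` values and `onStateChange` are hypothetical. In a real client you would call something like `onStateChange` from your `Watcher.process(WatchedEvent)` implementation with `event.getState()`. The point it illustrates: `Disconnected` means "pause and wait", not "rebuild the handle". Only `Expired` calls for closing the old handle and opening a brand-new session.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy class: models the decision logic only, no ZooKeeper dependency.
class ClusterConnectionPolicy {

    // Mirrors the names of ZooKeeper's Watcher.Event.KeeperState values used here.
    enum KeeperState { SyncConnected, Disconnected, Expired }

    // Hypothetical actions the cluster manager would take.
    enum Action { NONE, PAUSE, RESUME, NEW_SESSION }

    private boolean paused = false;
    private final List<Action> log = new ArrayList<>();

    // Decide what to do for one connection-state event.
    Action onStateChange(KeeperState state) {
        Action action;
        switch (state) {
            case Disconnected:
                // The client library is already trying the other servers in the
                // connect string; do NOT create a new handle, just stop relying on it.
                action = paused ? Action.NONE : Action.PAUSE;
                paused = true;
                break;
            case SyncConnected:
                // Reconnected (possibly to another server) with the SAME session,
                // so ephemeral nodes are still in place: resume if we had paused.
                action = paused ? Action.RESUME : Action.NONE;
                paused = false;
                break;
            case Expired:
                // The session is gone and its ephemeral nodes were deleted: close the
                // old handle and open a fresh session, then re-create ephemerals.
                action = Action.NEW_SESSION;
                paused = true;
                break;
            default:
                action = Action.NONE;
        }
        log.add(action);
        return action;
    }

    List<Action> history() {
        return log;
    }

    public static void main(String[] args) {
        ClusterConnectionPolicy policy = new ClusterConnectionPolicy();
        policy.onStateChange(KeeperState.SyncConnected); // initial connect
        policy.onStateChange(KeeperState.Disconnected);  // our ZK server was stopped
        policy.onStateChange(KeeperState.SyncConnected); // client failed over by itself
        System.out.println(policy.history());            // prints [NONE, PAUSE, RESUME]
    }
}
```

Rebuilding the handle on every `Disconnected` event without closing the previous one leaves each old handle's socket and threads alive, which is one plausible way to end up with "Too many open files".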
On Wed, Jun 8, 2011 at 11:35 AM, Sampath Perera <[email protected]> wrote:
> Hi,
>
> First of all I must appreciate ZooKeeper: I was able to get going with
> it pretty fast and implemented clustering (coordination of nodes in the
> cluster) for our product (UltraESB) just by going through the documentation
> and a few searches of the mailing list.
>
> Now, I was trying to run a sample setup with a ZooKeeper quorum of 3 nodes.
> I have set up the ZooKeeper quorum locally on localhost, giving each node
> different election ports and client ports, and the quorum seems to be
> working fine. Then I started 3 UltraESB server nodes pointing
> to the quorum, and noticed that a given UltraESB node connects to a
> particular ZooKeeper node. Then, to test the reliability, I tried to
> stop a ZooKeeper instance so that 2 out of 3 ZK nodes are still alive
> and the quorum should keep working.
>
> What I have noticed is that whenever I stop a ZK node, the ESB server
> attached to that node gets a Disconnected keeper state watched event (upon
> receiving this event, a handler I registered stops the ESB cluster manager,
> as this means the ZK connection was lost). Now I do not see the ZK client
> trying to re-create the session with another node in the quorum...?
>
> Could it be due to some problem in the way I have implemented the watched
> event processing? Or do we need to manually re-connect to the quorum once
> we receive a Disconnected event?
>
> Further, I have been using ephemeral nodes, and I want to keep the same
> session, so I tried to re-create the ZK session by creating a new ZK
> instance from the ESB (client) side, passing the previous session id and
> the session password. This caused the other 2 ESB servers to receive
> Disconnected events too, but I noticed that the ZK quorum was still running
> fine with the 2 nodes it had up, and those 2 nodes got into an infinite
> loop of disconnecting and me trying to re-create the ZK session. Soon the
> system hit a "Too many open files" error, probably from running out of
> file descriptors due to open sockets (I am on Unix).
>
> Any help in understanding this quorum re-connection would be really
> appreciated. Is there any documentation for this? If there is, please
> bear with me and point me to it.
>
> --
> Thanks,
> Sampath
> http://adroitlogic.org
