OK. So the only difference I can see from the client side between a network 
partition and a bounce of the ZooKeeper server cluster is that in the former 
case the ConnectionLossException happens on a ZooKeeper client whose state 
is CONNECTED, and in the latter it's CONNECTING. Is this a reliable means of 
determining that I should recreate the client state from scratch?
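
To make the question concrete, here is a minimal sketch of the heuristic described above. It only encodes the observation from this message; whether the rule is reliable is exactly the open question. The class name, the `Guess` enum, and the `State` enum are hypothetical stand-ins for illustration (the real client reports its state through `ZooKeeper.getState()`), so nothing here depends on the ZooKeeper jar.

```java
public class LossClassifier {
    /** Stand-in for the client state; only the two values mentioned above. */
    public enum State { CONNECTING, CONNECTED }

    /** Hypothetical labels for the two failure modes being compared. */
    public enum Guess { POSSIBLE_PARTITION, POSSIBLE_SERVER_BOUNCE }

    /**
     * Maps the client state seen when a ConnectionLossException is caught
     * to a guess about the cause. This is an unverified heuristic.
     */
    public static Guess classify(State stateAtLoss) {
        return stateAtLoss == State.CONNECTED
                ? Guess.POSSIBLE_PARTITION
                : Guess.POSSIBLE_SERVER_BOUNCE;
    }

    public static void main(String[] args) {
        System.out.println(classify(State.CONNECTED));
        System.out.println(classify(State.CONNECTING));
    }
}
```

In a real application the state would be sampled in the catch block for the exception, and only the second outcome would trigger rebuilding the client.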

-----Original Message-----
From: Carroll James (Nokia-LC/Malvern) [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 12:18 PM
To: [email protected]
Subject: RE: Unrecoverable ConnectionLossException after server restart

This is apparently happening because the session establishment is being 
rejected on the server side:

2012-11-28 12:13:04,102 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:54551] INFO  
ZooKeeperServer - Refusing session request for client /127.0.0.1:38095 as it 
has seen zxid 0x2 our last zxid is 0x0 client must try another server

Unfortunately I can't see any indication on the client side that this is the 
problem. The server just decides to sever the connection and the client just 
keeps retrying (hence the counting up on the ephemeral ports). I could deal 
with this in the application if I could tell why the server decided to close 
the connection. Is there a way for me to do this?

Thanks
Jim

-----Original Message-----
From: Carroll James (Nokia-LC/Malvern) [mailto:[email protected]]
Sent: Wednesday, November 28, 2012 1:19 AM
To: [email protected]
Subject: Unrecoverable ConnectionLossException after server restart

I'm seeing what I think is incorrect behavior from ZooKeeper.

When I start a client, connect to a server, and then restart the server, the 
client (I thought) was supposed to eventually reconnect. It doesn't. It 
continually throws a ConnectionLossException on every use, the ZooKeeper client 
isAlive is true, I never get a SESSION_EXPIRATION, and I can see the client 
side ephemeral ports listed in the error message counting up as if it's 
continually attempting to reconnect.

If I recreate the ZooKeeper client, the new client connects and I can use it.

So I could simply react as if I got a SESSION_EXPIRATION exception and rebuild 
the client state, except that a ConnectionLossException is something I ALSO 
get when there is a network partition. When I periodically recreate the entire 
client from scratch in response to a ConnectionLossException, I eventually run 
out of file descriptors and my entire process is hosed. This seems to be 
related to the use of NIO and the repeated opening of pipes and anon_inodes 
(which show up in an lsof).

Am I doing something wrong? Any suggestions?
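
One thing worth ruling out for the file descriptor growth is whether each old client is closed before its replacement is created. The thread does not say either way, so this is a possible cause and not a diagnosis. The sketch below shows the close-then-recreate pattern using a hypothetical `Handle` interface as a stand-in for the real client (which has a `close()` method); `ClientHolder` and the counter in `main` are illustrative assumptions only.

```java
import java.util.function.Supplier;

public class ClientHolder {
    /** Hypothetical stand-in for the real client; only close() is modeled. */
    public interface Handle {
        void close();
    }

    private final Supplier<Handle> factory;
    private Handle current;

    public ClientHolder(Supplier<Handle> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    public synchronized Handle get() {
        return current;
    }

    /** Closes the old handle first, then opens a replacement. */
    public synchronized Handle rebuild() {
        try {
            current.close(); // release the old handle's resources
        } finally {
            current = factory.get(); // replace it even if close() failed
        }
        return current;
    }

    public static void main(String[] args) {
        int[] open = {0}; // counts handles that are open right now
        ClientHolder holder = new ClientHolder(() -> {
            open[0]++;
            return () -> open[0]--;
        });
        for (int i = 0; i < 1000; i++) {
            holder.rebuild();
        }
        System.out.println("open handles: " + open[0]); // prints "open handles: 1"
    }
}
```

With the real client, the factory would construct a new `ZooKeeper` instance and `Handle.close()` would delegate to its `close()`. The point of the sketch is that the number of open handles stays at one no matter how many times the client is rebuilt.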

The information contained in this communication may be CONFIDENTIAL and is 
intended only for the use of the recipient(s) named above.  If you are not the 
intended recipient, you are hereby notified that any dissemination, 
distribution, or copying of this communication, or any of its contents, is 
strictly prohibited.  If you have received this communication in error, please 
notify the sender and delete/destroy the original message and any copy of it 
from your computer or paper files.
