Looks like there's an open bug for the described issue: https://issues.apache.org/jira/browse/ZOOKEEPER-832
There was some discussion in the comments but looks like the best solution hasn't been found yet.

Yuriy

2015-04-22 18:55 GMT-04:00 Yuriy Lopotun <[email protected]>:

> Hi guys,
>
> In our client-server OSGI application we are using ECF Zookeeper-based discovery provider for remote services discovery (based on Zookeeper v.3.3.6).
>
> In a standalone mode the plugin opens a dedicated Zookeeper connection from the client to each of the servers.
>
> When testing the application resiliency, we noticed that when we restart the server, the connection never gets re-established. In the server logs I found the following:
>
> 2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Accepted socket connection from /10.36.64.250:53022
>
> 2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] DEBUG org.apac.zook.serv.NIOServerCnxn - Session establishment request from client /10.36.64.250:53022 client's lastZxid is 0x8
>
> 2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Refusing session request for client /10.36.64.250:53022 as it has seen zxid 0x8 our last zxid is 0x7 client must try another server
>
> 2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Closed socket connection for client /10.36.64.250:53022 (no session established for client)
>
> As far as I understood – this is an expected behaviour, since the server (due to restart) cleaned up its DB and reset the transaction id.
>
> The problem in this case is that the client session keeps trying re-connecting to this only server, which causes an infinite loop:
>
> 2015-04-22 18:21:02,760 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Opening socket connection to server ca-rd-mbernard.miranda.com/10.36.64.250:2001
>
> 2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Socket connection established to ca-rd-mbernard.miranda.com/10.36.64.250:2001, initiating session
>
> 2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] DEBUG org.apac.zook.ClientCnxn - Session establishment request sent on ca-rd-mbernard.miranda.com/10.36.64.250:2001
>
> 2015-04-22 18:21:02,762 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Unable to read additional data from server sessionid 0x14ce32e178c0002, likely server has closed socket, closing socket connection and attempting reconnect
>
> Again, I think this is a correct behaviour in case of several servers. But in our case – it’s always 1.
>
> So, I wanted to ask you for a suggestion: what you think we can do in this case to achieve automatic reconnect.
>
> I thought, maybe we can close the connection in case of such exception if there is only 1 server instead of retrying? Maybe this enhancement is already done in more recent versions and could be back-ported?
>
> Thanks,
>
> Yuriy
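
To make the "close the connection instead of retrying" idea from the message above a bit more concrete, here is a rough sketch of the logic. It is only a toy model of the handshake seen in the logs, not the ZooKeeper client API: the class name ReconnectSketch, the methods serverAccepts and connect, and the maxRefusals parameter are all made up for illustration. The one assumption taken from the logs is that the server refuses a client whose last seen zxid is higher than its own, and that a brand new session starts from zxid 0.

```java
/** Toy model of the session handshake from the logs; hypothetical names, not the ZooKeeper API. */
final class ReconnectSketch {

    /** Mirrors the server-side check: a client that has seen a newer zxid is refused. */
    static boolean serverAccepts(long clientLastZxid, long serverLastZxid) {
        return clientLastZxid <= serverLastZxid;
    }

    /**
     * Tries to resume the old session up to maxRefusals times. If every attempt is
     * refused, gives up on the old session and opens a fresh one, which starts with
     * lastZxid 0 (assumption). Returns the lastZxid of the accepted session, or -1
     * if even the fresh session is refused.
     */
    static long connect(long clientLastZxid, long serverLastZxid, int maxRefusals) {
        for (int attempt = 0; attempt < maxRefusals; attempt++) {
            if (serverAccepts(clientLastZxid, serverLastZxid)) {
                return clientLastZxid; // old session resumed
            }
        }
        long freshZxid = 0L; // a new session has not seen any transactions yet
        return serverAccepts(freshZxid, serverLastZxid) ? freshZxid : -1L;
    }

    public static void main(String[] args) {
        // Values from the logs: client has seen 0x8, restarted server is at 0x7.
        System.out.println(serverAccepts(0x8L, 0x7L));
        System.out.println(connect(0x8L, 0x7L, 3));
    }
}
```

In a real client the "fresh session" step would presumably mean closing the old ZooKeeper handle and creating a new one, so anything tied to the old session (ephemeral nodes, watches) would have to be set up again.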
