You should never see connection loss unless there is a network partition
or some other problem that disrupts communication between the client and
the server (the client swapping? the server swapping, or either side
suffering GC pauses? etc.). Are you monitoring your hosts, network, JVMs,
and so on? Are the cluster hosts "over-virtualized"?
Take a look at your client and server logs and see if you can determine
what the issue is. You might also try network-level tools such as
ping/ssh to verify connectivity between server and client. See this page
for issues people have had in the past:
For example, "Hardware misconfiguration - NIC" caused one system to
basically work, but with a huge number of connection losses, especially
whenever there was load (and I've seen this particular issue twice now).
Michael Bauland wrote:
Thanks for your reply.
This page: about Zookeeper error
I had actually read this page before. You may have misunderstood my
question. I know how to recover from the ConnectionLoss exception. I was
just curious why it occurred so often in the scenario I described. I
would have assumed that it shouldn't occur at all in that scenario, but
almost half of the requests returned with a ConnectionLoss.
On Mon, Feb 1, 2010 at 4:30 AM, Michael Bauland <michael.baul...@knipp.de> wrote:
I've got a question regarding the connectionloss exception thrown by Java.
I've got an ensemble running with three zk servers. If one of the three
servers is not running, the whole ensemble should still work (and it
does, so that's fine). But in this situation I quite often get a
ConnectionLoss exception, and I'm wondering whether I'm doing something
wrong or whether that's to be expected.
My code is rather simple:
I create a new connection to my ensemble using
ZooKeeper zk = new ZooKeeper (connectString, timeOut, new MyWatcher ());
where connectString contains all three servers. Then I use the ZooKeeper
to read data from a certain path:
zk.getData (path, false, null);
This call quite often returns an exception like
KeeperErrorCode = ConnectionLoss for /125/170/test
But according to your documentation, the connectionloss exception should
only occur in the following two cases:
1. The application calls an operation on a session that is no longer
This should not be the case, since I only just created the session.
2. The ZooKeeper client disconnects from a server when there are
pending operations to that server, i.e., there is a pending asynchronous
This should also not be the case. I was just doing a read request and no
other client was accessing the ensemble.
My only idea is that maybe the connection call first tried to connect to
the zookeeper server that was not running (remember only two of the
three servers are running) and before it had a chance to try to connect
to one of the other servers, my getData call was made and failed with
connectionloss. Could that be the reason?
But I thought the connection handling was automatic and if a connection
failed the client would automatically try any of the other listed
servers without the user noticing!?
Thanks for any help.
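
On the hypothesis above (that getData ran before the client had
connected to a live server): the ZooKeeper constructor returns
immediately and session establishment is asynchronous, so it may be
worth waiting for the connected event before the first read. Below is a
minimal sketch of that idea using only the JDK. ConnectionGate and its
method names are hypothetical, not part of the ZooKeeper API. The
assumption is that MyWatcher.process(WatchedEvent event) would call
gate.markConnected() when event.getState() is
Watcher.Event.KeeperState.SyncConnected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of the ZooKeeper API): blocks callers until
 * the session watcher has reported that the client is connected.
 */
final class ConnectionGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Intended to be called from the watcher when the connected state arrives. */
    void markConnected() {
        connected.countDown();
    }

    /** Returns true if connected within the timeout, false otherwise. */
    boolean awaitConnected(long timeoutMillis) {
        try {
            return connected.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        ConnectionGate gate = new ConnectionGate();
        // Stand-in for the client's event thread delivering the connected event.
        new Thread(gate::markConnected).start();
        if (gate.awaitConnected(5_000)) {
            System.out.println("connected; safe to issue the first read");
        } else {
            System.out.println("not connected within the timeout");
        }
    }
}
```

With this in place, the code would create the ZooKeeper handle, call
gate.awaitConnected(timeOut), and only then call zk.getData(...). The
latch is one-shot, so it only covers the initial connect; a disconnect
later in the session can still surface as ConnectionLoss and needs the
usual handling.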