There's a test also attached to 961. It would be nice if one of you that is having trouble with the bug would submit a fix. Is anyone working on it?
Thanks, C -----Original Message----- From: kishore g [mailto:[email protected]] Sent: Wednesday, August 31, 2011 1:02 PM To: [email protected] Subject: Re: zk keeps disconnecting and reconnecting Here is a simple test case that reproduces this error. public void testChroot() throws Exception { Watcher watcher = new Watcher() { @Override public void process(WatchedEvent event) { System.out.println("Event:" + event); } }; ZooKeeper zk = new ZooKeeper("localhost:2181/foo", 6000, watcher); //uncommenting this line will not cause infinite connect/disconnect //zk.create("/", new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); zk.exists("/", true); System.out.println("Stop the server and restart it when you see this message"); Thread.currentThread().join(); } As pointed out earlier in this thread, setting an watch on a non-existent path triggers this. Is some one working on a patch for 961 and the issues described in this thread. Any pointers on what needs to be fixed for both the issues? I can take a look and submit a patch if I can fix it. thanks, Kishore G On Tue, Aug 30, 2011 at 6:48 PM, Jun Rao <[email protected]> wrote: > I was also wondering why our clients get disconnected in the first place > since the ZK servers are all up. The following are the logs when the first > disconnect happens. Does anyone know why the client can't seem to connect > to > most servers? Also, is "Session 0x1320765b0ac002e for server nulll" > normal? > Thanks, > > 2011/08/29 07:33:51.824 INFO [ClientCnxn] > [main-SendThread(esv4-app27.stg:12913)] [kafka] Unable to read additional > data from server sessionid 0x1320765b0ac002f, likely server h > as closed socket, closing socket connection and attempting > reconnect2011/08/29 07:33:51.824 INFO [ClientCnxn] > [main-SendThread(esv4-app27.stg:12913)] [kafka] Unable to read additional > data from server sessionid 0x1320765b0ac002e, likely server h > as closed socket, closing socket connection and attempting reconnect > 2011/08/29 07:33:51.990 INFO [ZkClient] [main-EventThread] [kafka] > zookeeper > state changed (Disconnected)2011/08/29 07:33:52.019 INFO [ZkClient] > [main-EventThread] [kafka] zookeeper state changed (Disconnected) > 2011/08/29 07:33:52.092 INFO [ClientCnxn] > [main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection > to > server esv4-app29.stg/172.18.98.89:12913 > 2011/08/29 07:33:52.093 INFO [ClientCnxn] > [main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection > established to esv4-app29.stg/172.18.98.89:12913, initiating session > 2011/08/29 07:33:52.094 INFO [ClientCnxn] > [main-SendThread(esv4-app29.stg:12913)] [kafka] Unable to read additional > data from server sessionid 0x1320765b0ac002f, likely server h > as closed socket, closing socket connection and attempting reconnect > 2011/08/29 07:33:52.652 INFO [ClientCnxn] > [main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection > to > server esv4-app28.stg/172.18.98.101:12913 > 2011/08/29 07:33:52.652 INFO [ClientCnxn] > [main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection > established to esv4-app28.stg/172.18.98.101:12913, initiating > session2011/08/29 07:33:53.075 INFO [ClientCnxn] > [main-SendThread(esv4-app28.stg:12913)] [kafka] Unable to read additional > data from server sessionid 0x1320765b0ac002e, likely server h > as closed socket, closing socket connection and attempting reconnect > 2011/08/29 07:33:53.108 INFO [ClientCnxn] > [main-SendThread(esv4-app29.stg:12913)] [kafka] Opening socket connection > to > server esv4-app28.stg/172.18.98.101:12913 > 2011/08/29 07:33:53.108 INFO [ClientCnxn] > [main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection > established to esv4-app28.stg/172.18.98.101:12913, initiating session > 2011/08/29 07:33:53.109 INFO [ClientCnxn] > [main-SendThread(esv4-app28.stg:12913)] [kafka] Unable to read additional > data from server sessionid 0x1320765b0ac002f, likely server h > as closed socket, closing socket connection and attempting reconnect > 2011/08/29 07:33:53.577 INFO [ClientCnxn] > [main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection > to > server esv4-app30.stg/172.18.98.90:12913 > 2011/08/29 07:33:53.578 WARN [ClientCnxn] > [main-SendThread(esv4-app30.stg:12913)] [kafka] Session 0x1320765b0ac002e > for server null, unexpected error, closing socket connection and attempting > reconnect > java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) > 2011/08/29 07:33:53.683 INFO [ClientCnxn] > [main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection > to > server esv4-app30.stg/172.18.98.90:12913 > 2011/08/29 07:33:53.683 WARN [ClientCnxn] > [main-SendThread(esv4-app30.stg:12913)] [kafka] Session 0x1320765b0ac002f > for server null, unexpected error, closing socket connection and attempting > reconnect > java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) > > Jun > > On Mon, Aug 29, 2011 at 10:50 AM, Fournier, Camille F. < > [email protected]> wrote: > > > Well, it causes the problem you are seeing. If you set any watchers with > a > > chroot and then your client gets disconnected with these watches > > outstanding, when you reconnect you will try to reset them and they are > > probably on paths that don't exist (if you are creating everything under > > path /kafka-tracking). So you get a notification about the watches > > immediately after resetting them, which causes the string out of bounds > > exception. > > > > The only fix is to disable auto watch reset, and then have your own > client > > reset watches when it gets a reconnected event. I suspect it would be > easier > > for you to take a shot at fixing the bug than to rewrite your client to > > handle this. Thomas provided a patch with tests that presumably show the > > error, so all you need is a fix to make them pass. > > > > > > C > > > > -----Original Message----- > > From: Jun Rao [mailto:[email protected]] > > Sent: Monday, August 29, 2011 12:39 PM > > To: [email protected]; [email protected] > > Subject: Re: zk keeps disconnecting and reconnecting > > > > What's the impact of ZOOKEEPER-961? If it shows up, does that mean the > > client won't get any watcher events afterwards? If so, this sounds like a > > blocker for 3.4 release to me. What's the temporary solution for 3.3.3? > > > > Also, for the very first time that the ZK client gets disconnected, I saw > > the following entry in the log. It seems that the client can't ping the > > server for 4 seconds. The ZK server was up at that time and the load was > > minimal. What could cause the time out? Client GC pauses? > > > > 2011/08/26 10:58:22.306 INFO [ClientCnxn] > > [main-SendThread(esv4-app27.stg:12913)] [kafka] Client session timed out, > > have not heard from server in 4001ms for sessionid 0x131f > > ddd84bc0006, closing socket connection and attempting reconnect > > > > Thanks, > > > > Jun > > > > On Mon, Aug 29, 2011 at 7:54 AM, Thomas Koch <[email protected]> wrote: > > > > > Fournier, Camille F.: > > > > Did anyone ever check resetting watches at client reconnect on a > client > > > > with a chroot? Looking at the code, we store the watches associated > > with > > > > the non-chroot path, but they are set by the original request > > prepending > > > > chroot to the request. However, it looks like the SetWatches request > on > > > > reconnect just calls get on the various watch lists from ZooKeeper, > > which > > > > don't have the prepended chroot. > > > > > > > > I haven't written a test but I would bet dollars to donuts this is > the > > > > problem. > > > > > > > > C > > > seems to be this: > > > ZOOKEEPER-961, ZOOKEEPER-1091 > > > > > > Regards, > > > > > > Thomas Koch, http://www.koch.ro > > > > > >
