Well, it causes the problem you are seeing. If you set any watches through a client with a chroot and that client then gets disconnected with those watches outstanding, on reconnect it will try to reset them, and they are probably on paths that don't exist (if you are creating everything under path /kafka-tracking). So you get a notification about the watches immediately after resetting them, which causes the string out of bounds exception.
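
To make the failure concrete, here is a minimal Java sketch of the path handling described above. It is a simplified model, not the actual ZooKeeper client code: the class ChrootWatchDemo, the helpers toServerPath and toClientPath, and the watched path /brokers are hypothetical names chosen for illustration. It assumes the client strips the chroot from incoming event paths with a plain substring, which would be consistent with the out-of-bounds exception reported in this thread.

```java
// Simplified model of chroot path translation; NOT the real ZooKeeper code.
class ChrootWatchDemo {

    // Normal requests: the chroot is prepended before the path goes to the server.
    static String toServerPath(String chroot, String clientPath) {
        if (chroot == null) {
            return clientPath;
        }
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    // Incoming events: the chroot is stripped before the path reaches the application.
    // Assumption: this is done with an unchecked substring.
    static String toClientPath(String chroot, String serverPath) {
        if (chroot == null) {
            return serverPath;
        }
        if (serverPath.equals(chroot)) {
            return "/";
        }
        return serverPath.substring(chroot.length());
    }

    public static void main(String[] args) {
        String chroot = "/kafka-tracking";
        String watched = "/brokers"; // hypothetical path as the application sees it

        // Normal case: the watch is registered on the chrooted path, so the
        // event path round-trips back to the client path.
        String registered = toServerPath(chroot, watched);
        System.out.println(toClientPath(chroot, registered)); // prints "/brokers"

        // Suspected bug: on reconnect the watch is reset using the stored client
        // path, without the chroot. The server reports on "/brokers", and
        // stripping a 15-character chroot from an 8-character path fails.
        String resetPath = watched;
        try {
            toClientPath(chroot, resetPath);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("event for " + resetPath + " failed: " + e);
        }
    }
}
```

In this model, a reset path longer than the chroot would not throw at all; it would silently produce a wrong client path, so the exception only shows up for short paths.
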
The only fix is to disable auto watch reset, and then have your own client reset watches when it gets a reconnected event. I suspect it would be easier for you to take a shot at fixing the bug than to rewrite your client to handle this. Thomas provided a patch with tests that presumably show the error, so all you need is a fix to make them pass.

C

-----Original Message-----
From: Jun Rao [mailto:[email protected]]
Sent: Monday, August 29, 2011 12:39 PM
To: [email protected]; [email protected]
Subject: Re: zk keeps disconnecting and reconnecting

What's the impact of ZOOKEEPER-961? If it shows up, does that mean the client won't get any watcher events afterwards? If so, this sounds like a blocker for 3.4 release to me. What's the temporary solution for 3.3.3?

Also, for the very first time that the ZK client gets disconnected, I saw the following entry in the log. It seems that the client can't ping the server for 4 seconds. The ZK server was up at that time and the load was minimal. What could cause the time out? Client GC pauses?

2011/08/26 10:58:22.306 INFO [ClientCnxn] [main-SendThread(esv4-app27.stg:12913)] [kafka] Client session timed out, have not heard from server in 4001ms for sessionid 0x131f ddd84bc0006, closing socket connection and attempting reconnect

Thanks,

Jun

On Mon, Aug 29, 2011 at 7:54 AM, Thomas Koch <[email protected]> wrote:

> Fournier, Camille F.:
> > Did anyone ever check resetting watches at client reconnect on a client
> > with a chroot? Looking at the code, we store the watches associated with
> > the non-chroot path, but they are set by the original request prepending
> > chroot to the request. However, it looks like the SetWatches request on
> > reconnect just calls get on the various watch lists from ZooKeeper, which
> > don't have the prepended chroot.
> >
> > I haven't written a test but I would bet dollars to donuts this is the
> > problem.
> >
> > C
> seems to be this:
> ZOOKEEPER-961, ZOOKEEPER-1091
>
> Regards,
>
> Thomas Koch, http://www.koch.ro
