I was also wondering why our clients get disconnected in the first place
since the ZK servers are all up. The following are the logs when the first
disconnect happens. Does anyone know why the client can't seem to connect to
most servers? Also, is  "Session 0x1320765b0ac002e for server nulll" normal?
Thanks,

2011/08/29 07:33:51.824 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Unable to read additional
data from server sessionid 0x1320765b0ac002f, likely server h
as closed socket, closing socket connection and attempting
reconnect2011/08/29 07:33:51.824 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Unable to read additional
data from server sessionid 0x1320765b0ac002e, likely server h
as closed socket, closing socket connection and attempting reconnect
2011/08/29 07:33:51.990 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (Disconnected)2011/08/29 07:33:52.019 INFO [ZkClient]
[main-EventThread] [kafka] zookeeper state changed (Disconnected)
2011/08/29 07:33:52.092 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection to
server esv4-app29.stg/172.18.98.89:12913
2011/08/29 07:33:52.093 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection
established to esv4-app29.stg/172.18.98.89:12913, initiating session
2011/08/29 07:33:52.094 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Unable to read additional
data from server sessionid 0x1320765b0ac002f, likely server h
as closed socket, closing socket connection and attempting reconnect
2011/08/29 07:33:52.652 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection to
server esv4-app28.stg/172.18.98.101:12913
2011/08/29 07:33:52.652 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
established to esv4-app28.stg/172.18.98.101:12913, initiating
session2011/08/29 07:33:53.075 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Unable to read additional
data from server sessionid 0x1320765b0ac002e, likely server h
as closed socket, closing socket connection and attempting reconnect
2011/08/29 07:33:53.108 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Opening socket connection to
server esv4-app28.stg/172.18.98.101:12913
2011/08/29 07:33:53.108 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
established to esv4-app28.stg/172.18.98.101:12913, initiating session
2011/08/29 07:33:53.109 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Unable to read additional
data from server sessionid 0x1320765b0ac002f, likely server h
as closed socket, closing socket connection and attempting reconnect
2011/08/29 07:33:53.577 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection to
server esv4-app30.stg/172.18.98.90:12913
2011/08/29 07:33:53.578 WARN [ClientCnxn]
[main-SendThread(esv4-app30.stg:12913)] [kafka] Session 0x1320765b0ac002e
for server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2011/08/29 07:33:53.683 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection to
server esv4-app30.stg/172.18.98.90:12913
2011/08/29 07:33:53.683 WARN [ClientCnxn]
[main-SendThread(esv4-app30.stg:12913)] [kafka] Session 0x1320765b0ac002f
for server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)

Jun

On Mon, Aug 29, 2011 at 10:50 AM, Fournier, Camille F. <
camille.fourn...@gs.com> wrote:

> Well, it causes the problem you are seeing. If you set any watchers with a
> chroot and then your client gets disconnected with these watches
> outstanding, when you reconnect you will try to reset them and they are
> probably on paths that don't exist (if you are creating everything under
> path /kafka-tracking). So you get a notification about the watches
> immediately after resetting them, which causes the string out of bounds
> exception.
>
> The only fix is to disable auto watch reset, and then have your own client
> reset watches when it gets a reconnected event. I suspect it would be easier
> for you to take a shot at fixing the bug than to rewrite your client to
> handle this. Thomas provided a patch with tests that presumably show the
> error, so all you need is a fix to make them pass.
>
>
> C
>
> -----Original Message-----
> From: Jun Rao [mailto:jun...@gmail.com]
> Sent: Monday, August 29, 2011 12:39 PM
> To: user@zookeeper.apache.org; tho...@koch.ro
> Subject: Re: zk keeps disconnecting and reconnecting
>
> What's the impact of ZOOKEEPER-961? If it shows up, does that mean the
> client won't get any watcher events afterwards? If so, this sounds like a
> blocker for 3.4 release to me. What's the temporary solution for 3.3.3?
>
> Also, for the very first time that the ZK client gets disconnected, I saw
> the following entry in the log. It seems that the client can't ping the
> server for 4 seconds. The ZK server was up at that time and the load was
> minimal. What could cause the time out? Client GC pauses?
>
> 2011/08/26 10:58:22.306 INFO [ClientCnxn]
> [main-SendThread(esv4-app27.stg:12913)] [kafka] Client session timed out,
> have not heard from server in 4001ms for sessionid 0x131f
> ddd84bc0006, closing socket connection and attempting reconnect
>
> Thanks,
>
> Jun
>
> On Mon, Aug 29, 2011 at 7:54 AM, Thomas Koch <tho...@koch.ro> wrote:
>
> > Fournier, Camille F.:
> > > Did anyone ever check resetting watches at client reconnect on a client
> > > with a chroot? Looking at the code, we store the watches associated
> with
> > > the non-chroot path, but they are set by the original request
> prepending
> > > chroot to the request. However, it looks like the SetWatches request on
> > > reconnect just calls get on the various watch lists from ZooKeeper,
> which
> > > don't have the prepended chroot.
> > >
> > > I haven't written a test but I would bet dollars to donuts this is the
> > > problem.
> > >
> > > C
> > seems to be this:
> > ZOOKEEPER-961, ZOOKEEPER-1091
> >
> > Regards,
> >
> > Thomas Koch, http://www.koch.ro
> >
>

Reply via email to