The server logs don't say anything. I do have a theory based on reading the code, specifically the SendThread class within ClientCnxn.java.
It took me a while to figure out that it's the client that sends the ping, because the error message says "have not heard from the *server*...". Once I got past that, the key line in the code is:

    int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend();

This basically means that the client gets at most 2 tries to send the ping within the timeout interval, no matter what you set the timeout value to. In a lossy network this may be insufficient, as can be seen from my client logs, where I can go 30 seconds without sending a ping.

I'm running a test now where I've changed the "2" to a "4". I trade a tiny increase in network traffic for a much higher chance of getting a successful ping even in a bad network environment.

Brian

On Mon, Feb 25, 2013 at 11:56 AM, Camille Fournier <[email protected]> wrote:

> What do your server logs say during this time?
>
>
> On Mon, Feb 25, 2013 at 11:51 AM, Brian Tarbox <[email protected]> wrote:
>
> > I am getting the dreaded message:
> >
> > 10:59:45,871 INFO [org.apache.zookeeper.ClientCnxn] - <Client session
> > timed out, have not heard from server in 31482ms for sessionid
> > 0x13d11dd08160007, closing socket connection and attempting reconnect>
> >
> > and from looking at the logs it certainly seems that the keep alive
> > messages are sometimes just not being sent.
> >
> > In my case I see a bunch of these:
> > 10:58:00,164 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:13,511 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:26,857 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:40,205 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:59:14,140 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> >
> > But then nothing from 10:59:14 until 10:59:45 when my client decides it's
> > been too long and so times out.
> >
> > I'm running 3.4.5 on EC2 ...any suggestions welcome.
> >
> > Thanks.
> >
> > Brian Tarbox
> > --
> > http://about.me/BrianTarbox
> >
>

--
http://about.me/BrianTarbox
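
P.S. To make the arithmetic behind the timeToNextPing line concrete, here is a rough standalone sketch. It is not the actual ClientCnxn code: the class name, the divisor parameter, and the 30-second read timeout are made up for illustration, and it assumes a connection that is otherwise idle, so only pings reset the idle-send clock.

```java
// Sketch only: mirrors the arithmetic of the timeToNextPing line in SendThread.
// PingIntervalSketch and the "divisor" parameter are hypothetical, not ZooKeeper API.
final class PingIntervalSketch {

    /** Milliseconds until the next ping is due; zero or negative means "send one now". */
    static int timeToNextPing(int readTimeoutMs, int idleSendMs, int divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        return readTimeoutMs / divisor - idleSendMs;
    }

    /** Number of whole ping intervals that fit into one read-timeout window. */
    static int pingAttemptsPerWindow(int readTimeoutMs, int divisor) {
        if (divisor <= 0 || readTimeoutMs < divisor) {
            throw new IllegalArgumentException("need readTimeoutMs >= divisor > 0");
        }
        int intervalMs = readTimeoutMs / divisor;
        return readTimeoutMs / intervalMs;
    }

    public static void main(String[] args) {
        int readTimeoutMs = 30_000; // assumed value, for illustration only
        for (int divisor : new int[] {2, 4}) {
            System.out.printf("divisor=%d interval=%dms attempts per window=%d%n",
                    divisor,
                    readTimeoutMs / divisor,
                    pingAttemptsPerWindow(readTimeoutMs, divisor));
        }
    }
}
```

With the stock divisor of 2, one lost or delayed ping leaves a single remaining attempt before the read timeout expires. A divisor of 4 leaves three, at the cost of pinging twice as often.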
