"100mb partition"? sounds like virtualization. resource starvation (worse in virtualized env) is a common cause of this. Are your clients gcing/swapping at all? If a client gc's for long periods of time the heartbeat thread won't be able to run and the server will expire the session. There is a min/max cap that the server places on the client timeouts (it's negotiated), check the client log for detail on what timeout it negotiated (logged in 3.3 releases)

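To illustrate the min/max cap: as far as I know the server bounds the requested timeout between 2x and 20x its tickTime unless minSessionTimeout/maxSessionTimeout are set explicitly. Here is a rough sketch of that arithmetic. The class and method names are made up for illustration (this is not ZooKeeper API), and the tickTime of 2000 ms is an assumption about your config, not something I know.

```java
// Sketch of how a ZooKeeper server bounds a client's requested session timeout.
// Assumes the default bounds (min = 2 * tickTime, max = 20 * tickTime).
// SessionTimeoutClamp and negotiate() are illustrative names, not ZooKeeper API.
final class SessionTimeoutClamp {

    /** Returns the timeout (ms) the server would grant for a requested timeout. */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;   // default lower bound
        int maxMs = 20 * tickTimeMs;  // default upper bound
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000;    // assumed tickTime; check your zoo.cfg
        int requestedMs = 60000;  // the 1 minute timeout you asked for
        System.out.println("requested=" + requestedMs
                + " ms, negotiated=" + negotiate(requestedMs, tickTimeMs) + " ms");
    }
}
```

Under those assumptions a 1 minute request would come back as 40 seconds, which is why it is worth confirming the negotiated value in the client log rather than trusting the value you passed in.
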
take a look at this and see if you can make progress:

My guess is that your client is GCing for long periods of time. You can rule this in or out by turning on GC logging in your clients and then viewing the results after another such incident happens (try gchisto for a graphical view).
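
If changing the JVM flags on the clients is awkward, a rough in-process alternative is to poll the standard java.lang.management GC beans and log how much collection time accumulated in each interval. This is only a sketch: GcTimeProbe is a hypothetical name, the one second interval is arbitrary, and the reported time is the collector's accumulated elapsed time, which for concurrent collectors is not the same as stop-the-world pause time. It also won't show swapping.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: report how much GC time the JVM accumulated over an interval.
// GcTimeProbe is an illustrative name; only the java.lang.management calls are real API.
final class GcTimeProbe {

    /** Total milliseconds the JVM reports spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(1000); // arbitrary sampling interval
        long spent = totalGcMillis() - before;
        System.out.println("GC time in last interval: " + spent + " ms");
    }
}
```

In a client you would run the sampling loop on a background thread and compare any large interval values against the timestamps of the session expirations.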


On 06/09/2010 11:36 AM, Jordan Zimmerman wrote:
We have a test system using Zookeeper. There is a single Zookeeper
server node and 4 clients. There is very little activity in this
system. After a day's testing we start to see SessionExpiredException
on the client. Things I've tried:

* Increasing the session timeout to 1 minute
* Making sure all JVMs are running in a 100MB partition

Any help debugging this problem would be appreciated. What kind of
diagnostics can I add? Are there more config parameters that I
should try?


