"100mb partition"? sounds like virtualization. resource starvation
(worse in virtualized environments) is a common cause of this. Are your
clients GCing or swapping at all? If a client GCs for a long period of
time, the heartbeat thread won't be able to run and the server will expire
the session. The server also places a min/max cap on client session
timeouts (the value is negotiated), so check the client log for the
timeout that was actually negotiated (logged in 3.3 releases).
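
To illustrate the negotiation: unless minSessionTimeout/maxSessionTimeout are set in the server config, the server by default bounds the session timeout to between 2x and 20x its tickTime. Below is a minimal sketch of that clamping. The class and method names are hypothetical, and tickTime=2000 is an assumption (it is the value in the sample config), not something known about this setup:

```java
// Hypothetical sketch of the server-side clamping of a requested session timeout.
class SessionTimeoutSketch {

    // Applies the default bounds: min = 2 * tickTime, max = 20 * tickTime.
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000;   // assumption: tickTime from the sample config
        int requestedMs = 60000; // the 1 minute timeout requested by the client
        System.out.println("negotiated timeout (ms): "
                + negotiate(requestedMs, tickTimeMs));
    }
}
```

Under those assumed values, a requested 1 minute timeout would be cut down to 40 seconds, so raising the client-side timeout alone may not have had the full intended effect. The value in the client log is the one to trust.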
Take a look at this and see if you can make progress:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
My guess is that your client is GCing for long periods of time. You can
rule this in or out by turning on GC logging in your clients and then
reviewing the results after the next such incident (try GCHisto for a
graphical view).
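
GC logging itself is a JVM option (on HotSpot JVMs of this era, flags such as -verbose:gc and -Xloggc:<file>). As a complement, the rough sketch below reads cumulative GC counts and times from inside the client using the standard java.lang.management API. The class name and report format are made up for illustration. It only shows totals, so it would need to be sampled periodically and compared against the session timeout to be useful:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reports cumulative GC activity for this JVM.
class GcReportSketch {

    // Sums the accumulated collection time (ms) across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Builds a one-line summary with per-collector counts and times.
    static String report() {
        StringBuilder sb = new StringBuilder("GC totals:");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(' ').append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(report());
        System.out.println("total GC time (ms): " + totalGcMillis());
    }
}
```

If the total jumps by an amount close to or above the negotiated session timeout between two samples taken around an expiration, that would point toward GC. Swapping would not show up here and has to be checked at the OS level.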
Patrick
On 06/09/2010 11:36 AM, Jordan Zimmerman wrote:
We have a test system using Zookeeper. There is a single Zookeeper
server node and 4 clients. There is very little activity in this
system. After a day's testing we start to see SessionExpiredException
on the client. Things I've tried:
* Increasing the session timeout to 1 minute
* Making sure all JVMs are running in a 100MB partition
Any help debugging this problem would be appreciated. What kind of
diagnostics can I add? Are there more config parameters that I
should try?
-JZ