Re: Misbehaving zk servers

2010-04-29 Thread Travis Crawford
On Thu, Apr 29, 2010 at 10:24 AM, Patrick Hunt wrote:
> Did you find any bugs on java.sun.com related to those? ;-)
>
> That does sound like a good solution to me. We should stop accepting
> connections and log it to the log as well. We might also want to update the
> user docs and tell users to m

Re: Misbehaving zk servers

2010-04-29 Thread Patrick Hunt
Did you find any bugs on java.sun.com related to those? ;-)

That does sound like a good solution to me. We should stop accepting connections and log it to the log as well. We might also want to update the user docs and tell users to monitor the FD count as part of their monitoring regime. Is t

Re: Misbehaving zk servers

2010-04-29 Thread Travis Crawford
On Thu, Apr 29, 2010 at 9:49 AM, Patrick Hunt wrote:
> Is there any good (simple/fast/bulletproof) way to monitor the FD use inside
> the jvm? If so we could stop accepting new client connections once we get
> close to the os imposed limit... The test would have to be a bulletproof one
> though -

Re: Misbehaving zk servers

2010-04-29 Thread Patrick Hunt
Is there any good (simple/fast/bulletproof) way to monitor the FD use inside the jvm? If so we could stop accepting new client connections once we get close to the os imposed limit... The test would have to be a bulletproof one though - we wouldn't want to end up in some worse situation (where
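
For readers following the question above, here is a minimal sketch of one way to read file descriptor usage from inside the JVM. It assumes a Sun/Oracle or OpenJDK runtime on a Unix-like OS, where the platform MXBean implements com.sun.management.UnixOperatingSystemMXBean; other JVMs may not expose these counters. The class name FdHeadroom, its method names, and the 0.9 threshold are hypothetical, chosen only for illustration, and none of this is ZooKeeper code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of ZooKeeper.
final class FdHeadroom {

    // Returns open FDs divided by the FD limit, or -1.0 when the JVM
    // does not expose the counters (non-Unix or non-Sun-derived JVM).
    static double usedFraction() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            return max > 0 ? (double) open / (double) max : -1.0;
        }
        return -1.0;
    }

    // Decides whether new client connections should still be accepted.
    // An unknown reading (negative) fails open: keep accepting rather
    // than refuse connections based on a measurement we do not have.
    static boolean shouldAcceptConnections(double usedFraction, double threshold) {
        return usedFraction < 0.0 || usedFraction < threshold;
    }

    public static void main(String[] args) {
        double used = usedFraction();
        double threshold = 0.9; // assumed cutoff, tune per deployment
        System.out.println("FD fraction in use: " + used);
        System.out.println("accept new connections: "
                + shouldAcceptConnections(used, threshold));
    }
}
```

Failing open when the counters are unavailable is a design choice in this sketch, made so that the check cannot by itself lock clients out; whether that is "bulletproof" enough for a server is a separate question. The same two counters can also be read remotely over JMX for external monitoring.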

Re: Misbehaving zk servers

2010-04-29 Thread Mahadev Konar
Hi Travis, How many clients did you have connected to this server? Usually the default is 8K file descriptors. Did you have clients more than that? Also, if clients fail to attach to a server, they will run off to another server. We do not do any blacklisting because we expect the server to heal

Misbehaving zk servers

2010-04-29 Thread Travis Crawford
Hey zookeeper gurus - We recently had a zookeeper outage when one ZK server was started with a low limit after upgrading to 3.3.0. Several days later the outage occurred when that node reached its file descriptor limit and clients started having major issues. Are there any circumstances when a ZK