Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Aaron Crow
I may have a better idea of what caused the trouble. I way, WAY underestimated the number of nodes we collect over time. Right now we're at 1.9 million. This isn't a bug in our application; it's actually a feature (but perhaps an ill-conceived one). A most recent snapshot from a ZooKeeper db is

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Ted Dunning
Impressive number here, especially at your quoted few-per-second rate. Are you sure that you haven't inadvertently synchronized GC on multiple machines?

On Wed, May 12, 2010 at 8:30 PM, Aaron Crow dirtyvagab...@yahoo.com wrote:
> Right now we're at 1.9 million. This isn't a bug of our

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Aaron Crow
Hi Ted, yeah it's a big number, eh? We're essentially using ZooKeeper to track the state of cache entries, and currently we don't bound our cache. I didn't realize how many entries we grow to over a long period of time until I started counting nodes in ZooKeeper. But, sorry, I'm not sure what you

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Patrick Hunt
On 05/12/2010 08:30 PM, Aaron Crow wrote:
> I may have a better idea of what caused the trouble. I way, WAY underestimated the number of nodes we collect over time. Right now we're at 1.9 million. This isn't a bug of our application; it's actually a feature (but perhaps an ill-conceived one). A

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Ted Dunning
Yes. That is roughly what I mean. If one server starts a GC, it can effectively go offline. That might pressure the other servers enough that one of them starts a GC. This is unlikely with your GC settings, but you should turn on the verbose GC logging to be sure.

On Wed, May 12, 2010 at
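
Editorial sketch, not part of the original thread: the suggestion above is to confirm or rule out GC pauses. Verbose GC logging itself is a JVM startup option (for example the standard `-verbose:gc` flag). As a complement, the cumulative GC counts and times of a running JVM can be read in-process through the standard `java.lang.management` beans. The class name `GcSnapshot` and the output format below are assumptions made for illustration; nothing in the thread says the posters used this approach.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (name and output format are illustrative assumptions).
public class GcSnapshot {

    // Returns one line per collector known to this JVM:
    // collector name, cumulative collection count, cumulative time in ms.
    // A value of -1 means the JVM does not report that figure.
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s count=%d timeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Print the current totals; sampling this periodically and comparing
        // successive values shows how much time went to GC in each interval.
        for (String line : summarize()) {
            System.out.println(line);
        }
    }
}
```

Sampling these totals on each server at the same wall-clock times would be one way to see whether long collections on different machines line up, which is the scenario described above. The totals are cumulative, so only the difference between two samples is meaningful.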

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-04-30 Thread Patrick Hunt
On 04/30/2010 10:16 AM, Aaron Crow wrote:
> Hi Patrick, thanks for your time and detailed questions.

No worries. When we hear about an issue we're very interested to follow up and resolve it, regardless of the source. We take the project goals of high reliability/availability _very_ seriously,