Yes. That is roughly what I mean.
If one server starts a GC, it can effectively go offline. That might
pressure the other servers enough that one of them starts a GC.
This is unlikely with your GC settings, but you should turn on verbose
GC logging to be sure.
On Wed, May 12, 2010 at 10:09
On 05/12/2010 08:30 PM, Aaron Crow wrote:
> I may have a better idea of what caused the trouble. I way, WAY
> underestimated the number of nodes we collect over time. Right now we're at
> 1.9 million. This isn't a bug of our application; it's actually a feature
> (but perhaps an ill-conceived one).
> A m
Hi Ted, yeah it's a big number, eh? We're essentially using Zookeeper to
track the state of cache entries, and currently we don't bound our cache. I
didn't realize how many entries we grow to over a long period of time, until
I started counting nodes in Zookeeper. But, sorry, I'm not sure what you
Impressive number here, especially at your quoted "few per second" rate.
Are you sure that you haven't inadvertently synchronized GC on multiple
machines?
On Wed, May 12, 2010 at 8:30 PM, Aaron Crow wrote:
> Right now we're at
> 1.9 million. This isn't a bug of our application; it's actually a
I may have a better idea of what caused the trouble. I way, WAY
underestimated the number of nodes we collect over time. Right now we're at
1.9 million. This isn't a bug of our application; it's actually a feature
(but perhaps an ill-conceived one).
A most recent snapshot from a Zookeeper db is 22
On 04/30/2010 10:16 AM, Aaron Crow wrote:
> Hi Patrick, thanks for your time and detailed questions.
No worries. When we hear about an issue we're very interested to
follow up and resolve it, regardless of the source. We take the project
goals of high reliability/availability _very_ seriously,
Hi Patrick, thanks for your time and detailed questions.
We're running on Java build 1.6.0_14-b08, on Ubuntu 4.2.4-1ubuntu3. Below is
output from a recent stat, and a question about node count. For your other
questions, I should save your time with a batch reply: I wasn't tracking
nearly enough th
Btw, are you monitoring the ZK server JVMs? Please take a look at
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
It would be interesting if you could run commands such as "stat"
against your currently running cluster. In particular I'd be interested
to know
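
For anyone following along, here is a rough sketch of how the "stat" command mentioned above could be scripted rather than typed by hand. It assumes the server answers four-letter commands on its client port and closes the connection after replying. The helper names four_letter_word and parse_stat are made up for illustration and are not part of ZooKeeper.

```python
import re
import socket


def four_letter_word(host, port, cmd="stat", timeout=5.0):
    """Send a four-letter command to a ZK server and return the raw text reply.

    host and port are whatever your server listens on for clients
    (an assumption here; 2181 is the usual default client port).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection once it has replied
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Collect the 'Key: value' lines of a stat reply into a dict of strings.

    Per-client lines (which start with a space) and the bare 'Clients:'
    header are skipped because they do not match the pattern.
    """
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^([A-Za-z][A-Za-z /]*):\s*(.+)$", line)
        if m:
            fields[m.group(1).strip()] = m.group(2).strip()
    return fields


# Example use against a live server (hostname is a placeholder):
#   info = parse_stat(four_letter_word("zk1.example.com", 2181))
#   print(info.get("Mode"), info.get("Node count"))
```

Running this against each server in the ensemble on a timer would give a crude record of mode and node count over time, which may help when looking back at an incident.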
Hi Aaron, some questions/comments below:
On 04/28/2010 06:29 PM, Aaron Crow wrote:
> We were running version 3.2.2 for about a month and it was working well for
> us. Then late this past Saturday night, our cluster went pathological. One
> of the 3 ZK servers spewed many WARNs (see below), and the oth