Recommendations for zookeeper deployment

2010-01-12 Mekaraj, Prashant
Hi, http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html is a great resource. It's rare to see an open source project think so much about practical enterprise deployment and this is much appreciated. There are a few more recommendations that I think would be useful to add to the

Re: Recommendations for zookeeper deployment

2010-01-12 Patrick Hunt
Mekaraj, Prashant wrote: Hi, http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html is a great resource. It's rare to see an open source project think so much about practical enterprise deployment and this is much appreciated. Thanks! There are a few more recommendations that

Killing a zookeeper server

2010-01-12 Nick Bailey
We are running zookeeper 3.1.0. Recently we noticed the CPU usage on our machines becoming increasingly high, and we believe the cause is ZOOKEEPER-427 (https://issues.apache.org/jira/browse/ZOOKEEPER-427). However, when we noticed the problem, our solution was to kill the zookeeper process and restart it.

Re: Killing a zookeeper server

2010-01-12 Patrick Hunt
12 servers? That's a lot; if you don't mind my asking, why so many? Typically we recommend 5 - that way you can have one down for maintenance and still tolerate a further failure without bringing down the cluster. The "electing a leader" is probably the restarted machine attempting to re-join the ensemble

Re: Killing a zookeeper server

2010-01-12 Adam Rosien
I have a related question: what's the behavior of a cluster of 3 when one is down? I've tried it and a leader is elected, but are there any other caveats for this situation? .. Adam On Tue, Jan 12, 2010 at 2:40 PM, Patrick Hunt ph...@apache.org wrote: 12 servers? That's a lot; if you don't mind

Re: Killing a zookeeper server

2010-01-12 Henry Robinson
Hi Adam - As long as a quorum of servers is running, ZK will be live. With majority quorums, 2/3 is enough to keep going. In general, if fewer than half your nodes have failed, ZK will keep on keeping on. The main concern with a cluster of 2/3 machines is that a single further failure will bring
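Henry's majority rule, and Patrick's advice to run 5 servers, both reduce to simple arithmetic. A minimal sketch, assuming a plain majority quorum (no weighted or hierarchical quorums); `quorum_size`, `tolerated_failures` and `is_live` are hypothetical helpers for illustration, not ZooKeeper APIs:

```python
# Majority-quorum arithmetic for a ZooKeeper-style ensemble (illustrative sketch).
# Assumes simple majority quorums; the helper names are hypothetical.

def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)

def is_live(ensemble_size: int, servers_up: int) -> bool:
    """True if the running servers still form a majority quorum."""
    return servers_up >= quorum_size(ensemble_size)

if __name__ == "__main__":
    for n in (3, 5, 12):
        print(f"{n} servers: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

With 3 servers and one down, `is_live(3, 2)` is True, but one further failure (`is_live(3, 1)`) loses the quorum, which is the caveat Henry describes. Five servers tolerate two failures, i.e. one in maintenance plus one unexpected failure. Twelve servers tolerate five, no more than eleven would, because an even-sized ensemble needs a larger majority without tolerating any extra failures.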

Re: Killing a zookeeper server

2010-01-12 Nick Bailey
In my last email I failed to include a log line that may be relevant as well:
2010-01-12 18:33:10,658 [QuorumPeer:/0.0.0.0:2181] (QuorumCnxManager) DEBUG - Queue size: 0
2010-01-12 18:33:10,659 [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO - Notification time out: 6400
We see this line
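Nick's `Notification time out: 6400` line appears to come from leader election backing off while it waits to hear from other servers. The sketch below models that backoff; the 200 ms starting wait, the doubling after each empty poll, and the 60 s cap are assumptions about the 3.x `FastLeaderElection` code, not something stated in this thread, and `logged_timeouts` is a hypothetical helper.

```python
from typing import List

# ASSUMED values, believed to match the 3.x FastLeaderElection defaults.
# Check FastLeaderElection.java in your release before relying on them.
INITIAL_WAIT_MS = 200      # assumed starting wait for a peer notification
MAX_INTERVAL_MS = 60_000   # assumed upper bound on the wait

def logged_timeouts(empty_polls: int) -> List[int]:
    """Return the 'Notification time out: N' values that would be logged
    after each consecutive poll that receives no notification from a peer."""
    timeout = INITIAL_WAIT_MS
    logged = []
    for _ in range(empty_polls):
        # The wait doubles after every empty poll, capped at the maximum.
        timeout = min(timeout * 2, MAX_INTERVAL_MS)
        logged.append(timeout)
    return logged

if __name__ == "__main__":
    print(logged_timeouts(10))
```

Under those assumptions, 6400 is the value logged after the fifth consecutive empty poll (400, 800, 1600, 3200, 6400), so a growing value suggests the server kept waiting on its peers without completing an election.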

Re: Killing a zookeeper server

2010-01-12 Adam Rosien
Doh - that makes total sense. For whatever reason I thought with 2 servers you couldn't get a majority :P On Tue, Jan 12, 2010 at 3:17 PM, Henry Robinson he...@cloudera.com wrote: Hi Adam - As long as a quorum of servers is running, ZK will be live. With majority quorums, 2/3 is enough to

Re: Killing a zookeeper server

2010-01-12 Patrick Hunt
Nick Bailey wrote: In my last email I failed to include a log line that may be relevant as well:
2010-01-12 18:33:10,658 [QuorumPeer:/0.0.0.0:2181] (QuorumCnxManager) DEBUG - Queue size: 0
2010-01-12 18:33:10,659 [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO - Notification time out:

Re: Why is win32 not usable in production?

2010-01-12 Jiro Iwamoto
Thanks, Patrick. It looks like I can probably use zookeeper; I will try to use it on win32. Thanks a lot. On Tue, Jan 12, 2010 at 2:35 AM, Patrick Hunt ph...@apache.org wrote: There are 3 principal components to zookeeper: java server and client, c client. The c client is used in the perl/python bindings