Recommendations for zookeeper deployment

2010-01-12 Thread Mekaraj, Prashant
Hi, http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html is a great resource. It's rare to see an open-source project think so much about practical enterprise deployment, and this is much appreciated. There are a few more recommendations that I think would be useful to add to the

Re: Recommendations for zookeeper deployment

2010-01-12 Thread Patrick Hunt
Mekaraj, Prashant wrote: Hi, http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html is a great resource. It's rare to see an open-source project think so much about practical enterprise deployment, and this is much appreciated. Thanks! There are a few more recommendations that

Killing a zookeeper server

2010-01-12 Thread Nick Bailey
We are running ZooKeeper 3.1.0. Recently we noticed the CPU usage on our machines becoming increasingly high, and we believe the cause is https://issues.apache.org/jira/browse/ZOOKEEPER-427. However, our solution when we noticed the problem was to kill the ZooKeeper process and restart it.

Re: Killing a zookeeper server

2010-01-12 Thread Patrick Hunt
12 servers? That's a lot. If you don't mind my asking, why so many? Typically we recommend 5 - that way you can have one down for maintenance and still have a failure that doesn't bring down the cluster. The electing a leader is probably the restarted machine attempting to re-join the ensemble

Re: Killing a zookeeper server

2010-01-12 Thread Nick Bailey
12 was just to keep uniformity on our servers. Our clients are connecting from the same 12 servers. Easily modifiable and perhaps we should look into changing that. The logs just seem to indicate that the servers that claim to have no server running are continually attempting to elect a

Re: Killing a zookeeper server

2010-01-12 Thread Adam Rosien
I have a related question: what's the behavior of a cluster of 3 when one is down? I've tried it and a leader is elected, but are there any other caveats for this situation? .. Adam On Tue, Jan 12, 2010 at 2:40 PM, Patrick Hunt ph...@apache.org wrote: 12 servers? That's a lot. If you don't mind

Re: Killing a zookeeper server

2010-01-12 Thread Henry Robinson
Hi Adam - As long as a quorum of servers is running, ZK will be live. With majority quorums, 2/3 is enough to keep going. In general, if fewer than half your nodes have failed, ZK will keep on keeping on. The main concern with a cluster of 2/3 machines is that a single further failure will bring
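The majority-quorum arithmetic discussed in this thread can be sketched as follows. This is only an illustration of the counting rule (a strict majority of the configured ensemble must be up), not ZooKeeper code; the function names quorum_size, tolerated_failures and is_live are hypothetical helpers invented for this example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def is_live(ensemble_size: int, servers_up: int) -> bool:
    """True if the servers that are up still form a majority quorum."""
    return servers_up >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Ensemble sizes mentioned in the thread: 3, 5 and 12.
    for n in (3, 5, 12):
        print(n, quorum_size(n), tolerated_failures(n))
    # A cluster of 3 with one server down still has a quorum (2 of 3),
    # but one further failure leaves it without one.
    print(is_live(3, 2), is_live(3, 1))
```

Under this rule an ensemble of 5 tolerates two servers being down at once, which matches the "one down for maintenance plus one failure" reasoning given earlier in the thread. An ensemble of 12 tolerates five, the same number as an ensemble of 11, so an even-sized ensemble adds a server without adding fault tolerance.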

Re: Killing a zookeeper server

2010-01-12 Thread Nick Bailey
In my last email I failed to include a log line that may be relevant as well: 2010-01-12 18:33:10,658 [QuorumPeer:/0.0.0.0:2181] (QuorumCnxManager) DEBUG - Queue size: 0 2010-01-12 18:33:10,659 [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO - Notification time out: 6400 We see this line

Re: Killing a zookeeper server

2010-01-12 Thread Adam Rosien
Doh - that makes total sense. For whatever reason I thought with 2 servers you couldn't get a majority :P On Tue, Jan 12, 2010 at 3:17 PM, Henry Robinson he...@cloudera.com wrote: Hi Adam - As long as a quorum of servers is running, ZK will be live. With majority quorums, 2/3 is enough to

Re: Killing a zookeeper server

2010-01-12 Thread Patrick Hunt
Nick Bailey wrote: In my last email I failed to include a log line that may be relevant as well: 2010-01-12 18:33:10,658 [QuorumPeer:/0.0.0.0:2181] (QuorumCnxManager) DEBUG - Queue size: 0 2010-01-12 18:33:10,659 [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO - Notification time out:

Re: Why is not win32 usable in production?

2010-01-12 Thread Jiro Iwamoto
Thanks, Patrick. I can likely use ZooKeeper. I will try to use ZooKeeper on win32. Thanks a lot. On Tue, Jan 12, 2010 at 2:35 AM, Patrick Hunt ph...@apache.org wrote: There are 3 principal components to ZooKeeper: Java server and client, C client. The C client is used in the perl/python bindings