monitoring zookeeper

2010-04-14 Thread Travis Crawford
Hey zookeeper gurus - Are there any recommended ways for one to monitor zookeeper ensembles? I'm familiar with the four-letter words and that stats are published via JMX - I'm more interested in what people are doing with those stats. I'd like to publish the JMX stats to Ganglia, and this works

Re: monitoring zookeeper

2010-04-14 Thread Travis Crawford
it to meet your needs and enhance it to work for Ganglia. Let me know if this helps you. thanks, Kishore G On Wed, Apr 14, 2010 at 11:12 AM, Travis Crawford traviscrawf...@gmail.com wrote: Hey zookeeper gurus - Are there any recommended ways for one to monitor zookeeper ensembles? I'm

Re: Recovery issue - how to debug?

2010-04-19 Thread Travis Crawford
On Mon, Apr 19, 2010 at 2:15 PM, Ted Dunning ted.dunn...@gmail.com wrote: Can you attach the screen shot to the JIRA issue? The mailing list strips these things. Oops. Updated jira: https://issues.apache.org/jira/browse/ZOOKEEPER-744 --travis On Mon, Apr 19, 2010 at 1:18 PM, Travis

Re: python client structure

2010-04-21 Thread Travis Crawford
the sample code below would you want to put this on a wiki or check in as an example? Would have been nice if someone already figured this out when I started messing with things :) --travis cheers, Henry On 20 April 2010 23:33, Travis Crawford traviscrawf...@gmail.com wrote: Hey zookeeper

Misbehaving zk servers

2010-04-29 Thread Travis Crawford
Hey zookeeper gurus - We recently had a zookeeper outage when one ZK server was started with a low limit after upgrading to 3.3.0. Several days later the outage occurred when that node reached its file descriptor limit and clients started having major issues. Are there any circumstances when a

Re: Misbehaving zk servers

2010-04-29 Thread Travis Crawford
On Thu, Apr 29, 2010 at 9:49 AM, Patrick Hunt ph...@apache.org wrote: Is there any good (simple/fast/bulletproof) way to monitor the FD use inside the jvm? If so we could stop accepting new client connections once we get close to the os imposed limit... The test would have to be a bulletproof

Re: Misbehaving zk servers

2010-04-29 Thread Travis Crawford
for this? Thanks! Filed: https://issues.apache.org/jira/browse/ZOOKEEPER-759 Thanks for the feedback all! Travis Patrick On 04/29/2010 10:09 AM, Travis Crawford wrote: On Thu, Apr 29, 2010 at 9:49 AM, Patrick Hunt ph...@apache.org wrote: Is there any good (simple/fast/bulletproof) way to monitor

Re: ZKClient

2010-05-04 Thread Travis Crawford
: https://issues.apache.org/jira/browse/ZOOKEEPER-765 --travis On Tue, May 4, 2010 at 3:43 PM, Travis Crawford traviscrawf...@gmail.com wrote: Attached is a skeleton application I extracted from a script I use -- perhaps we could add this as a recipe? If there are issues I'm more than happy

Zookeeper outage recap questions

2010-07-01 Thread Travis Crawford
Hey zookeepers - We just experienced a total zookeeper outage, and here's a quick post-mortem of the issue, and some questions about preventing it going forward. Quick overview of the setup: - RHEL5 2.6.18 kernel - Zookeeper 3.3.0 - ulimit raised to 65k files - 3 cluster members - 4-5k

Re: Zookeeper outage recap questions

2010-07-01 Thread Travis Crawford
to the server in this situation. Patrick On 06/30/2010 11:13 PM, Travis Crawford wrote: Hey zookeepers - We just experienced a total zookeeper outage, and here's a quick post-mortem of the issue, and some questions about preventing it going forward. Quick overview of the setup: - RHEL5

Re: zookeeper crash

2010-07-06 Thread Travis Crawford
Hey all - I believe we just suffered an outage from this issue. Short version is while restarting quorum members with GC flags recommended in the Troubleshooting wiki page a follower logged messages similar to the following jiras: 2010-07-06 23:14:01,438 - FATAL

Re: Logger hierarchies in ZK?

2010-07-20 Thread Travis Crawford
On Tue, Jul 20, 2010 at 6:07 PM, Ted Dunning ted.dunn...@gmail.com wrote: It is pretty easy to keep configuration files in general in ZK and reload them on change. Very handy some days! We recently open-sourced a tool to handle stuff like config reloads, triggering actions, etc:

Connection imbalance leads to overloaded ZK instances

2010-08-26 Thread Travis Crawford
Hey ZooKeepers - A while back we patched in the awesome ``mntr`` change, and started publishing those stats to Ganglia. They're super handy :D Looking at those graphs we see connections are not evenly distributed among cluster members, where some have thousands of connections and others are
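
For anyone wanting to check connection balance by hand, here is a rough sketch (not the setup described in the post) of reading ``mntr`` output and comparing connection counts across servers. It assumes the server answers the ``mntr`` four-letter word with tab-separated key/value lines and reports a ``zk_num_alive_connections`` key. The function names ``parse_mntr``, ``fetch_mntr`` and ``connection_spread`` are made up for illustration, and the default port 2181 is an assumption about your deployment.

```python
import socket


def parse_mntr(text):
    """Turn tab-separated 'key<TAB>value' lines into a dict.

    Values that look like integers are converted; everything else stays a string.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip lines without a tab separator
        value = value.strip()
        try:
            stats[key] = int(value)
        except ValueError:
            stats[key] = value
    return stats


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send the 'mntr' four-letter word and return the raw reply as text.

    Assumes the server closes the connection once it has replied.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def connection_spread(per_server):
    """Return the gap between the busiest and the quietest server.

    per_server maps a host name to its parsed mntr dict.
    """
    counts = [s.get("zk_num_alive_connections", 0) for s in per_server.values()]
    return max(counts) - min(counts)
```

Something like ``connection_spread({h: parse_mntr(fetch_mntr(h)) for h in hosts})`` would then give a single number to graph or alert on, where ``hosts`` is your own list of ensemble members.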