OK, I have Ganglia up and running. I can see tons of metrics, which is quite overwhelming...

I want to monitor my system for potential trouble. Can anyone suggest a top ten list of metrics that I should watch?

For example, "HBase: The Definitive Guide" states "The compaction queue size is another recommended early indicator of trouble..." and "Similar to the compaction queue you will see a sharp rise in count for the flush queue when, for example, your servers are under I/O duress..."

Is there a list posted somewhere? What do others use?

Thanks
Peter


On 2/24/12 2:48 PM, Tom wrote:


On 02/24/2012 11:20 AM, Peter Wolf wrote:
Hello again all,

We have had a very successful time with HBase and are now ready to
deploy.

If I get this right, you were doing your first installation of HBase only 45 days ago? If so, that is impressive progress; congratulations and all the best for your launch!


Our application needs to deal with millions of interactions per
day, and is hosted on Amazon. We are currently using a 3-machine cluster
for HBase.

We need to set up automatic alarms and reports, so we can see trouble
before it affects our customers. We like CloudWatch for our alarms.

We have set up Ganglia, starting from Ken Weiner's blog post:

http://blog.kenweiner.com/2010/10/monitor-hbase-hadoop-with-ganglia-on.html

What other tools are available? What issues should we monitor, and how
should we monitor them? What guides should I read?

Thanks in advance
Peter




