OK, I have Ganglia up and running. I can see tons of metrics. Quite
overwhelming...
I want to monitor my system for potential trouble. Can anyone suggest a
top ten that I should watch?
For example, "HBase: The Definitive Guide" states "The compaction queue
size is another recommended early indicator of trouble..." and "Similar
to the compaction queue you will see a sharp rise in count for the flush
queue when, for example, your servers are under I/O duress..."
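As a rough illustration of what watching for a "sharp rise" in those queues could look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample values are made up, the `sharp_rise` helper and its `window`, `factor`, and `floor` parameters are hypothetical, and the thresholds would need tuning against a real cluster. It does not read from Ganglia; it only shows the comparison of the latest sample against a recent baseline.

```python
from typing import Sequence


def sharp_rise(samples: Sequence[float], window: int = 5,
               factor: float = 3.0, floor: float = 10.0) -> bool:
    """Return True if the latest sample is well above the recent baseline.

    Hypothetical helper: 'window', 'factor', and 'floor' are illustrative
    knobs, not values recommended by HBase or Ganglia.
    """
    if len(samples) < window + 1:
        return False  # not enough history to form a baseline
    # Average of the 'window' samples immediately before the latest one.
    baseline = sum(samples[-(window + 1):-1]) / window
    latest = samples[-1]
    # Require an absolute minimum so tiny queues do not trigger alarms,
    # and clamp the baseline to 1.0 so a near-zero baseline is not over-sensitive.
    return latest >= floor and latest > factor * max(baseline, 1.0)


# Made-up queue-size samples, one per polling interval.
compaction_queue = [0, 1, 0, 2, 1, 14]
flush_queue = [0, 0, 1, 0, 0, 1]

print(sharp_rise(compaction_queue))
print(sharp_rise(flush_queue))
```

With these made-up samples, the compaction queue series would be flagged and the flush queue series would not.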
Is there a list posted somewhere? What do others use?
Thanks
Peter
On 2/24/12 2:48 PM, Tom wrote:
On 02/24/2012 11:20 AM, Peter Wolf wrote:
Hello again all,
We have had a very successful time with HBase and are now ready to
deploy.
If I get this right, you were doing your first installation of HBase
only 45 days ago? If so, that is impressive progress; congratulations
and all the best for your launch!
Our application needs to deal with millions of interactions per
day, and is hosted on Amazon. We are currently using a 3-machine cluster
for HBase.
We need to set up automatic alarms and reports, so we can see trouble
before it affects our customers. We like CloudWatch for our alarms.
We have currently set up Ganglia and started with Ken Weiner's blog
http://blog.kenweiner.com/2010/10/monitor-hbase-hadoop-with-ganglia-on.html
What other tools are available? What issues should we monitor, and how
should we monitor them? What guides should I read?
Thanks in advance
Peter