My TSDB dashboard (that pulls in those metrics) has:

- request rate
- GC
- compactions queue
- IO wait
- User CPU

J-D

On Fri, Feb 24, 2012 at 2:16 PM, Peter Wolf <[email protected]> wrote:
> OK I have Ganglia up and running. I can see tons of metrics. Quite
> overwhelming...
>
> I want to monitor my system for potential trouble. Can anyone suggest a top
> ten that I should watch?
>
> For example, "HBase: The Definitive Guide" states "The compaction queue size
> is another recommended early indicator of trouble..." and "Similar to the
> compaction queue you will see a sharp rise in count for the flush queue
> when, for example, your servers are under I/O duress..."
>
> Is there a list posted somewhere? What do others use?
>
> Thanks
> Peter
>
> On 2/24/12 2:48 PM, Tom wrote:
>>
>> On 02/24/2012 11:20 AM, Peter Wolf wrote:
>>>
>>> Hello again all,
>>>
>>> We have had a very successful time with HBase and are now ready to
>>> deploy.
>>
>> If I get this right, you were doing your first installation on Hbase only
>> 45 days ago? If so, that is impressive progress; congratulations and all the
>> best for your launch!
>>
>>> Our application needs to deal with millions of interactions per
>>> day, and is hosted on Amazon. We are currently using a 3 machine cluster
>>> for HBase.
>>>
>>> We need to set up automatic alarms and reports, so we can see trouble
>>> before it affects our customers. We like CloudWatch for our alarms.
>>>
>>> We have currently set up Ganglia and started with Ken Weiner's blog
>>>
>>> http://blog.kenweiner.com/2010/10/monitor-hbase-hadoop-with-ganglia-on.html
>>>
>>> What other tools are available? What issues should we monitor, and how
>>> should we monitor them? What guides should I read?
>>>
>>> Thanks in advance
>>> Peter
