A few important things to monitor, off the top of my head:
Compaction queue size and compaction size (total size of all files in compaction)
GC pause time and number of GCs (highly correlated with compactions)
IPC read/write call size
Slow query logs
Number of failed regions from canary tests
Replication queue size
It's better to monitor these metrics at the level of each region server to
detect issues. For example, overall cluster GC may look about average while
all of the GCs are actually happening on a single region server. That is
very difficult to find unless you track these metrics per region server.
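
To illustrate the per-region-server point, here is a minimal Python sketch
that compares each server against the median of its peers instead of a fixed
threshold, so the rule does not depend on deployment size. The function name,
the 3x ratio, and the sample numbers are all made up for illustration; how
you collect the per-server values is left to your monitoring system.

```python
from statistics import median


def find_outlier_servers(metric_by_server, ratio=3.0, min_value=0.0):
    """Return the servers whose metric exceeds `ratio` times the peer median.

    `metric_by_server` maps a region server name to one numeric sample
    (for example GC pause milliseconds over the last minute).
    `min_value` is a floor that suppresses alerts on near-idle clusters.
    Both `ratio` and `min_value` are illustrative defaults, not tuned values.
    """
    values = list(metric_by_server.values())
    if not values:
        return []
    baseline = median(values)
    flagged = []
    for server, value in sorted(metric_by_server.items()):
        if value > min_value and value > ratio * baseline:
            flagged.append(server)
    return flagged


# Hypothetical GC pause time (ms) in the last minute, one entry per server.
gc_pause_ms = {"rs1": 120, "rs2": 110, "rs3": 2400, "rs4": 130, "rs5": 125}

cluster_avg = sum(gc_pause_ms.values()) / len(gc_pause_ms)
print(cluster_avg)                        # 577.0
print(find_outlier_servers(gc_pause_ms))  # ['rs3']
```

The cluster-wide average hides which server is in trouble, while the
comparison against the median points straight at rs3. The same check could be
applied to compaction queue size or replication queue size.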
On Fri, 6 Apr 2018 at 11:27 PM, Hubbert Smith <hubb...@hubbertsmith.com> wrote:
> OK, guilty as charged. my imagination got away from me
> you just wanted to monitor your hbase, not your hardware ... ok then
> On Fri, Apr 6, 2018 at 4:13 AM, Mark Bonetti <mark.bonetti.sc...@gmail.com> wrote:
> > Hi,
> > I'm building a monitoring system for HBase and want to set up default
> > alerts (threshold or anomaly) on 2-3 key metrics everyone who uses HBase
> > typically wants to alert on, but I don't yet have production-grade
> > experience with HBase.
> > Importantly, alert rules have to be generally useful, so can't be on
> > metrics whose values vary wildly based on the size of deployment.
> > In other words, which metrics would be most significant indicators that
> > something went wrong with your HBase?
> > I thought the best place to find experienced HBase users, who would find
> > answering this question trivial, would be here.
> > Thanks very much,
> > Mark