Re: Which monitoring metrics to alert on?

2018-04-08 Thread Mark Bonetti
Hubbert, no worries, thanks for the effort regardless. Sudhir, thanks for that. Yes, each server will have a monitoring agent (that sends back metrics) installed. On Sun, Apr 8, 2018 at 3:15 AM, sudhir patil wrote: > Few important things to monitor from top of head > > Compaction queue size, com

Re: Which monitoring metrics to alert on?

2018-04-07 Thread sudhir patil
Few important things to monitor, off the top of my head: compaction queue size, compaction size (size of all files in compaction); GC pause time, number of GCs (highly correlated to compactions); IPC read/write call size; slow query logs; number of failed regions from canary tests; replication queue size. Its bet
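
For illustration only, the metrics listed above could be wired into simple threshold alerts roughly as follows. This is a minimal sketch, not something posted in the thread: the metric keys and the threshold values are hypothetical placeholders, and it assumes the monitoring agent delivers each sample as a plain name-to-value mapping.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ThresholdRule:
    metric: str       # hypothetical metric key, as reported by the agent
    max_value: float  # alert when the observed value exceeds this limit


# Placeholder limits chosen only for the example; real values depend on
# the cluster and workload.
DEFAULT_RULES = [
    ThresholdRule("compaction_queue_size", 100),
    ThresholdRule("gc_pause_ms", 2000),
    ThresholdRule("canary_failed_regions", 0),
    ThresholdRule("replication_queue_size", 50),
]


def evaluate(sample: Dict[str, float],
             rules: List[ThresholdRule] = DEFAULT_RULES) -> List[str]:
    """Return one alert message per rule whose limit is exceeded."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue  # metric missing from this sample; skip the rule
        if value > rule.max_value:
            alerts.append(f"{rule.metric}={value} exceeds {rule.max_value}")
    return alerts
```

Calling evaluate() on each incoming sample returns an empty list when everything is within limits. Metrics missing from a sample are skipped rather than treated as failures, which is a design choice a real system might want to reverse.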

Re: Which monitoring metrics to alert on?

2018-04-06 Thread Hubbert Smith
OK, guilty as charged. My imagination got away from me; you just wanted to monitor your HBase, not your hardware ... OK then. On Fri, Apr 6, 2018 at 4:13 AM, Mark Bonetti wrote: > Hi, > I'm building a monitoring system for HBase and want to set up default > alerts (threshold or anomaly) on 2-3 key

Re: Which monitoring metrics to alert on?

2018-04-06 Thread Hubbert Smith
Suggesting storage-related metrics - storage device failure is sort of a big deal. Storage is where the valuable data sits, and device failures impact everything. Suggest your goal be - identify which SSDs, HDDs and servers are reliable, and which are unreliable. There are tools - https://en.wikipe

Which monitoring metrics to alert on?

2018-04-06 Thread Mark Bonetti
Hi, I'm building a monitoring system for HBase and want to set up default alerts (threshold or anomaly) on 2-3 key metrics everyone who uses HBase typically wants to alert on, but I don't yet have production-grade experience with HBase. Importantly, alert rules have to be generally useful, so can'
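
As a rough illustration of the "anomaly" style of alert mentioned in the question, one common generic approach is a rolling z-score: flag a value that sits far from the recent mean of the same metric. The sketch below is an assumption-laden example, not the poster's design; the class name, window size, and threshold are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags values that deviate strongly from a rolling window of history."""

    def __init__(self, window: int = 30, z_threshold: float = 3.0,
                 min_samples: int = 5):
        self.values = deque(maxlen=window)  # most recent observations
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Record a value; return True if it looks anomalous vs. history."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0:
                is_anomaly = abs(value - mu) / sigma > self.z_threshold
            else:
                # Perfectly flat history: any change at all is flagged.
                is_anomaly = value != mu
        self.values.append(value)
        return is_anomaly
```

One detector instance would be kept per metric per server. Because it compares a metric against its own history, it needs no per-cluster tuning of absolute limits, though the window and z-score threshold still have to be picked.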