I know I'm a bit late to this thread, but I was bored and saw this, so here's my two cents:
I've used Sensu in the past as well, and it's pretty darn amazing. I'm not using it anymore (we use our own monitoring stack at Amazon), but it was great to be able to drop in Nagios scripts and have them work right away, plus get auto-discovery of new machines and services based on config management.

However, I'm much more of a fan of tacking on monitoring at the tail end of the pipeline. There's got to be a way to just publish metric data to a data store (Graphite comes to mind; use its APIs to automatically add new metrics), then describe which class of metrics to look at at the tail end, and monitor *those* metrics. I don't know how hard this would be, but you'd at least be able to alarm on percentiles and whatnot, vs. some kind of stateless check on the machine itself. (You can try to bake some stateful checking into the scripts themselves, but that seems pretty damn precarious.) I'd rather alarm when I know the history of a given metric, or on some kind of aggregate across all machines, vs. from the point of view of a single machine saying, 'oh god I'M ON FIRE SEND A PAGE NOW' when the rest of your service is doing just fine.

I've been out of the loop when it comes to open source software that handles metrics/monitors, and current practices, so maybe this is already being done...

On Wed, Mar 27, 2013 at 9:44 AM, Josh Smift <iril...@infersys.com> wrote:
> PP> It's ideal, right up until someone changes the configuration to drop a
> PP> bunch of services and they're automatically removed from monitoring, so
> PP> you never get told. Everything's fine with the world, as configured.
>
> AP> That would be an easy failure state to plan around, maybe as easy as
> AP> not automatically removing services from the monitor. Where the
> AP> alternative: somebody changes the configuration to add a bunch of
> AP> services and they aren't automatically monitored, would be more
> AP> difficult to detect and potentially just as disastrous.
>
> I agree -- add monitoring automatically, remove monitoring by hand, is the
> combination you really want.
>
> -Josh (iril...@infersys.com)
> _______________________________________________
> Tech mailing list
> Tech@lists.lopsa.org
> https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
> This list provided by the League of Professional System Administrators
> http://lopsa.org/
>

--
christian "ian" paredes
http://about.me/cparedes/bio
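
For what it's worth, here's a rough sketch of the aggregate-vs-single-host idea from above. It's plain Python with made-up data; it doesn't talk to Graphite or any real metric store, and the function names, host names, and the 500 ms threshold are all assumptions for illustration only.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def fleet_alarm(samples_by_host, pct, threshold):
    """Alarm on the fleet-wide percentile rather than on any single host."""
    all_samples = [v for samples in samples_by_host.values() for v in samples]
    return percentile(all_samples, pct) > threshold


def hosts_over(samples_by_host, threshold):
    """What a stateless per-host check would page on: latest sample too high."""
    return sorted(h for h, s in samples_by_host.items() if s[-1] > threshold)


# Hypothetical latency samples in ms, as if read back from the metric store.
samples = {
    "web1": [110, 120, 115, 118],
    "web2": [105, 112, 109, 111],
    "web3": [108, 113, 120, 990],  # one host having a bad moment
    "web4": [101, 104, 99, 107],
}

print(hosts_over(samples, 500))       # the per-host view
print(fleet_alarm(samples, 90, 500))  # the aggregate view
```

With this data, web3's latest sample would trip a per-host check, but the fleet-wide p90 is 120 ms, so the aggregate alarm stays quiet. If most of the fleet slowed down, the same percentile check would fire.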