I know, a bit late for me on this thread, but was bored and saw this, so
here's my two cents:

I've used Sensu in the past as well, and it's pretty darn amazing. I'm not
using it anymore (we use our own monitoring stack in Amazon), but it was
great to be able to drop in Nagios scripts and have them work right away
without much work, plus auto discovery of new machines and services based
on config management.

However, I'm much more of a fan of tacking on monitoring at the tail end of
the pipeline. There's got to be a way to just publish metric data to a data
store (Graphite comes to mind; use its APIs to automatically add new
metrics), then describe which class of metrics to look at, and monitor
*those* metrics. I don't know how hard this would be, but you'd at least be
able to alarm on percentiles and whatnot, vs. some kind of stateless check
on the machine itself. (You could try to bake some stateful checking into
the scripts themselves, but that seems pretty damn precarious.) I'd rather
do the checking where I know the history of a given metric, or have some
kind of aggregate across all machines, vs. from the point of view of a
single machine saying, 'oh god I'M ON FIRE SEND A PAGE NOW' when the rest
of your service is doing just fine.
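
To make that a bit more concrete, here's a rough Python sketch of what I
mean. It is not anything I've run in production. The only part taken from
Graphite itself is the plaintext line format ("path value timestamp", which
carbon accepts on port 2003 by default); the host name, metric names,
thresholds and every function name below are made up for illustration.

```python
import math
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one sample in Graphite's plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def publish(lines, host="graphite.example.com", port=2003):
    """Send samples to a carbon listener (hypothetical host name)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def should_page(latest_by_host, threshold, pct=50):
    """Page on the fleet-wide percentile, not on any single host."""
    return percentile(list(latest_by_host.values()), pct) > threshold


# Hypothetical latest latency (ms) per host, as read back from the store.
latency_ms = {"web1": 120, "web2": 110, "web3": 2500, "web4": 130}

# One host is on fire, but the median across the fleet is still healthy.
print(should_page(latency_ms, threshold=500))
```

With the median, web3 on its own doesn't page anyone; if most of the fleet
crossed 500 ms, it would. Swapping pct for something like 90 makes the check
more sensitive to a few bad hosts, which is the kind of knob you can't
really get from a stateless per-machine check.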

I've been out of the loop on OSS tools that handle metrics and monitoring,
and on current practices, so maybe this is already being done...


On Wed, Mar 27, 2013 at 9:44 AM, Josh Smift <iril...@infersys.com> wrote:

> PP> It's ideal, right up until someone changes the configuration to drop a
> PP> bunch of services and they're automatically removed from monitoring, so
> PP> you never get told.  Everything's fine with the world, as configured.
>
> AP> That would be an easy failure state to plan around, maybe as easy as
> AP> not automatically removing services from the monitor. Where the
> AP> alternative: somebody changes the configuration to add a bunch of
> AP> services and they aren't automatically monitored, would be more
> AP> difficult to detect and potentially just as disastrous.
>
> I agree -- add monitoring automatically, remove monitoring by hand, is the
> combination you really want.
>
>                                       -Josh (iril...@infersys.com)
> _______________________________________________
> Tech mailing list
> Tech@lists.lopsa.org
> https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
> This list provided by the League of Professional System Administrators
>  http://lopsa.org/
>



-- 
christian "ian" paredes
http://about.me/cparedes/bio
