On Dec 20, 2010, at 2:24 PM, Nick Moffitt wrote:

> Mark Stanislav:
>> I would recommend using Nagios event handlers for this if you want
>> Nagios to essentially take the reins of this problem. That way you
>> will get your alerts and Nagios can react by starting the service
>> again after x number of failures.
> 
> Actually, this is kind of the opposite of what I want.  I want a human
> to have to restart the service, because otherwise it doesn't present
> enough pain for the problem to be fixed more permanently.  I have
> situations where I semi-regularly restart a bloating service, but that's
> about as heinous as I'll get.
> 
> Once you get used to automated systems propping up your daemons, the
> decay spreads until you encounter a serious intractable downtime event.
> I need the relevant people to feel panic when this happens.

Fault-tolerant infrastructure should be the point. Nagios will still blow up
their e-mail, pager, phone, and IMs until a threshold is hit, and when the event
handler restarts the service, they will get another e-mail. Why not just pull a
downtime report (soft + hard states) and treat any breach of a given threshold
as proof that a permanent fix needs to be implemented? Alternatively, reduce the
number of failures needed to reach a hard state, so that it's very apparent that
a PROBLEM beyond a service dying once a year is happening.
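
To make the event handler idea concrete, here is a minimal sketch of the
decision logic in Python. It assumes the handler is passed the standard Nagios
macros $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$; the names
should_restart and handle_event, and the threshold of 3 soft failures, are
made up for illustration and are not from any real setup.

```python
# Hypothetical threshold: restart on the 3rd consecutive soft failure.
RESTART_ON_SOFT_ATTEMPT = 3


def should_restart(state, state_type, attempt,
                   soft_attempt=RESTART_ON_SOFT_ATTEMPT):
    """Decide whether the event handler should restart the service.

    state      -- value of $SERVICESTATE$ (OK, WARNING, UNKNOWN, CRITICAL)
    state_type -- value of $SERVICESTATETYPE$ (SOFT or HARD)
    attempt    -- value of $SERVICEATTEMPT$ as an integer
    """
    if state != "CRITICAL":
        # Only react to outright failures; leave everything else to humans.
        return False
    if state_type == "HARD":
        # Retries are exhausted and notifications go out; restart as a last resort.
        return True
    # Soft state: act only once the assumed retry threshold is reached.
    return state_type == "SOFT" and attempt == soft_attempt


def handle_event(state, state_type, attempt, restart):
    """Call restart() when the policy above says so; return True if it ran.

    restart is any callable that performs the restart, which keeps the
    policy separate from the (site-specific) restart command.
    """
    if should_restart(state, state_type, int(attempt)):
        restart()
        return True
    return False
```

A wrapper script would call handle_event with the three macro values from its
command line and a callable that runs whatever restart command the site uses.
People still get the alerts either way; the handler only decides when Nagios
steps in.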

It appears that you are trying to solve a training problem rather than an
infrastructure automation problem, which is probably why Puppet & Nagios aren't
an 'easy' solution for it.

But I digress; perhaps someone will have a Puppet answer for you nonetheless.
Good luck, Nick!

-Mark

> 
> -- 
>       01234567 <- The amazing* Indent-O-Meter!
>        ^
> *: Indent-O-Meter may not actually amaze.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Puppet Users" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/puppet-users?hl=en.
> 