On Dec 20, 2010, at 2:24 PM, Nick Moffitt wrote:

> Mark Stanislav:
>> I would recommend using Nagios event handlers for this if you want
>> Nagios to essentially take the reins of this problem. That way you
>> will get your alerts and Nagios can react by starting the service
>> again after x number of failures.
>
> Actually, this is kind of the opposite of what I want. I want a human
> to have to restart the service, because otherwise it doesn't present
> enough pain for the problem to be fixed more permanently. I have
> situations where I semi-regularly restart a bloating service, but that's
> about as heinous as I'll get.
>
> Once you get used to automated systems propping up your daemons, the
> decay spreads until you encounter a serious intractable downtime event.
> I need the relevant people to feel panic when this happens.

Fault-tolerant infrastructure should be the point. Nagios will still
blow up their e-mail, pager, phone, and IMs until a threshold is hit,
and when the event handler restarts the service, they will get another
e-mail.

Why not just take a downtime report (soft + hard states) and, if it
breaches a given threshold, conclude that a fix obviously needs to be
implemented? That, or reduce the number of failures needed to reach a
hard state, so that it's very apparent that a PROBLEM beyond a dead
service once a year is happening.

It appears that you are trying to solve a training problem rather than
an infrastructure automation problem, which is probably why Puppet &
Nagios aren't an 'easy' solution for it.

But I digress; perhaps someone will have a Puppet answer for you
nonetheless. Good luck, Nick!

-Mark

> --
> 01234567 <- The amazing* Indent-O-Meter!
> ^
> *: Indent-O-Meter may not actually amaze.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Users" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/puppet-users?hl=en.
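
P.S. A rough sketch of the report-and-threshold idea, in Python. Everything
in it is hypothetical and for illustration only: the StateChange record, the
needs_permanent_fix name, the 30-day window, and the limit of 3 are my own
assumptions, not anything Nagios provides. It also assumes you have already
pulled the state changes out of your Nagios logs by some other means.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StateChange:
    """One logged state change (hypothetical record, not a Nagios API)."""
    when: datetime
    service: str
    state: str        # "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    state_type: str   # "SOFT" or "HARD"


def needs_permanent_fix(events, service, now,
                        window=timedelta(days=30), max_failures=3):
    """Return True if `service` failed more than `max_failures` times
    inside the reporting window ending at `now`.

    Soft and hard non-OK states are both counted, so a service that an
    event handler keeps quietly restarting still shows up in the report.
    """
    cutoff = now - window
    failures = [
        e for e in events
        if e.service == service
        and e.state != "OK"
        and cutoff <= e.when <= now
    ]
    return len(failures) > max_failures
```

Run something like this from cron against each service and mail the result
to whoever owns it; the restarts stay automated, but a repeat offender still
lands on somebody's desk.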
