To work around the lack of host-level event handler configuration, I converted an event handler to run as a service event handler, triggered as the service check fails. Instead of checking only the service environment variables, as is the norm for service event handlers, it checks that both the service and its parent host are in a hard problem state. It has been firing consistently for a while now. However, this morning I got a delayed event because when the host went into a hard problem state, the service did not.
This may be related to 3.9.0: coincidentally, I upgraded the installed version yesterday, and this is the first time the event has fired since the upgrade. Below are the logs for the first few executions, listing the environment variables the event handler uses to control program flow.

I had been checking that hoststate eq 'DOWN', hoststatetype eq 'HARD', servicestate ne 'OK', and servicestatetype eq 'HARD'. The large number of service check attempts never ran through completely, because it was always short-circuited by the parent host failing. I had simply picked a large number so that the same service check definition could be used for all hosts while still letting each host control its delay in reporting the issue, up to a couple of hours. The service checks needed to keep firing periodically over that time to look for a hard problem state on the host. This morning, however, the service state never went to HARD, so the event did not fire until the end of this long recheck procedure.

I've yanked the code that checks the service state and am now operating on the host state alone, as if the event handler were a pure host event handler. However, from rereading the documentation, this may be a bug in the Nagios binary. As far as I can tell, Nagios should be putting services into a hard state as soon as their parent host is in one, rather than wasting time and resources rechecking a downed host.
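For anyone following along, the two conditions can be sketched roughly like this. This is an illustration in Python only, not the actual handler; the function names are made up for the example, and the only things taken from my setup are the NAGIOS_* environment variable names and the values they are compared against.

```python
import os


def original_condition(env):
    """Hypothetical sketch of the old check: host AND service must be hard."""
    return (
        env.get("NAGIOS_HOSTSTATE") == "DOWN"
        and env.get("NAGIOS_HOSTSTATETYPE") == "HARD"
        and env.get("NAGIOS_SERVICESTATE") != "OK"
        and env.get("NAGIOS_SERVICESTATETYPE") == "HARD"
    )


def host_only_condition(env):
    """Hypothetical sketch of the revised check: only the host state matters."""
    return (
        env.get("NAGIOS_HOSTSTATE") == "DOWN"
        and env.get("NAGIOS_HOSTSTATETYPE") == "HARD"
    )


if __name__ == "__main__":
    # Nagios exports these variables to the handler's environment.
    env = dict(os.environ)
    print("original check fires: ", original_condition(env))
    print("host-only check fires:", host_only_condition(env))
```

With the values from the 05:44:50 run (host DOWN/HARD, service CRITICAL/SOFT), the first check stays false while the second is true, which matches the delay I saw.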
2010/10/15 05:39:20 DEBUG main->main:: (53) Running /usr/local/nagios/libexec/eventhandlers/host_event_handler: ENV NAGIOS_HOSTSTATE: UP; ENV NAGIOS_HOSTSTATETYPE: HARD; ENV NAGIOS_HOSTATTEMPT: 1; ENV NAGIOS_MAXHOSTATTEMPTS: 5; ENV NAGIOS_SERVICESTATE: WARNING; ENV NAGIOS_SERVICESTATETYPE: SOFT; ENV NAGIOS_SERVICEATTEMPT: 1; ENV NAGIOS_MAXSERVICEATTEMPTS: 20

2010/10/15 05:44:50 DEBUG main->main:: (53) Running /usr/local/nagios/libexec/eventhandlers/host_event_handler: ENV NAGIOS_HOSTSTATE: DOWN; ENV NAGIOS_HOSTSTATETYPE: HARD; ENV NAGIOS_HOSTATTEMPT: 5; ENV NAGIOS_MAXHOSTATTEMPTS: 5; ENV NAGIOS_SERVICESTATE: CRITICAL; ENV NAGIOS_SERVICESTATETYPE: SOFT; ENV NAGIOS_SERVICEATTEMPT: 2; ENV NAGIOS_MAXSERVICEATTEMPTS: 20

2010/10/15 05:50:03 DEBUG main->main:: (53) Running /usr/local/nagios/libexec/eventhandlers/host_event_handler: ENV NAGIOS_HOSTSTATE: DOWN; ENV NAGIOS_HOSTSTATETYPE: HARD; ENV NAGIOS_HOSTATTEMPT: 5; ENV NAGIOS_MAXHOSTATTEMPTS: 5; ENV NAGIOS_SERVICESTATE: CRITICAL; ENV NAGIOS_SERVICESTATETYPE: SOFT; ENV NAGIOS_SERVICEATTEMPT: 3; ENV NAGIOS_MAXSERVICEATTEMPTS: 20

2010/10/15 05:55:02 DEBUG main->main:: (53) Running /usr/local/nagios/libexec/eventhandlers/host_event_handler: ENV NAGIOS_HOSTSTATE: DOWN; ENV NAGIOS_HOSTSTATETYPE: HARD; ENV NAGIOS_HOSTATTEMPT: 5; ENV NAGIOS_MAXHOSTATTEMPTS: 5; ENV NAGIOS_SERVICESTATE: CRITICAL; ENV NAGIOS_SERVICESTATETYPE: SOFT; ENV NAGIOS_SERVICEATTEMPT: 4; ENV NAGIOS_MAXSERVICEATTEMPTS: 20

-Matt
* System Administrator ([email protected])
* Excel.Net,Inc. - http://www.excel.net/
* (920) 452-0455 x501 - Sheboygan/Plymouth area
* (888) 489-9995 x501 - Other areas, toll-free
