On 11.11.2011 17:11, Gustavo wrote:
> Hello,
> this is my first email.
> I've just installed Nagios 3.2.3,
> and I would like to know if there is a way to configure Nagios to
> send notifications about timed-out checks to another group of users.
> The problem is that most of the time the monitoring service works fine,
> but once or twice a day it throws a timeout exception.
If it's a timeout caught by the alarm signal in the core, there
might be a possibility, but this would still require some code patching
in Nagios itself.
> Can I configure it so that, if the monitoring works fine, the email is
> sent normally, but if a timeout exception happens, the error is sent to
> me so I can see what happened?
Below [1] is a mail sent to nagios-devel about 2 years ago (one of those
long-standing Nagios patches that has never been applied), which allows
changing the state reported when a service check times out (critical by
default).
If you change that to unknown, you could assign special contacts that
are notified only on the unknown state (beware: all other unknowns go to
them too) and thereby be notified about service check timeouts.
A native vanilla solution might be a wrapper around the notification
script that checks whether $SERVICEOUTPUT$ contains "Timeout" or similar,
and checks that against $NOTIFICATIONRECIPIENTS$.
The current problem is that the $NOTIFICATIONRECIPIENTS$ macro holds the
wrong information about the notified contacts (all contacts are placed in
there instead). If the core devs resolve #98 (patch already sent, see
[0]), it might be possible to create a proxy wrapper that passes timeout
notifications only to some contacts, based on the notification
recipients.
kind regards,
michael
[0] http://tracker.nagios.org/view.php?id=98
[1]
Original Message
Subject: [Nagios-devel] [PATCH] add service_check_timeout_state
configuration variable
Date: Tue, 09 Feb 2010 13:34:36 -0500
From: Bill McGonigle <b...@bfccomputing.com>
Reply-To: Nagios Developers List <nagios-de...@lists.sourceforge.net>
Organization: BFC Computing, LLC
To: nagios-de...@lists.sourceforge.net
Hi, all,
This patch adds a variable called 'service_check_timeout_state' which
allows the admin to define the state that is returned when a service
check times out.
I look after a handful of nagios installations and the #1 complaint is
of 'false alarms', which typically result from the machine that nagios
is running on getting bogged down by some unrelated process (backups,
etc., nagios doesn't usually get its own machine in a small business)
and thus a 'critical' state is thrown, and too often everybody gets
paged in the middle of the night (we page on critical).
Nagios has had the #ifdef SERVICE_CHECK_TIMEOUTS_RETURN_UNKNOWN
available for re-compiling, which works, but then those users are unable
to keep up with their distro's updates and it may be beyond the skill of
many.
This patch moves that idea into a variable, allows any of four states to
be chosen ('critical' remaining the default), and does away with the
#ifdef (which should be obsolete now).
I've been running with my in-house nagios set to 'u', and so far no
late-night false alarms, though I can't say it's had extensive field
testing. This is also the first time I've done any nagios hacking
(though I don't do much in C these days, the code was very easy to
follow - kudos).
Here's some suggested text for the sample config file:
---8---8---8
# SERVICE CHECK TIMEOUT STATE
# This setting determines the state Nagios will report when a
# service check times out, that is, does not respond within
# service_check_timeout seconds. This can be useful if a
# machine is running at too high a load and you do not want
# to consider a failed service check to be critical (the default).
# Valid settings are:
# c - Critical (default)
# u - Unknown
# w - Warning
# o - OK
service_check_timeout_state=c
---8---8---8
and the patch follows (the format I have in my rpm file, not sure how to
use git yet).
Thanks,
-Bill
---8---8---8
diff -ur nagios-3.2.0/base/config.c nagios-3.2.0-bfc/base/config.c
--- nagios-3.2.0/base/config.c	2009-05-17 08:54:28.0 -0400
+++ nagios-3.2.0-bfc/base/config.c	2010-02-08 18:47:21.0 -0500
@@ -73,6 +73,7 @@
 extern int log_passive_checks;
 extern int service_check_timeout;
+extern int service_check_timeout_state;
 extern int host_check_timeout;
 extern int event_handler_timeout;
 extern int notification_timeout;
@@ -722,6 +723,23 @@
 				break;
 				}
 			}
+
+		else if(!strcmp(variable,"service_check_timeout_state")){
+
+			if(!strcmp(value,"o"))
+				service_check_timeout_state=STATE_OK;
+			else if(!strcmp(value,"w"))
+				service_check_timeout_state=STATE_WARNING;
+			else if(!strcmp(value,"c"))
+