[Nagios-users] nagios timeout checks

2011-11-11 Thread Gustavo
Hello,
This is my first email.
I've just installed Nagios 3.2.3, and I would like to know whether there
is a way to configure Nagios to send check time-out notifications to a
different group of users.

The problem is that the monitoring check works fine most of the time, but
once or twice a day it fails with a time-out exception.

Can I configure Nagios so that, when monitoring works fine, the email is
sent as usual, but when a time-out exception happens, that error is sent
to me so I can see what is going on?

Thank you in advance, and sorry for my bad English. I hope you understand :)

-- 
Regards,

Gustavo de Araujo Lima Machado
Phone: +55 +21 ***
Mobile: +55+21 84588122
E-Mail: g.macha...@gmail.com.br
 @gmachado1

Residence:
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] nagios timeout checks

2011-11-11 Thread Jim Avery
On 11 November 2011 16:11, Gustavo <g.macha...@gmail.com> wrote:

> Hello,
> This is my first email.
> I've just installed Nagios 3.2.3, and I would like to know whether there
> is a way to configure Nagios to send check time-out notifications to a
> different group of users.
>
> The problem is that the monitoring check works fine most of the time, but
> once or twice a day it fails with a time-out exception.
>
> Can I configure Nagios so that, when monitoring works fine, the email is
> sent as usual, but when a time-out exception happens, that error is sent
> to me so I can see what is going on?
>
> Thank you in advance, and sorry for my bad English. I hope you understand :)


If the timeout errors result in an UNKNOWN state, then you can
configure your ordinary users (for example) to receive only WARNING,
CRITICAL and RECOVERY notifications, and your Nagios administrator
users to receive only UNKNOWN and FLAPPING notifications.  Note that
these options apply to all notifications for that contact, though,
not just to the checks that time out.

For example:

define contact {
  contact_name                  luciano_lanza
  alias                         ordinary user Luciano Lanza
  host_notification_options     d,r      ; DOWN, RECOVERY
  service_notification_options  w,c,r    ; WARNING, CRITICAL, RECOVERY
  email                         lla...@hotmael.com
  }

define contact {
  contact_name                  gustavo_machado
  alias                         Nagios Admin G. Machado
  host_notification_options     u,f      ; UNREACHABLE, FLAPPING
  service_notification_options  u,f      ; UNKNOWN, FLAPPING
  email                         gmach...@hotmael.com
  }


Note that the check_nt plugin will only result in UNKNOWN on timeout
if you specify the -u or --unknown-timeout option.

Unfortunately, plugins are not consistent about this: some set the
status to UNKNOWN on timeout, but others don't.

With some checks, it helps to set max_check_attempts to 3 or more so
that if the plugin times out only once or twice then Nagios will not
send a notification.
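
To illustrate why max_check_attempts helps, here is a rough Python
sketch of the idea. It is a simplified model, not Nagios code: it
assumes a problem notification is sent only once a check has failed
max_check_attempts times in a row, and that any OK result resets the
count. Re-notification intervals, flapping and recoveries are ignored,
and the function name is made up for this example.

```python
def notifications(results, max_check_attempts):
    """Return the indices of the checks at which a problem notification fires.

    results: sequence of booleans, True = check OK, False = non-OK
             (for example a timeout).
    Simplified model: notify once, when the number of consecutive non-OK
    results reaches max_check_attempts; any OK result resets the counter.
    """
    fired = []
    failures = 0
    for i, ok in enumerate(results):
        if ok:
            failures = 0
        else:
            failures += 1
            if failures == max_check_attempts:
                fired.append(i)
    return fired


# One or two isolated timeouts never reach a threshold of 3 ...
transient = [True, False, True, True, False, False, True]
# ... but a persistent failure does.
persistent = [True, False, False, False, False]

print(notifications(transient, 3))   # → []
print(notifications(persistent, 3))  # → [3]
print(notifications(transient, 1))   # → [1, 4]
```

With a threshold of 1, every isolated timeout would produce a
notification; with 3, only the persistent failure does.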


Cheers, and welcome!

Jim



Re: [Nagios-users] nagios timeout checks

2011-11-11 Thread Michael Friedrich
On 11.11.2011 17:11, Gustavo wrote:
> Hello,
> This is my first email.
> I've just installed Nagios 3.2.3, and I would like to know whether there
> is a way to configure Nagios to send check time-out notifications to a
> different group of users.
>
> The problem is that the monitoring check works fine most of the time,
> but once or twice a day it fails with a time-out exception.

If it's a timeout caught by the alarm signal in the core, there
might be a possibility, but it would still require patching the code
of Nagios itself.


> Can I configure Nagios so that, when monitoring works fine, the email
> is sent as usual, but when a time-out exception happens, that error is
> sent to me so I can see what is going on?

Below [1] is a mail sent to nagios-devel about two years ago (one of
those long-standing patches for Nagios that has never been applied),
which allows you to change the default state returned on a service
check timeout.

If you change that to UNKNOWN, you could assign special contacts that
are notified only on the UNKNOWN state, and who would therefore be
notified of service check timeouts. Beware: all other UNKNOWN results
go to them too.

A native, vanilla solution might be a wrapper around the notification
script that checks whether $SERVICEOUTPUT$ contains "Timeout" or
similar, and checks that against $NOTIFICATIONRECIPIENTS$.
The current problem is that the $NOTIFICATIONRECIPIENTS$ macro holds
the wrong information about notified contacts (all contacts are placed
in it instead). If the core devs resolve #98 (patch already sent, see
[0]), a proxy wrapper that passes timeout notifications only to some
contacts, based on the notification recipients, might be a possible
solution.
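
As a rough illustration of the filtering step such a wrapper would
need, here is a minimal Python sketch. It is not a tested notification
script. Because of the $NOTIFICATIONRECIPIENTS$ problem, it assumes the
wrapper is run once per contact and is handed $CONTACTNAME$ and
$SERVICEOUTPUT$ by the notification command, which is a different
approach from checking the recipients macro. The pattern, the contact
name and the function names are assumptions for this example only.

```python
import re

# Assumption: these spellings identify a timeout in $SERVICEOUTPUT$
# ("timeout", "timed out", "time-out", ...). Adjust the pattern to
# whatever your plugins actually print.
TIMEOUT_PATTERN = re.compile(r"timed?[ -]?out", re.IGNORECASE)

# Hypothetical contact names that should receive timeout notifications.
TIMEOUT_CONTACTS = {"gustavo_machado"}


def is_timeout(service_output):
    """Return True if the plugin output looks like a timeout."""
    return TIMEOUT_PATTERN.search(service_output) is not None


def should_notify(contact_name, service_output,
                  timeout_contacts=TIMEOUT_CONTACTS):
    """Decide whether this contact should get this notification.

    Timeout notifications go only to the contacts in timeout_contacts;
    everything else is passed through unchanged.
    """
    if is_timeout(service_output):
        return contact_name in timeout_contacts
    return True


print(should_notify("luciano_lanza",
                    "CRITICAL - Socket timeout after 10 seconds"))  # → False
print(should_notify("gustavo_machado",
                    "(Service Check Timed Out)"))                   # → True
print(should_notify("luciano_lanza",
                    "DISK CRITICAL - free space: / 2%"))            # → True
```

A real wrapper would call the normal mail command only when
should_notify() returns True, and exit quietly otherwise.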

kind regards,
michael

[0] http://tracker.nagios.org/view.php?id=98
[1]

  -------- Original Message --------
  Subject: [Nagios-devel] [PATCH] add service_check_timeout_state
  configuration variable
  Date: Tue, 09 Feb 2010 13:34:36 -0500
  From: Bill McGonigle <b...@bfccomputing.com>
  Reply-To: Nagios Developers List <nagios-de...@lists.sourceforge.net>
  Organization: BFC Computing, LLC
  To: nagios-de...@lists.sourceforge.net

  Hi, all,

  This patch adds a variable called 'service_check_timeout_state' which
  allows the admin to define the state that is returned when a service
  check times out.

  I look after a handful of nagios installations and the #1 complaint is
  of 'false alarms', which typically result from the machine that nagios
  is running on getting bogged down by some unrelated process (backups,
  etc., nagios doesn't usually get its own machine in a small business)
  and thus a 'critical' state is thrown, and too often everybody gets
  paged in the middle of the night (we page on critical).

  Nagios has had the #ifdef SERVICE_CHECK_TIMEOUTS_RETURN_UNKNOWN
  available for re-compiling, which works, but then those users are unable
  to keep up with their distro's updates and it may be beyond the skill of
  many.

  This patch moves that idea into a variable, allows any of four states to
  be chosen ('critical' remaining the default), and does away with the
  #ifdef (which should be obsolete now).

  I've been running with my in-house nagios set to 'u', and so far no
  late-nite false alarms, though I can't say it's had extensive field
  testing.  This is also the first time I've done any nagios hacking
  (though I don't do much in c these days, the code was very easy to
  follow - kudos).

  Here's some suggested text for the sample config file:

  ---8<---8<---8<

  # SERVICE CHECK TIMEOUT STATE
  # This setting determines the state Nagios will report when a
  # service check times out - that is, it does not respond within
  # service_check_timeout seconds.  This can be useful if a
  # machine is running at too high a load and you do not want
  # to consider a failed service check to be critical (the default).
  # Valid settings are:
  # c - Critical (default)
  # u - Unknown
  # w - Warning
  # o - OK

  service_check_timeout_state=c

  ---8<---8<---8<

  and the patch follows (the format I have in my rpm file, not sure how to
  use git yet).

  Thanks,
  -Bill


  ---8<---8<---8<

  diff -ur nagios-3.2.0/base/config.c nagios-3.2.0-bfc/base/config.c
  --- nagios-3.2.0/base/config.c	2009-05-17 08:54:28.0 -0400
  +++ nagios-3.2.0-bfc/base/config.c	2010-02-08 18:47:21.0 -0500
  @@ -73,6 +73,7 @@
   extern int  log_passive_checks;

   extern int  service_check_timeout;
  +extern int  service_check_timeout_state;
   extern int  host_check_timeout;
   extern int  event_handler_timeout;
   extern int  notification_timeout;
  @@ -722,6 +723,23 @@
                    break;
                    }
                }
  +
  +            else if(!strcmp(variable,"service_check_timeout_state")){
  +
  +                if(!strcmp(value,"o"))
  +                    service_check_timeout_state=STATE_OK;
  +                else if(!strcmp(value,"w"))
  +                    service_check_timeout_state=STATE_WARNING;
  +                else if(!strcmp(value,"c"))
  +