[Nagios-users] Retry interval on hard states

2008-03-07 Thread Tom Sommer
Hi,

I wish to set up the following check interval:

Check the service every 5 minutes
  - If down, then check the service every 1 minute for 3 minutes/times
    - If still down, notify and continue to check the service every 1
      minute until it recovers.

I'm having problems with the last condition. Once the notification is
sent, Nagios seems to revert to the normal check interval of 5 minutes,
which results in a substantial delay before the recovery notification
is sent.

My settings are:
max_check_attempts 3
check_interval 5
retry_interval 1

Did I miss anything or is the above simply not possible?

Using 3.0rc3
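
To make the reported timing concrete, here is a minimal sketch of the schedule as it is described in this thread. It is a simplified model, not Nagios code: the function name check_timeline and the SOFT/HARD labels are illustrative, and it assumes that soft failures are rechecked at retry_interval while OK and hard-failure states are rechecked at check_interval.

```python
def check_timeline(results, max_check_attempts=3, check_interval=5, retry_interval=1):
    """Return (minute, state) pairs for a sequence of check results.

    Simplified model of the behavior reported in this thread, not Nagios code:
    soft failures are rechecked at retry_interval, while OK and hard-failure
    states are rechecked at check_interval.
    """
    minute = 0
    attempt = 0
    timeline = []
    for ok in results:
        if ok:
            attempt = 0
            state, delay = "OK", check_interval
        else:
            attempt = min(attempt + 1, max_check_attempts)
            if attempt < max_check_attempts:
                state, delay = "SOFT", retry_interval
            else:
                state, delay = "HARD", check_interval
        timeline.append((minute, state))
        minute += delay
    return timeline


if __name__ == "__main__":
    # One OK check, four failed checks, then a recovery.
    for minute, state in check_timeline([True, False, False, False, False, True]):
        print(f"t={minute:>2} min  {state}")
```

In this model the service goes hard at minute 7 and is not checked again until minute 12, which is the 5-minute gap that delays the recovery notification.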

Thanks
--
Tom Sommer

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Retry interval on hard states

2008-03-07 Thread Giles Coochey
> This is expected behavior. I'm curious, what kind of environment are you
> in when up to 5 minute delay in notification of recovery is
> 'substantial'?
 

Hi Marc,

I know I'm not the target of your question, but...

Some require 5 figure uptime reports for their SLAs, and a 99.999% SLA
is often requested by users and customers.

That only gives us 315.36 seconds of downtime per year per service.

In that scenario, a 5-minute delay is far too large a margin of error
for anyone relying on Nagios' performance monitoring.

That is an extreme case, but even four-nines SLAs will suffer as a result.
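
The downtime figure above can be reproduced with a short calculation. This is only a sketch: the function name downtime_budget_seconds is illustrative, and it assumes a 365-day year, which is what the 315.36-second figure implies.

```python
from decimal import Decimal

# Assumption: a 365-day year, i.e. 31,536,000 seconds.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(availability_percent):
    """Allowed downtime per year, in seconds, for an availability SLA.

    The percentage is passed as a string (e.g. "99.999") so that Decimal
    can represent it exactly.
    """
    unavailable = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return SECONDS_PER_YEAR * unavailable


if __name__ == "__main__":
    for sla in ("99.999", "99.99"):
        print(sla, downtime_budget_seconds(sla))
```

A single 5-minute (300-second) delay is close to the whole yearly budget at 99.999%, and roughly a tenth of the 3153.6-second budget at 99.99%.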


Re: [Nagios-users] Retry interval on hard states

2008-03-07 Thread Marc Powell


*nod*, thanks. 

--
Marc


Re: [Nagios-users] Retry interval on hard states

2008-03-07 Thread Tom Sommer
Marc Powell wrote:
> This is expected behavior. I'm curious, what kind of environment are you
> in when up to 5 minute delay in notification of recovery is
> 'substantial'?
   
Well, the current environment/system we run has the above behavior, and
to be honest, I don't understand how it's not the default behavior.
Normally you would want to know that a service has recovered as soon as
possible; I would have it check every 30 seconds if I could.
It's especially important for people who are on call: they receive a
notification, resolve the issue, and then await confirmation of
recovery. 5 minutes is a long wait.

A simple setting for this interval sounds trivial, and I would think it
is almost required for a monitoring system.
--
Tom Sommer


Re: [Nagios-users] Retry interval on hard states

2008-03-07 Thread Marcel
I guess that in 3.0rc3 you can modify service check configuration on-demand.


Not implemented yet, but you should be able to do something like changing
normal_check_interval until it reaches an OK state.

Has anyone here already come up with a solution to this problem?

Cheers


Re: [Nagios-users] Retry interval on hard states

2008-03-07 Thread Marcel
here it is:

http://nagios.sourceforge.net/docs/3_0/adaptive.html
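
As a rough illustration of the adaptive monitoring approach, the sketch below builds a CHANGE_NORMAL_SVC_CHECK_INTERVAL external command and writes it to the Nagios command file. Several things here are assumptions rather than anything stated in this thread: the command file path, the host name "web01", the service description "HTTP", and the idea of calling this from an event handler when the service goes hard and again on recovery. It has not been tested against 3.0rc3.

```python
import time


def change_check_interval_command(host_name, service_description, interval, now=None):
    """Build a CHANGE_NORMAL_SVC_CHECK_INTERVAL external command line.

    The interval is expressed in the same units as check_interval.
    """
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] CHANGE_NORMAL_SVC_CHECK_INTERVAL;"
            f"{host_name};{service_description};{interval}\n")


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command to the external command file.

    Assumption: the default path above matches your installation; in Nagios
    this file is normally a named pipe read by the daemon.
    """
    with open(command_file, "w") as handle:
        handle.write(command_line)


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    print(change_check_interval_command("web01", "HTTP", 1), end="")
```

One possible wiring would be to submit an interval of 1 when the service enters a hard non-OK state and restore 5 when it recovers; that logic is left out here because it depends on the local event handler setup.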



Re: [Nagios-users] Retry interval on hard states

2008-03-07 Thread Tom Sommer
Would this feature not be better served by being in the core?

Marcel wrote:
> here it is:
>
> http://nagios.sourceforge.net/docs/3_0/adaptive.html


Re: [Nagios-users] Retry interval on hard states

2008-03-07 Thread Marcel
I meant:

"I have NOT implemented this yet..." Sorry for my bad English.


