[Nagios-users] Retry interval on hard states
Hi,

I wish to set up the following check interval:

- Check the service every 5 minutes
- If down, then check the service every 1 minute for 3 minutes/times
- If still down, notify and continue to check the service every 1 minute until it recovers.

I'm having a few problems with the last condition. Basically, once the notification is sent, Nagios seems to revert to the normal check interval, which is 5 minutes, resulting in a substantial delay before the recovery notification is sent.

My settings are:

max_check_attempts 3
check_interval 5
retry_interval 1

Did I miss anything, or is the above simply not possible?

Using 3.0rc3

Thanks

--
Tom Sommer

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
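To make the reported delay concrete, here is a rough Python sketch of the schedule described above. It is a simplified model, not the Nagios scheduler: it assumes failed checks are retried every retry_interval until max_check_attempts is reached, and that checks in the hard state are then spaced by a hypothetical hard_state_interval parameter (defaulting to check_interval, which is the behavior reported in this message). All function and parameter names are illustrative only.

```python
def check_times(outage_start, outage_end, check_interval=5, retry_interval=1,
                max_check_attempts=3, hard_state_interval=None, horizon=120):
    """Return (notified_at, recovery_seen_at) in minutes, under a simplified model.

    hard_state_interval is a hypothetical knob: the spacing of checks once the
    problem is in a hard state. None means "fall back to check_interval".
    """
    if hard_state_interval is None:
        hard_state_interval = check_interval
    t = 0
    attempt = 0          # consecutive failed checks, capped at max_check_attempts
    notified_at = None
    while t <= horizon:
        down = outage_start <= t < outage_end
        if down:
            attempt = min(attempt + 1, max_check_attempts)
            if attempt == max_check_attempts and notified_at is None:
                notified_at = t                  # hard state reached: notify
            # soft state: retry quickly; hard state: use the hard-state spacing
            step = retry_interval if attempt < max_check_attempts else hard_state_interval
        else:
            if notified_at is not None:
                return notified_at, t            # first OK check after the notification
            attempt = 0
            step = check_interval
        t += step
    return notified_at, None


# Outage from minute 7 to minute 14, settings from the message above.
observed = check_times(7, 14)                        # hard state uses check_interval
desired = check_times(7, 14, hard_state_interval=1)  # hard state keeps checking every minute
print(observed, desired)
```

In this model the service is back at minute 14, but with the observed behavior the recovery is only seen at the next regular check, while the desired behavior sees it on the first one-minute check after the fix.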
Re: [Nagios-users] Retry interval on hard states
> This is expected behavior. I'm curious, what kind of environment are you in when up to 5 minute delay in notification of recovery is 'substantial'?

Hi Marc,

I know I'm not the target of your question, but...

Some require 5-figure uptime reports for their SLAs, and a 99.999% SLA is often requested by users and customers. That only gives us 315.36 seconds of downtime per year per service.

In that scenario, if we want to use any of Nagios' performance monitoring, a 5-minute delay is far too large a margin of error. That is an extreme case, but even four-nines SLAs will suffer as a result.
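The downtime figure quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming a 365-day year; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, assuming a 365-day year


def downtime_budget_seconds(availability_percent):
    """Seconds of downtime per year allowed by an availability target."""
    return SECONDS_PER_YEAR * (100 - availability_percent) / 100


five_nines = downtime_budget_seconds(99.999)   # about 315.36 seconds
four_nines = downtime_budget_seconds(99.99)    # about 3153.6 seconds

# Share of the yearly five-nines budget used up by a single 5-minute
# (300-second) delay in detecting a recovery: roughly 95%.
share = 300 / five_nines

print(f"{five_nines:.2f} s, {four_nines:.1f} s, {share:.0%}")
```

Under these assumptions, one late recovery detection is enough to consume nearly the whole yearly allowance for a five-nines service, which is the point being made in the message.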
Re: [Nagios-users] Retry interval on hard states
> -----Original Message-----
> From: [EMAIL PROTECTED] On Behalf Of Giles Coochey
> Sent: Friday, March 07, 2008 9:04 AM
> To: Marc Powell; nagios-users@lists.sourceforge.net
> Subject: Re: [Nagios-users] Retry interval on hard states
>
> > This is expected behavior. I'm curious, what kind of environment are you in when up to 5 minute delay in notification of recovery is 'substantial'?
>
> Hi Marc,
>
> I know I'm not the target of your question, but...
>
> Some require 5 figure uptime reports for their SLAs, and a 99.999% SLA is often requested by users and customers. That only gives us 315.36 seconds of downtime per year per service.

*nod*, thanks.

--
Marc
Re: [Nagios-users] Retry interval on hard states
Marc Powell wrote:
> This is expected behavior. I'm curious, what kind of environment are you in when up to 5 minute delay in notification of recovery is 'substantial'?

Well, the current environment/system we run has the above behavior, and to be honest, I don't understand how it's not the default behavior.

Normally you would want to know that a service has recovered as soon as possible; I would have it check every 30 seconds if I could. It's especially important for people who are on call, receive a notification, resolve the issue, and then await confirmation of recovery. 5 minutes is a long wait.

A simple setting for this interval sounds trivial, and I would think it is almost required for a monitoring system.

--
Tom Sommer
Re: [Nagios-users] Retry interval on hard states
I guess that in 3.0rc3 you can modify service check configuration on demand. Not implemented yet, but you should be able to do something like changing normal_check_interval until it reaches an OK state.

Has anyone here already come up with a solution to this problem?

Cheers

On Fri, Mar 7, 2008 at 10:44 AM, Tom Sommer [EMAIL PROTECTED] wrote:
> Hi, I wish to setup the following check interval:
>
> Check the service every 5 minutes
> - If down then check the service every 1 minute for 3 minutes/times
> - If still down, notify and continue to check the service every 1 minute until it recovers.
>
> I'm having a few problems with the last condition. Basically once the notification is sent, Nagios seems to revert to the normal check interval, which is 5 minutes - resulting in a substantial delay for the recovery notification to be sent.
>
> My settings are:
> max_check_attempts 3
> check_interval 5
> retry_interval 1
>
> Did I miss anything or is the above simply not possible?
>
> Using 3.0rc3
Re: [Nagios-users] Retry interval on hard states
here it is: http://nagios.sourceforge.net/docs/3_0/adaptive.html

On Fri, Mar 7, 2008 at 3:10 PM, Marcel [EMAIL PROTECTED] wrote:
> I guess that in 3.0rc3 you can modify service check configuration on-demand. Not implemented yet, but you should be able to do something like changing normal_check_interval until it reaches an OK state.
>
> Anyone here already come up with a solution to this problem?
>
> Cheers
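One way the adaptive-monitoring idea above might be sketched is an event handler that submits the CHANGE_NORMAL_SVC_CHECK_INTERVAL external command, shortening the normal interval while the service is in a hard problem state and restoring it on recovery. The Python below is an untested sketch, not something taken from this thread: the host and service names, the helper names, and the command file path (a common default for source installs) are assumptions, and the state strings are assumed to come from the $SERVICESTATE$ and $SERVICESTATETYPE$ macros passed to the handler.

```python
import time


def interval_for_state(state, state_type, normal=5, fast=1):
    """Pick the check interval: fast while in a hard non-OK state, else normal."""
    return fast if (state != "OK" and state_type == "HARD") else normal


def change_interval_command(host, service, interval, now=None):
    """Format the external command line that changes a service's normal check interval."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] CHANGE_NORMAL_SVC_CHECK_INTERVAL;{host};{service};{interval}\n"


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the Nagios command file (path is an assumption)."""
    with open(command_file, "w") as handle:
        handle.write(command_line)


# Hypothetical host "web01" and service "HTTP": build the command an event
# handler would send once the service reaches a hard CRITICAL state.
line = change_interval_command("web01", "HTTP",
                               interval_for_state("CRITICAL", "HARD"))
print(line, end="")
```

When the handler later runs for the hard OK state, interval_for_state returns the normal value again, so the same command restores the 5-minute interval. External commands must be enabled in the Nagios configuration for the command file to be read.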
Re: [Nagios-users] Retry interval on hard states
Would this feature not be best served by being in the core?

Marcel wrote:
> here it is: http://nagios.sourceforge.net/docs/3_0/adaptive.html
Re: [Nagios-users] Retry interval on hard states
I meant: I have NOT implemented it yet... Sorry for my bad English.

On Fri, Mar 7, 2008 at 3:10 PM, Marcel [EMAIL PROTECTED] wrote:
> I guess that in 3.0rc3 you can modify service check configuration on-demand. Not implemented yet, but you should be able to do something like changing normal_check_interval until it reaches an OK state.