Re: [Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
> diff -uNp nagios-updated.cfg nagios.cfg
> --- nagios-updated.cfg  Sat May 25 09:05:09 2013
> +++ nagios.cfg  Sat May 25 09:02:37 2013
> @@ -981,9 +981,9 @@ translate_passive_host_checks=0
>
>  # PASSIVE HOST CHECKS ARE SOFT OPTION
>  # This determines whether or not Nagios will treat passive host
> -# checks as being HARD or SOFT.  By default, a single passive host
> -# check result will put a host into an immediate HARD state type.
> -# This can be changed by enabling this option.
> +# checks as being HARD or SOFT.  By default, a passive host check
> +# result will put a host into a HARD state type.  This can be changed
> +# by enabling this option.
>  # Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
>
>  passive_host_checks_are_soft=0
>
> Does that make sense?  If I had read something like that, it would
> have been immediately clear to me what was happening.
>
> Thank you so much, Andreas!  On to the next problem with the
> upgrade (something that can wait until next week)...

Sorry, too little caffeine too early, got the files reversed.  Here's
the right diff:

diff -uNp nagios.cfg nagios-updated.cfg
--- nagios.cfg  Sat May 25 10:25:34 2013
+++ nagios-updated.cfg  Sat May 25 10:27:12 2013
@@ -981,9 +981,9 @@ translate_passive_host_checks=0

 # PASSIVE HOST CHECKS ARE SOFT OPTION
 # This determines whether or not Nagios will treat passive host
-# checks as being HARD or SOFT.  By default, a passive host check
-# result will put a host into a HARD state type.  This can be changed
-# by enabling this option.
+# checks as being HARD or SOFT.  By default, a single passive host
+# check result will put a host into an immediate HARD state type.
+# This can be changed by enabling this option.
 # Values: 0 = passive checks are HARD, 1 = passive checks are SOFT

 passive_host_checks_are_soft=0
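For anyone else landing on this thread, the actual fix boils down to a
few commands on the central server.  A sketch only - the paths assume a
default /usr/local/nagios source install, so adjust to your layout:

    # Flip the option on the master/console server that receives the
    # passive results from the pollers
    sed -i 's/^passive_host_checks_are_soft=0/passive_host_checks_are_soft=1/' \
        /usr/local/nagios/etc/nagios.cfg

    # Sanity-check the configuration before reloading
    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

    # Reload so the new setting takes effect
    /etc/init.d/nagios reload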
Re: [Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
> On 2013-05-23 17:43, C. Bensend wrote:
>>
>> Hey folks,
>>
>>    I recently made two major changes to my Nagios environment:
>>
>> 1) I upgraded to v3.5.0.
>> 2) I moved from a single server to two pollers sending passive
>>    results to one central console server.
>>
>> *snip*
>>
>>    Um.  Wat?  Why would the console immediately trigger a hard
>> state?  The config files don't support this decision.  And this
>> IS a problem with the console server - the distributed monitors
>> continue checking the host 6 times like they should.  But
>> for some reason, the centralized console just immediately
>> calls it a hard state.
>>
>> *snip*
>
> Set passive_host_checks_are_soft=1 in nagios.cfg on your master
> server and things should start working as intended.
>
> --
> Andreas Ericsson                   andreas.erics...@op5.se

Oh lord, THANK YOU.  That appears to have fixed that problem, which
was a pain in the ass.

In my defense, I *did* see that option, but the way I interpreted
the comments didn't quite match up with the behavior I was seeing.
I should have experimented with it, I guess.

A slight adjustment to the comments would have thrown a red flag
for me - perhaps this is just a matter of personal interpretation,
but maybe the comments could be a bit more specific:

diff -uNp nagios-updated.cfg nagios.cfg
--- nagios-updated.cfg  Sat May 25 09:05:09 2013
+++ nagios.cfg  Sat May 25 09:02:37 2013
@@ -981,9 +981,9 @@ translate_passive_host_checks=0

 # PASSIVE HOST CHECKS ARE SOFT OPTION
 # This determines whether or not Nagios will treat passive host
-# checks as being HARD or SOFT.  By default, a single passive host
-# check result will put a host into an immediate HARD state type.
-# This can be changed by enabling this option.
+# checks as being HARD or SOFT.  By default, a passive host check
+# result will put a host into a HARD state type.  This can be changed
+# by enabling this option.
 # Values: 0 = passive checks are HARD, 1 = passive checks are SOFT

 passive_host_checks_are_soft=0

Does that make sense?
If I had read something like that, it would have been immediately
clear to me what was happening.

Thank you so much, Andreas!  On to the next problem with the
upgrade (something that can wait until next week)...

Benny

--
"The very existence of flamethrowers proves that sometime, somewhere,
someone said to themselves, 'You know, I want to set those people
over there on fire, but I'm just not close enough to get the job
done.'"                                           -- George Carlin
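If you want to reproduce the before/after behavior by hand, here is a
quick sketch of feeding a passive host result into the console's
external command file.  The path below is only the common default -
use whatever command_file points at in your nagios.cfg:

    # External command format:
    #   [timestamp] PROCESS_HOST_CHECK_RESULT;<host_name>;<status>;<plugin_output>
    # where status is 0 = UP, 1 = DOWN, 2 = UNREACHABLE
    now=$(date +%s)
    printf '[%s] PROCESS_HOST_CHECK_RESULT;hostname;1;CRITICAL - test result\n' "$now" \
        >> /usr/local/nagios/var/rw/nagios.cmd

With passive_host_checks_are_soft=0 (the default), that single result
lands as DOWN;HARD;1 and notifies immediately; with it set to 1, the
host goes DOWN;SOFT;1 and max_check_attempts applies as usual.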
Re: [Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
On 2013-05-23 17:43, C. Bensend wrote:
>
> Hey folks,
>
>    I recently made two major changes to my Nagios environment:
>
> 1) I upgraded to v3.5.0.
> 2) I moved from a single server to two pollers sending passive
>    results to one central console server.
>
> *snip*
>
> And then the centralized server IMMEDIATELY goes into a hard state,
> which triggers a notification:
>
> May 22 15:57:30 nagios-console nagios: HOST ALERT:
> hostname;DOWN;HARD;1;CRITICAL - Host Unreachable (a.b.c.d)
> May 22 15:57:30 nagios-console nagios: HOST NOTIFICATION:
> cbensend;hostname;DOWN;host-notify-by-email-test;CRITICAL -
> Host Unreachable (a.b.c.d)
>
>    Um.  Wat?  Why would the console immediately trigger a hard
> state?  The config files don't support this decision.  And this
> IS a problem with the console server - the distributed monitors
> continue checking the host 6 times like they should.  But
> for some reason, the centralized console just immediately
> calls it a hard state.
>
> *snip*

Set passive_host_checks_are_soft=1 in nagios.cfg on your master
server and things should start working as intended.

--
Andreas Ericsson                   andreas.erics...@op5.se
Re: [Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
> I ran into a similar problem, because my template set the service
> to "is_volatile=1".
>
> http://nagios.sourceforge.net/docs/3_0/volatileservices.html

Hrmmm.  Good point...  However, is_volatile does not appear in any
of my configuration files, for any of the Nagios servers.  It isn't
set by default, is it?  The Nagios "config.cgi" page doesn't even
list it, and livestatus (what I use to query my running daemon)
doesn't give it as a column it can query.  I can't imagine it's on
by default in v3.5.0, but I can't really tell if it is or not.

I can try explicitly *disabling* it in all hosts, but I can't really
test that at the moment - out of here for a long weekend in a few
minutes.  If it gets annoying enough over the weekend, I might
*have* to test that theory.

Thank you very much.  I will still appreciate any input others can
give on this question - it just doesn't seem to be behaving as it's
configured!

Benny
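One way to settle the "is it set by default?" question without
livestatus or config.cgi: at startup Nagios dumps its fully-resolved
object definitions - template inheritance already applied - to the
object cache file, so the effective value shows up there.  A sketch,
assuming the stock object_cache_file location:

    # Count the effective is_volatile values across all services;
    # anything nonzero would confirm the volatile-service theory
    grep 'is_volatile' /usr/local/nagios/var/objects.cache | sort | uniq -c

(Stock Nagios 3.x defaults is_volatile to 0 when the directive is
absent from the service definition and its templates.)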
Re: [Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
I ran into a similar problem, because my template set the service to
"is_volatile=1".

http://nagios.sourceforge.net/docs/3_0/volatileservices.html

Check to see if you have this flag enabled.

Doug

Sincerely,
Doug Eubanks
ad...@dougware.net
K1DUG
(919) 201-8750


On Thu, May 23, 2013 at 11:43 AM, C. Bensend wrote:
>
> Hey folks,
>
>    I recently made two major changes to my Nagios environment:
>
> 1) I upgraded to v3.5.0.
> 2) I moved from a single server to two pollers sending passive
>    results to one central console server.
>
> *snip*
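For reference, the flag Doug is describing lives on the *service*
definition, not the host.  A minimal sketch of pinning it off
explicitly so no template can sneak it in (the template name here is
made up):

    define service {
            name                    not-volatile-template  ; hypothetical
            register                0
            is_volatile             0   ; 0 = normal (the default)
                                        ; 1 = volatile: every non-OK result
                                        ; is handled as if the service just
                                        ; entered a hard problem state
                                        ; (logged + notified each time)
    }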
[Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
Hey folks,

   I recently made two major changes to my Nagios environment:

1) I upgraded to v3.5.0.
2) I moved from a single server to two pollers sending passive
   results to one central console server.

   Now, this new distributed system was in place for several months
while I tested, and it worked fine.  HOWEVER, since this was running
in parallel with my production system, notifications were disabled.
Hence, I didn't see this problem until I cut over for real and
enabled notifications.

(please excuse any cut-n-paste ugliness, had to send this info from
my work account via Outlook and then try to cleanse and reformat
via Squirrelmail)

   As a test and to capture information, I rebooted 'hostname'.  This
log is from the nagios-console host, which is the host that accepts
the passive check results and sends notifications.  Here is the
console host receiving a service check failure when the host is
restarting:

May 22 15:57:10 nagios-console nagios: SERVICE ALERT: hostname;/var disk
queue;CRITICAL;SOFT;1;Connection refused by host

So, the distributed poller system checks the host and sends its
results to the console server:

May 22 15:57:30 nagios-console nagios: HOST ALERT:
hostname;DOWN;SOFT;1;CRITICAL - Host Unreachable (a.b.c.d)

And then the centralized server IMMEDIATELY goes into a hard state,
which triggers a notification:

May 22 15:57:30 nagios-console nagios: HOST ALERT:
hostname;DOWN;HARD;1;CRITICAL - Host Unreachable (a.b.c.d)
May 22 15:57:30 nagios-console nagios: HOST NOTIFICATION:
cbensend;hostname;DOWN;host-notify-by-email-test;CRITICAL -
Host Unreachable (a.b.c.d)

   Um.  Wat?  Why would the console immediately trigger a hard
state?  The config files don't support this decision.  And this
IS a problem with the console server - the distributed monitors
continue checking the host 6 times like they should.  But for
some reason, the centralized console just immediately calls it
a hard state.
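For reference, the progression I expected for a failed host, given
max_check_attempts 6 and retry_interval 1 in the templates below, is
the standard retry ladder:

    check 1 fails  ->  DOWN, SOFT, attempt 1   (no notification)
    ...
    check 5 fails  ->  DOWN, SOFT, attempt 5   (no notification)
    check 6 fails  ->  DOWN, HARD              (notification goes out)

Instead, the console jumps straight to DOWN;HARD;1 on the very first
passive result.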
Definitions on the distributed monitoring host (the one running
the actual host and service checks for this host 'hostname'):

define host {
        host_name               hostname
        alias                   Old production Nagios server
        address                 a.b.c.d
        action_url              /pnp4nagios/graph?host=$HOSTNAME$
        icon_image_alt          Red Hat Linux
        icon_image              redhat.png
        statusmap_image         redhat.gd2
        check_command           check-host-alive
        check_period            24x7
        notification_period     24x7
        contact_groups          linux-infrastructure-admins
        use                     linux-host-template
}

The linux-host-template on that same system:

define host {
        name                    linux-host-template
        register                0
        max_check_attempts      6
        check_interval          5
        retry_interval          1
        notification_interval   360
        notification_options    d,r
        active_checks_enabled   1
        passive_checks_enabled  1
        notifications_enabled   1
        check_freshness         0
        check_period            24x7
        notification_period     24x7
        check_command           check-host-alive
        contact_groups          linux-infrastructure-admins
}

And said command to determine up or down:

define command {
        command_name            check-host-alive
        command_line            $USER1$/check_ping -H $HOSTADDRESS$ -w 5000.0,80% -c 1.0,100% -p 5
}

Definitions on the centralized console host (the one that notifies):

define host {
        host_name               hostname
        alias                   Old production Nagios server
        address                 a.b.c.d
        action_url              /pnp4nagios/graph?host=$HOSTNAME$
        icon_image_alt          Red Hat Linux
        icon_image              redhat.png
        statusmap_image         redhat.gd2
        check_command           check-host-alive
        check_period            24x7
        notification_period     24x7
        contact_groups          linux-infrastructure-admins
        use                     linux-host-template,Default_monitor_server
}

The "Default_monitor_server" template on the centralized server:

define host {
        name                    Default_monitor_server
        register                0
        active_checks_enabled   0
        passive_checks_enabled  1
        notifications_enabled   1
        check_freshness         0
        freshness_threshold     86400
}

And the linux-host-template template on that same centralized host:

define host {
        name                    linux-host-template
        register                0
        max_check_attempts      6
        check_interval          5
        retry_interval          1
        notification_interval   360
        notification_options    d,r
        active_checks_enabled   1
        passive_checks_enabled  1
        notifications_enabled   1
        check_freshness         0
        check_period            24x7
        notification_period     24x7
        ...
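One thing worth noting about the Default_monitor_server template
above: with check_freshness set to 0, the freshness_threshold of
86400 is inert.  If stale-result detection is wanted on a
passive-only console, a hypothetical variant would look like the
sketch below - when a result goes stale, Nagios forces an active run
of the host's check_command even though active_checks_enabled is 0:

    define host {
            name                    Default_monitor_server  ; freshness-enabled variant
            register                0
            active_checks_enabled   0
            passive_checks_enabled  1
            notifications_enabled   1
            check_freshness         1      ; actually enforce the threshold below
            freshness_threshold     86400  ; 24h with no passive result = stale
    }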