Re: [Nagios-users] high host latency on nagios master

2010-05-06 Thread shadih rahman
Try lowering the max_check_result_reaper_time value. I had good luck tuning
that setting. Thanks

On Tue, May 4, 2010 at 8:13 PM, Trisha Hoang tri...@rockyou.com wrote:


[Nagios-users] high host latency on nagios master

2010-05-04 Thread Trisha Hoang
Hi,
The Nagios *master* has very high host check latency and I'm not sure how to
tune it. I ran the check_ping plugin against a handful of hosts and the rta
averaged 0.2 second, so it's not the network.

*Environment:*
- 565 hosts
- 6790 passive checks from the slaves
- not using event broker
- master server *actively* executes the host checks every 5 minutes
and *passively* processes checks every 1 minute
- not doing performance data
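
For scale, the load implied by the list above can be worked out roughly. This is only a sketch: it assumes checks are spread evenly over their interval and that all 6790 passive results arrive once per minute, and the variable names are illustrative, not Nagios settings.

```python
# Rough load estimate from the environment figures in the post.
# Assumption: checks are spread evenly across their interval.
hosts = 565
host_check_interval_s = 5 * 60      # active host checks every 5 minutes
passive_results = 6790
passive_interval_s = 60             # passive results processed every minute

host_checks_per_s = hosts / host_check_interval_s
passive_results_per_s = passive_results / passive_interval_s

print(f"active host checks needed:  {host_checks_per_s:.2f}/s")
print(f"passive results to process: {passive_results_per_s:.1f}/s")
```

Under those assumptions the master needs roughly 2 host checks and over 100 passive results per second just to keep up.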

*Nagiostats*

Nagios Stats 3.2.1
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 03-09-2010
License: GPL

CURRENT STATUS DATA
--
Status File:                            /var/log/nagios/status.dat
Status File Age:                        0d 0h 0m 23s
Status File Version:                    3.2.1

Program Running Time:                   0d 1h 32m 19s
Nagios PID:                             28282
Used/High/Total Command Buffers:        1316 / 3066 / 4096

Total Services:                         7745
Services Checked:                       7745
Services Scheduled:                     1381
Services Actively Checked:              955
Services Passively Checked:             6790
Total Service State Change:             0.000 / 9.740 / 0.007 %
Active Service Latency:                 18.948 / 205.144 / 165.751 sec
Active Service Execution Time:          0.007 / 9.051 / 0.055 sec
Active Service State Change:            0.000 / 5.460 / 0.006 %
Active Services Last 1/5/15/60 min:     0 / 0 / 0 / 0
Passive Service Latency:                34.359 / 190.247 / 76.739 sec
Passive Service State Change:           0.000 / 9.740 / 0.008 %
Passive Services Last 1/5/15/60 min:    0 / 3054 / 6774 / 6784
Services Ok/Warn/Unk/Crit:              7720 / 1 / 0 / 24
Services Flapping:                      27
Services In Downtime:                   0

Total Hosts:                            566
Hosts Checked:                          566
Hosts Scheduled:                        566
Hosts Actively Checked:                 566
Host Passively Checked:                 0
Total Host State Change:                0.000 / 0.000 / 0.000 %
Active Host Latency:                    0.000 / 3410.087 / 2413.051 sec
Active Host Execution Time:             0.007 / 10.010 / 0.063 sec
Active Host State Change:               0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:        0 / 8 / 10 / 565
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  563 / 3 / 0
Hosts Flapping:                         1
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     5 / 32 / 75
   Scheduled:                           0 / 0 / 0
   On-demand:                           5 / 32 / 75
   Parallel:                            1 / 11 / 23
   Serial:                              0 / 0 / 0
   Cached:                              4 / 21 / 52
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  0 / 0 / 0
   Scheduled:                           0 / 0 / 0
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 2 / 1455 / 1455

External Commands Last 1/5/15 min:      1302 / 6063 / 20253
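
To put the host latency figure in context, the latency lines above can be read as min / max / avg and compared with the 5 minute host check interval. A minimal sketch, with parse_triple being a hypothetical helper rather than anything shipped with Nagios, and the sample line copied from the output above (whitespace aside):

```python
def parse_triple(line):
    """Split a nagiostats 'label: min / max / avg unit' line into three floats."""
    _, _, values = line.partition(":")
    parts = values.replace("sec", "").replace("%", "").split("/")
    return tuple(float(p) for p in parts)

line = "Active Host Latency: 0.000 / 3410.087 / 2413.051 sec"
lat_min, lat_max, lat_avg = parse_triple(line)

check_interval_s = 5 * 60  # host check interval stated in the post
intervals_behind = lat_avg / check_interval_s
print(f"average host latency is {intervals_behind:.1f}x the check interval")
```

In other words, on average a host check starts about eight check intervals later than scheduled.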


*Nagios.cfg*

# EXTERNAL COMMAND CHECK INTERVAL
# This is the interval at which Nagios should check for external commands.
# This value works of the interval_length you specify later.  If you leave
# that at its default value of 60 (seconds), a value of 1 here will cause
# Nagios to check for external commands every minute.  If you specify a
# number followed by an s (i.e. 15s), this will be interpreted to mean
# actual seconds rather than a multiple of the interval_length variable.
# Note: In addition to reading the external command file at regularly
# scheduled intervals, Nagios will also check for external commands after
# event handlers are executed.
# NOTE: Setting this value to -1 causes Nagios to check the external
# command file as often as possible.

#command_check_interval=15s
command_check_interval=-1
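
The rules in the comment above can be sketched as a small function. This only restates the comment text and is not how Nagios itself parses the option; command_check_seconds is a made-up name for illustration.

```python
def command_check_seconds(value, interval_length=60):
    """Return the external command check interval in seconds.

    Returns None when the value is -1, meaning "as often as possible".
    """
    value = value.strip()
    if value == "-1":
        return None
    if value.endswith("s"):
        # A trailing "s" means actual seconds, e.g. "15s".
        return int(value[:-1])
    # Otherwise the value is a multiple of interval_length.
    return int(value) * interval_length

print(command_check_seconds("15s"))  # 15
print(command_check_seconds("1"))    # 60
print(command_check_seconds("-1"))   # None
```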

# SERVICE INTER-CHECK DELAY METHOD
# This is the method that Nagios should use when initially
# spreading out service checks when it starts monitoring.  The
# default is to use smart delay calculation, which will try to
# space all service checks out evenly to minimize CPU load.
# Using the dumb setting will cause all checks to be scheduled
# at the same time (with no delay between them)!  This is not a
# good thing for production, but is useful when testing the
# parallelization functionality.
#   n   = None - don't use any delay between checks
#   d   = Use a dumb delay of 1 second between checks
#   s   = Use smart inter-check delay calculation
#   x.xx= Use an inter-check delay of x.xx seconds

service_inter_check_delay_method=s
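
For reference, the four delay methods listed above can be sketched as follows. The "smart" branch assumes the calculation described in the Nagios documentation (average check interval divided by the number of services) and ignores any other scheduling limits. The function name and the 300 second average interval are assumptions for illustration; the post does not give the service check interval.

```python
def service_inter_check_delay(method, total_services, avg_check_interval_s):
    """Return the initial delay in seconds between scheduled service checks."""
    if method == "n":
        return 0.0                       # no delay between checks
    if method == "d":
        return 1.0                       # "dumb" delay of 1 second
    if method == "s":
        # Smart: spread all checks evenly over the average check interval.
        return avg_check_interval_s / total_services
    return float(method)                 # explicit x.xx seconds

# 955 actively checked services (from nagiostats), assumed 300 s interval.
delay = service_inter_check_delay("s", 955, 300)
print(f"smart inter-check delay: {delay:.3f} s")
```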

#