Re: [Nagios-users] high host latency on nagios master
Try lowering the max_check_result_reaper value (presumably max_check_result_reaper_time in nagios.cfg). I had good luck playing with that value.

Thanks

On Tue, May 4, 2010 at 8:13 PM, Trisha Hoang tri...@rockyou.com wrote:
> Hi,
>
> The nagios master got really high host latency and I'm not sure how to
> tweak it.
> [...]
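A rough sanity check on the nagiostats figures posted in this thread may help narrow things down. The sketch below is only arithmetic on the posted numbers; it assumes the three-field values are min / max / avg, and the function names are made up for illustration.

```python
"""Back-of-the-envelope check of the host-check numbers from the thread."""

# Figures copied from the posted nagiostats output.
HOSTS_ACTIVELY_CHECKED = 566
HOSTS_CHECKED_LAST_5_MIN = 8     # 2nd field of "Active Hosts Last 1/5/15/60 min"
AVG_HOST_EXEC_TIME_S = 0.063     # 3rd field of "Active Host Execution Time" (assumed avg)
AVG_HOST_LATENCY_S = 2413.051    # 3rd field of "Active Host Latency" (assumed avg)

# Stated in the post: the master runs host checks every 5 minutes.
CHECK_INTERVAL_S = 5 * 60


def required_checks_per_second(hosts, interval_s):
    """Check rate needed to visit every host once per interval."""
    return hosts / interval_s


def total_plugin_time_s(hosts, avg_exec_s):
    """Plugin run time for one full pass if checks ran strictly one by one."""
    return hosts * avg_exec_s


def fraction_checked(checked, total):
    """Share of hosts that were actually checked in the window."""
    return checked / total


if __name__ == "__main__":
    rate = required_checks_per_second(HOSTS_ACTIVELY_CHECKED, CHECK_INTERVAL_S)
    serial = total_plugin_time_s(HOSTS_ACTIVELY_CHECKED, AVG_HOST_EXEC_TIME_S)
    share = fraction_checked(HOSTS_CHECKED_LAST_5_MIN, HOSTS_ACTIVELY_CHECKED)
    print(f"required rate:       {rate:.2f} host checks/s")
    print(f"serial plugin time:  {serial:.1f} s per {CHECK_INTERVAL_S} s cycle")
    print(f"checked in last 5m:  {share:.1%} of hosts")
    print(f"average latency:     {AVG_HOST_LATENCY_S / 60:.1f} min")
```

With the posted numbers this works out to roughly 1.9 host checks per second needed, about 36 seconds of plugin time per 5-minute cycle even if run serially, and only about 1.4% of hosts checked in the last 5 minutes, with an average latency of about 40 minutes. That suggests the delay sits in scheduling or result processing rather than in the plugins or the network, which is consistent with the check_ping observation in the original post.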
[Nagios-users] high host latency on nagios master
Hi,

The Nagios *master* has really high host latency, and I'm not sure how to tune it. I ran the check_ping plugin against a handful of hosts and the RTA averaged 0.2 seconds, so it's not the network.

*Environment:*

- 565 hosts
- 6790 passive checks from the slaves
- not using event broker
- master server *actively* executes the host checks every 5 minutes and *passively* processes checks every 1 minute
- not doing performance data

*Nagiostats*

Nagios Stats 3.2.1
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 03-09-2010
License: GPL

CURRENT STATUS DATA
--
Status File:                            /var/log/nagios/status.dat
Status File Age:                        0d 0h 0m 23s
Status File Version:                    3.2.1

Program Running Time:                   0d 1h 32m 19s
Nagios PID:                             28282
Used/High/Total Command Buffers:        1316 / 3066 / 4096

Total Services:                         7745
Services Checked:                       7745
Services Scheduled:                     1381
Services Actively Checked:              955
Services Passively Checked:             6790
Total Service State Change:             0.000 / 9.740 / 0.007 %
Active Service Latency:                 18.948 / 205.144 / 165.751 sec
Active Service Execution Time:          0.007 / 9.051 / 0.055 sec
Active Service State Change:            0.000 / 5.460 / 0.006 %
Active Services Last 1/5/15/60 min:     0 / 0 / 0 / 0
Passive Service Latency:                34.359 / 190.247 / 76.739 sec
Passive Service State Change:           0.000 / 9.740 / 0.008 %
Passive Services Last 1/5/15/60 min:    0 / 3054 / 6774 / 6784
Services Ok/Warn/Unk/Crit:              7720 / 1 / 0 / 24
Services Flapping:                      27
Services In Downtime:                   0

Total Hosts:                            566
Hosts Checked:                          566
Hosts Scheduled:                        566
Hosts Actively Checked:                 566
Host Passively Checked:                 0
Total Host State Change:                0.000 / 0.000 / 0.000 %
Active Host Latency:                    0.000 / 3410.087 / 2413.051 sec
Active Host Execution Time:             0.007 / 10.010 / 0.063 sec
Active Host State Change:               0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:        0 / 8 / 10 / 565
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  563 / 3 / 0
Hosts Flapping:                         1
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     5 / 32 / 75
   Scheduled:                           0 / 0 / 0
   On-demand:                           5 / 32 / 75
   Parallel:                            1 / 11 / 23
   Serial:                              0 / 0 / 0
   Cached:                              4 / 21 / 52
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  0 / 0 / 0
   Scheduled:                           0 / 0 / 0
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 2 / 1455 / 1455

External Commands Last 1/5/15 min:      1302 / 6063 / 20253

*Nagios.cfg*

# EXTERNAL COMMAND CHECK INTERVAL
# This is the interval at which Nagios should check for external commands.
# This value works of the interval_length you specify later. If you leave
# that at its default value of 60 (seconds), a value of 1 here will cause
# Nagios to check for external commands every minute. If you specify a
# number followed by an s (i.e. 15s), this will be interpreted to mean
# actual seconds rather than a multiple of the interval_length variable.
# Note: In addition to reading the external command file at regularly
# scheduled intervals, Nagios will also check for external commands after
# event handlers are executed.
# NOTE: Setting this value to -1 causes Nagios to check the external
# command file as often as possible.

#command_check_interval=15s
command_check_interval=-1

# SERVICE INTER-CHECK DELAY METHOD
# This is the method that Nagios should use when initially
# spreading out service checks when it starts monitoring. The
# default is to use smart delay calculation, which will try to
# space all service checks out evenly to minimize CPU load.
# Using the dumb setting will cause all checks to be scheduled
# at the same time (with no delay between them)! This is not a
# good thing for production, but is useful when testing the
# parallelization functionality.
#       n    = None - don't use any delay between checks
#       d    = Use a dumb delay of 1 second between checks
#       s    = Use smart inter-check delay calculation
#       x.xx = Use an inter-check delay of x.xx seconds

service_inter_check_delay_method=s

#