Re: [Nagios-users] Suppress "Max concurrent service checks" messages.
On 11/12/2010 06:40 PM, Paul M. Dubuc wrote:
>
> Andreas, I know it's doing things "wrong", but there's not much I can do
> about it right now. Since I know what the problem is that these messages
> are trying to tell me, I'd just like to keep them from flooding the logs
> so I can see what else is happening more easily. That's all.
>

You could always run Nagios in the foreground and redirect the log through
a grep -v filter, restarting it at midnight every night and rotating logs
manually. It's not difficult. Just cumbersome.

So long as you're aware that whatever you conclude from your tests will be
more than just a little off wrt what you wanted to determine, you'll almost
certainly do alright though.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

--
Centralized Desktop Delivery: Dell and VMware Reference Architecture
Simplifying enterprise desktop deployment and management using
Dell EqualLogic storage and VMware View: A highly scalable, end-to-end
client virtualization framework. Read more!
http://p.sf.net/sfu/dell-eql-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
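For anyone who would rather not hand-roll the grep pipeline, the filtering step Andreas describes could look roughly like the sketch below. It only covers the "drop matching lines" part, not the midnight restart or log rotation. The function names and the sample log lines are made up for illustration; only the marker text comes from the message quoted in this thread. It assumes the log lines arrive on some text stream.

```python
import io
import sys
from typing import Iterable, Iterator, TextIO

# Substring taken from the log message quoted in this thread.
NUDGE_MARKER = "Max concurrent service checks"


def drop_nudge_lines(lines: Iterable[str], marker: str = NUDGE_MARKER) -> Iterator[str]:
    """Yield every line that does not contain the marker (like grep -v -F)."""
    for line in lines:
        if marker not in line:
            yield line


def filter_stream(src: TextIO, dst: TextIO) -> None:
    """Copy src to dst line by line, skipping the nudge messages."""
    for line in drop_nudge_lines(src):
        dst.write(line)
        dst.flush()  # keep the filtered log current while the source is still running


if __name__ == "__main__":
    # Hypothetical sample lines, not real Nagios output.
    sample = io.StringIO(
        "Max concurrent service checks (40) has been reached.\n"
        "SERVICE ALERT: example-host;example-service;CRITICAL\n"
    )
    filter_stream(sample, sys.stdout)
```

In real use you would call filter_stream(sys.stdin, sys.stdout) and pipe the foreground output into the script, redirecting its output to whatever file you rotate.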
Re: [Nagios-users] Suppress "Max concurrent service checks" messages.
On 11/12/2010 06:50 PM, Paul M. Dubuc wrote:
> Ton Voon wrote:
> ...
>>
>> The trouble with the way the nudging works is that it hides the fact
>> that you have latency issues (as the check is rescheduled to a future
>> time). This means nagiostats will not include the additional latency
>> time here.
>>
>> If someone has a better way of working this out, I'm all ears.
>
> Would it cause other problems if the total nudging time for a service were
> included in its latency time?
>

Not really. It would just be a much more obvious concern. This is something
we'll look into implementing when we're doing Nagios 4 though, as it's
painfully difficult to do without altering the object structure of hosts
and services.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
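To make Paul's question concrete, here is a toy calculation of what "latency including nudges" would mean. This is not how Nagios computes latency; the function name is hypothetical, and pairing the 11 second average latency with a single 9 second nudge from the quoted log line is only an illustration.

```python
from typing import Sequence


def latency_with_nudges(reported_latency_s: float, nudges_s: Sequence[float]) -> float:
    """Add every nudge a check received to the latency reported for it."""
    if reported_latency_s < 0 or any(n < 0 for n in nudges_s):
        raise ValueError("latency and nudges must be non-negative")
    return reported_latency_s + sum(nudges_s)


if __name__ == "__main__":
    # 11 s reported latency plus one 9 s nudge.
    print(latency_with_nudges(11.0, [9.0]))  # 20.0
```

The point is only that a check nudged once by 9 seconds would show nearly double the latency that the statistics report today, which is the "much more obvious concern" mentioned above.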
Re: [Nagios-users] Suppress "Max concurrent service checks" messages.
On 11/12/2010 06:03 PM, Ton Voon wrote:
>
> On 12 Nov 2010, at 15:30, Paul M. Dubuc wrote:
>
>> We're running Nagios 3.2.3 with concurrent service checks set to 40.
>> We can't go much higher than this due to resource constraints outside
>> of Nagios but we're running 329 services at 5 minute intervals (this
>> is a "load test" of sorts not production load ... yet). Average
>> execution time/latency is 36/11 seconds so we're seeing quite a few
>> messages like this in the Nagios log file:
>>
>> (Informational Message) [11-11-2010 14:55:57] Max concurrent service
>> checks (40) has been reached. Nudging: by 9 seconds...
>>
>> Is there any way to suppress these messages from being logged? I
>> don't see an option for logging these in the config file documentation.
>
> I put those messages in.
>
> Firstly, 40 doesn't necessarily mean there are 40 concurrent service
> checks running as they may have finished but not been reaped yet (to
> decrement the counter).
>
> Secondly, if you are getting these messages, then either (1) this
> limit is too low - increase and keep an eye on the load on your nagios
> server; (2) you've got too many checks running - reduce frequencies/
> numbers or set up a slave server.
>
> The trouble with the way the nudging works is that it hides the fact
> that you have latency issues (as the check is rescheduled to a future
> time). This means nagiostats will not include the additional latency
> time here.
>
> If someone has a better way of working this out, I'm all ears.
>

We could use something like pnp4nagios does, and issue a check to make sure
load is below a certain threshold before firing off new checks. There's a
(reasonably) portable way of getting the number of online CPUs, so we could
even make an educated guess at how many checks we can run to saturate the
CPUs while still not running too many checks.

Of course, some checks are more heavy-duty than others. As a first stab at
maintaining reasonable load, we should probably ignore that. At a later
point, we might want to introduce "probable load increase of running this
check" and nudge checks into the future when we're in danger of
load / num_cpus > 0.9 or some other suitable number.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
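The load / num_cpus > 0.9 test floated above might look something like this rough sketch. It is not Nagios code: the function names are hypothetical, os.getloadavg() is only available on Unix-like systems, and os.cpu_count() reports logical CPUs, which is only an approximation of "online CPUs".

```python
import os


def should_nudge(load_1min: float, num_cpus: int, threshold: float = 0.9) -> bool:
    """Return True when the load per CPU exceeds the threshold."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return load_1min / num_cpus > threshold


def current_load_per_cpu() -> float:
    """1-minute load average divided by the CPU count (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return load_1min / (os.cpu_count() or 1)


if __name__ == "__main__":
    # Load 3.8 on 4 CPUs is 0.95 per CPU: hold the check back.
    print(should_nudge(3.8, 4))  # True
    # Load 2.0 on 4 CPUs is 0.5 per CPU: run it now.
    print(should_nudge(2.0, 4))  # False
```

Keeping the decision separate from the measurement makes it easy to later swap in a per-check "probable load increase" estimate without touching the threshold logic.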
Re: [Nagios-users] Suppress "Max concurrent service checks" messages.
Ton Voon wrote:
...
>
> The trouble with the way the nudging works is that it hides the fact
> that you have latency issues (as the check is rescheduled to a future
> time). This means nagiostats will not include the additional latency
> time here.
>
> If someone has a better way of working this out, I'm all ears.

Would it cause other problems if the total nudging time for a service were
included in its latency time?
Re: [Nagios-users] Suppress "Max concurrent service checks" messages.
Ton Voon wrote:
>
> On 12 Nov 2010, at 15:30, Paul M. Dubuc wrote:
>
>> We're running Nagios 3.2.3 with concurrent service checks set to 40.
>> We can't go much higher than this due to resource constraints outside
>> of Nagios but we're running 329 services at 5 minute intervals (this
>> is a "load test" of sorts not production load ... yet). Average
>> execution time/latency is 36/11 seconds so we're seeing quite a few
>> messages like this in the Nagios log file:
>>
>> (Informational Message) [11-11-2010 14:55:57] Max concurrent service
>> checks (40) has been reached. Nudging: by 9 seconds...
>>
>> Is there any way to suppress these messages from being logged? I
>> don't see an option for logging these in the config file documentation.
>
> I put those messages in.
>
> Firstly, 40 doesn't necessarily mean there are 40 concurrent service
> checks running as they may have finished but not been reaped yet (to
> decrement the counter).
>
> Secondly, if you are getting these messages, then either (1) this
> limit is too low - increase and keep an eye on the load on your nagios
> server; (2) you've got too many checks running - reduce frequencies/
> numbers or set up a slave server.
>
> The trouble with the way the nudging works is that it hides the fact
> that you have latency issues (as the check is rescheduled to a future
> time). This means nagiostats will not include the additional latency
> time here.
>
> If someone has a better way of working this out, I'm all ears.
>
> Ton

Thanks, Ton. This is helpful information and advice. The services we're
running require web browsers, which are CPU- and memory-intensive and which,
temporarily, we need to manage on the Nagios server. In production we
shouldn't have these limitations, but for now I just wanted to keep all
these messages from flooding the log.

Andreas, I know it's doing things "wrong", but there's not much I can do
about it right now. Since I know what the problem is that these messages
are trying to tell me, I'd just like to keep them from flooding the logs so
I can see what else is happening more easily. That's all.

Thanks,

Paul Dubuc
Re: [Nagios-users] Suppress "Max concurrent service checks" messages.
On 12 Nov 2010, at 15:30, Paul M. Dubuc wrote:

> We're running Nagios 3.2.3 with concurrent service checks set to 40.
> We can't go much higher than this due to resource constraints outside
> of Nagios but we're running 329 services at 5 minute intervals (this
> is a "load test" of sorts not production load ... yet). Average
> execution time/latency is 36/11 seconds so we're seeing quite a few
> messages like this in the Nagios log file:
>
> (Informational Message) [11-11-2010 14:55:57] Max concurrent service
> checks (40) has been reached. Nudging : by 9 seconds...
>
> Is there any way to suppress these messages from being logged? I
> don't see an option for logging these in the config file documentation.

I put those messages in.

Firstly, 40 doesn't necessarily mean there are 40 concurrent service checks
running as they may have finished but not been reaped yet (to decrement the
counter).

Secondly, if you are getting these messages, then either (1) this limit is
too low - increase and keep an eye on the load on your nagios server;
(2) you've got too many checks running - reduce frequencies/numbers or set
up a slave server.

The trouble with the way the nudging works is that it hides the fact that
you have latency issues (as the check is rescheduled to a future time).
This means nagiostats will not include the additional latency time here.

If someone has a better way of working this out, I'm all ears.

Ton
Re: [Nagios-users] Suppress "Max concurrent service checks" messages.
On 11/12/2010 04:30 PM, Paul M. Dubuc wrote:
> We're running Nagios 3.2.3 with concurrent service checks set to 40. We
> can't go much higher than this due to resource constraints outside of
> Nagios but we're running 329 services at 5 minute intervals (this is a
> "load test" of sorts not production load ... yet). Average execution
> time/latency is 36/11 seconds so we're seeing quite a few messages like
> this in the Nagios log file:
>

If you're doing a "load test" on a system that clearly doesn't handle
production load and thus forces you to run with less than optimal settings,
you're doing things wrong.

> (Informational Message) [11-11-2010 14:55:57] Max concurrent service checks
> (40) has been reached. Nudging: by 9 seconds...
>
> Is there any way to suppress these messages from being logged? I don't
> see an option for logging these in the config file documentation.
>

Not really, no. See my previous comment though. It's equally valid now,
even though about 12 seconds have passed since I wrote it.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
[Nagios-users] Suppress "Max concurrent service checks" messages.
We're running Nagios 3.2.3 with concurrent service checks set to 40. We
can't go much higher than this due to resource constraints outside of
Nagios but we're running 329 services at 5 minute intervals (this is a
"load test" of sorts not production load ... yet). Average execution
time/latency is 36/11 seconds so we're seeing quite a few messages like
this in the Nagios log file:

(Informational Message) [11-11-2010 14:55:57] Max concurrent service checks
(40) has been reached. Nudging : by 9 seconds...

Is there any way to suppress these messages from being logged? I don't see
an option for logging these in the config file documentation.

Thanks,

Paul Dubuc
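A back-of-envelope check on the numbers in this post suggests why the limit of 40 is hit so often. This is only an estimate, not something Nagios reports: it assumes the checks are spread evenly over the interval and that 36 seconds is the mean execution time, and the function name is made up for illustration.

```python
def average_concurrency(num_services: int, interval_s: float, mean_exec_s: float) -> float:
    """Average number of checks in flight: start rate times mean run time."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    checks_started_per_second = num_services / interval_s
    return checks_started_per_second * mean_exec_s


if __name__ == "__main__":
    # 329 services, one check each per 300 s, 36 s per check on average.
    print(round(average_concurrency(329, 300, 36), 2))  # 39.48
```

With an average of about 39.5 checks in flight against a ceiling of 40, any unevenness in scheduling or execution time is enough to reach the limit, so frequent nudge messages are what you would expect from this configuration.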