Re: [Nagios-users] Suppress "Max concurrent service checks" messages.

2010-11-12 Thread Andreas Ericsson
On 11/12/2010 06:40 PM, Paul M. Dubuc wrote:
> 
> Andreas, I know it's doing things "wrong", but there's not much I can do about
> it right now.  I already know what problem these messages are trying to tell
> me about; I'd just like to keep them from flooding the logs so I can see
> what else is happening more easily.  That's all.
> 

You could always run Nagios in the foreground and redirect the log through a
grep -v filter, restarting it at midnight every night and rotating logs
manually. It's not difficult. Just cumbersome.
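
A minimal sketch of such a filter in Python, as an alternative to grep -v.
It assumes the notice can be recognised by the substring "Max concurrent
service checks" taken from the message quoted in this thread; the second
sample line and the name filter_lines are made up for illustration.

```python
import io

# Substring copied from the log message quoted in this thread.
NOISE = "Max concurrent service checks"

def filter_lines(lines):
    """Yield every line that does not contain the nudging notice."""
    for line in lines:
        if NOISE not in line:
            yield line

# Demo on sample log lines; the second one is invented for illustration.
sample = io.StringIO(
    "[1289487357] Max concurrent service checks (40) has been reached.\n"
    "[1289487360] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;timeout\n"
)
kept = list(filter_lines(sample))
print("".join(kept), end="")
```

For real use, iterate over sys.stdin instead of the sample and write each
kept line to sys.stdout, flushing after every line so the filtered log does
not lag behind Nagios.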

So long as you're aware that whatever you conclude from your tests will be
more than just a little off wrt what you wanted to determine, you'll almost
certainly do alright though.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

--
Centralized Desktop Delivery: Dell and VMware Reference Architecture
Simplifying enterprise desktop deployment and management using
Dell EqualLogic storage and VMware View: A highly scalable, end-to-end
client virtualization framework. Read more!
http://p.sf.net/sfu/dell-eql-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Suppress "Max concurrent service checks" messages.

2010-11-12 Thread Andreas Ericsson
On 11/12/2010 06:50 PM, Paul M. Dubuc wrote:
> Ton Voon wrote:
> ...
>>
>> The trouble with the way the nudging works is that it hides the fact
>> that you have latency issues (as the check is rescheduled to a future
>> time). This means nagiostats will not include the additional latency
>> time here.
>>
>> If someone has a better way of working this out, I'm all ears.
> 
> Would it cause other problems if the total nudging time for a service were
> included in its latency time?
> 

Not really. It would just be a much more obvious concern. This is something
we'll look into implementing when we're doing Nagios 4 though, as it's
painfully difficult to do without altering the object structure of hosts
and services.
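
To illustrate why this touches the object structure, here is a rough sketch
of a check that remembers how far it has been nudged and adds that to its
reported latency. ScheduledCheck, nudged_total and the timestamps are
hypothetical names and values, not Nagios' actual data structures.

```python
from dataclasses import dataclass

@dataclass
class ScheduledCheck:
    """Hypothetical stand-in for a service check, not Nagios' real layout."""
    scheduled_at: float        # time the check is currently due (epoch seconds)
    nudged_total: float = 0.0  # seconds the check has been pushed back so far

    def nudge(self, seconds: float) -> None:
        """Push the check into the future and remember how far it moved."""
        self.scheduled_at += seconds
        self.nudged_total += seconds

    def latency(self, started_at: float) -> float:
        """Latency relative to the original due time, nudges included."""
        return (started_at - self.scheduled_at) + self.nudged_total

check = ScheduledCheck(scheduled_at=1000.0)
check.nudge(9.0)  # now due at 1009
check.nudge(9.0)  # now due at 1018
print(check.latency(started_at=1020.0))  # 2 s measured + 18 s nudged = 20.0
```

The extra per-check field is the point: without somewhere to accumulate the
nudged time, the latency figure only sees the last rescheduled due time.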

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] Suppress "Max concurrent service checks" messages.

2010-11-12 Thread Andreas Ericsson
On 11/12/2010 06:03 PM, Ton Voon wrote:
> 
> On 12 Nov 2010, at 15:30, Paul M. Dubuc wrote:
> 
>> We're running Nagios 3.2.3 with concurrent service checks set to 40.  We can't
>> go much higher than this due to resource constraints outside of Nagios but
>> we're running 329 services at 5 minute intervals (this is a "load test" of
>> sorts not production load ... yet).  Average execution time/latency is 36/11
>> seconds so we're seeing quite a few messages like this in the Nagios log file:
>>
>> (Informational Message) [11-11-2010 14:55:57] Max concurrent service checks
>> (40) has been reached. Nudging:  by 9 seconds...
>>
>> Is there any way to suppress these messages from being logged?  I don't see an
>> option for logging these in the config file documentation.
> 
> I put those messages in.
> 
> Firstly, 40 doesn't necessarily mean there are 40 concurrent service
> checks running as they may have finished but not been reaped yet (to
> decrement the counter).
> 
> Secondly, if you are getting these messages, then either (1) this
> limit is too low - increase it and keep an eye on the load on your nagios
> server; or (2) you've got too many checks running - reduce
> frequencies/numbers or set up a slave server.
> 
> The trouble with the way the nudging works is that it hides the fact
> that you have latency issues (as the check is rescheduled to a future
> time). This means nagiostats will not include the additional latency
> time here.
> 
> If someone has a better way of working this out, I'm all ears.
> 

We could use something like pnp4nagios does, and issue a check to make
sure load is below a certain threshold before firing off new checks.
There's a (reasonably) portable way of getting the number of online
CPUs, so we could even make an educated guess at how many checks we
can run to saturate the CPUs while still not running too many checks.

Of course, some checks are more heavy-duty than others. As a first stab
at maintaining reasonable load, we should probably ignore that. At a
later point, we might want to introduce "probable load increase of
running this check" and nudge checks into the future when we're in
danger of load / num_cpus > 0.9 or some other suitable number.
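
A small sketch of that threshold test, written in Python purely for
illustration. should_nudge and current_load_and_cpus are hypothetical
helpers; os.getloadavg() is Unix-only, and os.cpu_count() is used here as a
stand-in for the number of online CPUs.

```python
import os

def should_nudge(load: float, num_cpus: int, threshold: float = 0.9) -> bool:
    """True when the load per CPU exceeds the threshold (0.9 from the text)."""
    return load / num_cpus > threshold

def current_load_and_cpus():
    """Read the 1-minute load average and CPU count from the OS (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return load_1min, os.cpu_count() or 1  # cpu_count() may return None

# Fixed numbers so the example does not depend on the machine it runs on.
print(should_nudge(3.8, 4))  # 3.8 / 4 = 0.95 -> True
print(should_nudge(2.0, 4))  # 2.0 / 4 = 0.50 -> False
```

In a scheduler this would be evaluated with current_load_and_cpus() just
before launching a check; per-check load estimates are left out, as suggested
for a first stab.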

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] Suppress "Max concurrent service checks" messages.

2010-11-12 Thread Paul M. Dubuc
Ton Voon wrote:
...
>
> The trouble with the way the nudging works is that it hides the fact
> that you have latency issues (as the check is rescheduled to a future
> time). This means nagiostats will not include the additional latency
> time here.
>
> If someone has a better way of working this out, I'm all ears.

Would it cause other problems if the total nudging time for a service were 
included in its latency time?



Re: [Nagios-users] Suppress "Max concurrent service checks" messages.

2010-11-12 Thread Paul M. Dubuc
Ton Voon wrote:
>
> On 12 Nov 2010, at 15:30, Paul M. Dubuc wrote:
>
>> We're running Nagios 3.2.3 with concurrent service checks set to 40.  We can't
>> go much higher than this due to resource constraints outside of Nagios but
>> we're running 329 services at 5 minute intervals (this is a "load test" of
>> sorts not production load ... yet).  Average execution time/latency is 36/11
>> seconds so we're seeing quite a few messages like this in the Nagios log file:
>>
>> (Informational Message) [11-11-2010 14:55:57] Max concurrent service checks
>> (40) has been reached. Nudging:  by 9 seconds...
>>
>> Is there any way to suppress these messages from being logged?  I don't see an
>> option for logging these in the config file documentation.
>
> I put those messages in.
>
> Firstly, 40 doesn't necessarily mean there are 40 concurrent service
> checks running as they may have finished but not been reaped yet (to
> decrement the counter).
>
> Secondly, if you are getting these messages, then either (1) this
> limit is too low - increase it and keep an eye on the load on your nagios
> server; or (2) you've got too many checks running - reduce
> frequencies/numbers or set up a slave server.
>
> The trouble with the way the nudging works is that it hides the fact
> that you have latency issues (as the check is rescheduled to a future
> time). This means nagiostats will not include the additional latency
> time here.
>
> If someone has a better way of working this out, I'm all ears.
>
> Ton

Thanks, Ton.  This is helpful information and advice.  The services we're
running require web browsers, which are CPU- and memory-intensive and which,
temporarily, we need to manage on the Nagios server.  In production we
shouldn't have these limitations, but for now I just wanted to keep all these
messages from flooding the log.

Andreas, I know it's doing things "wrong", but there's not much I can do about
it right now.  I already know what problem these messages are trying to tell
me about; I'd just like to keep them from flooding the logs so I can see
what else is happening more easily.  That's all.

Thanks,
Paul Dubuc



Re: [Nagios-users] Suppress "Max concurrent service checks" messages.

2010-11-12 Thread Ton Voon

On 12 Nov 2010, at 15:30, Paul M. Dubuc wrote:

> We're running Nagios 3.2.3 with concurrent service checks set to 40.  We can't
> go much higher than this due to resource constraints outside of Nagios but
> we're running 329 services at 5 minute intervals (this is a "load test" of
> sorts not production load ... yet).  Average execution time/latency is 36/11
> seconds so we're seeing quite a few messages like this in the Nagios log file:
>
> (Informational Message) [11-11-2010 14:55:57] Max concurrent service checks
> (40) has been reached. Nudging : by 9 seconds...
>
> Is there any way to suppress these messages from being logged?  I don't see an
> option for logging these in the config file documentation.

I put those messages in.

Firstly, 40 doesn't necessarily mean there are 40 concurrent service  
checks running as they may have finished but not been reaped yet (to  
decrement the counter).

Secondly, if you are getting these messages, then either (1) this
limit is too low - increase it and keep an eye on the load on your nagios
server; or (2) you've got too many checks running - reduce
frequencies/numbers or set up a slave server.
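
A back-of-the-envelope check of the figures quoted above shows how close
this setup sits to the limit. It assumes the 329 checks are spread evenly
over the 5-minute interval and that each takes the reported average of 36
seconds; real runs will be burstier than this.

```python
# Figures taken from the original post in this thread.
services = 329      # number of service checks
interval_s = 5 * 60 # each check runs once per 5 minutes
avg_exec_s = 36     # reported average execution time, in seconds
limit = 40          # configured max concurrent service checks

# Average number of checks in flight = arrival rate * time each one runs.
avg_concurrent = services * avg_exec_s / interval_s
print(f"{avg_concurrent:.2f} checks running on average, limit {limit}")
```

With an average of about 39.5 checks in flight against a limit of 40, any
unevenness in scheduling or execution time is enough to hit the limit, which
fits both of the explanations above.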

The trouble with the way the nudging works is that it hides the fact  
that you have latency issues (as the check is rescheduled to a future  
time). This means nagiostats will not include the additional latency  
time here.

If someone has a better way of working this out, I'm all ears.

Ton




Re: [Nagios-users] Suppress "Max concurrent service checks" messages.

2010-11-12 Thread Andreas Ericsson
On 11/12/2010 04:30 PM, Paul M. Dubuc wrote:
> We're running Nagios 3.2.3 with concurrent service checks set to 40.  We can't
> go much higher than this due to resource constraints outside of Nagios but
> we're running 329 services at 5 minute intervals (this is a "load test" of
> sorts not production load ... yet).  Average execution time/latency is 36/11
> seconds so we're seeing quite a few messages like this in the Nagios log file:
> 

If you're doing a "load test" on a system that clearly doesn't handle
production load and thus forces you to run with less than optimal settings,
you're doing things wrong.

> (Informational Message) [11-11-2010 14:55:57] Max concurrent service checks
> (40) has been reached. Nudging:  by 9 seconds...
> 
> Is there any way to suppress these messages from being logged?  I don't see an
> option for logging these in the config file documentation.
> 

Not really, no. See my previous comment though. It's equally valid now,
even though about 12 seconds have passed since I wrote it.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



[Nagios-users] Suppress "Max concurrent service checks" messages.

2010-11-12 Thread Paul M. Dubuc
We're running Nagios 3.2.3 with concurrent service checks set to 40.  We can't 
go much higher than this due to resource constraints outside of Nagios but 
we're running 329 services at 5 minute intervals (this is a "load test" of 
sorts not production load ... yet).  Average execution time/latency is 36/11 
seconds so we're seeing quite a few messages like this in the Nagios log file:

(Informational Message) [11-11-2010 14:55:57] Max concurrent service checks 
(40) has been reached. Nudging : by 9 seconds...

Is there any way to suppress these messages from being logged?  I don't see an 
option for logging these in the config file documentation.

Thanks,
Paul Dubuc
