On 21 May 2010, at 18:03, Andrew Hall wrote:

We keep seeing a lot of stale data in the master GUI for services
checked by a remote slave.

When I check the nagios log on this slave I see lots of these...

"Max concurrent service checks (50) has been reached.  Delaying
further checks until previous checks are complete..."

I've been through the hosts and can't spot anything which seems amiss.

It's monitoring 47 hosts and 510 services, and none of those service
checks has a check interval shorter than 5 minutes.

Can anyone advise how I could begin to troubleshoot this?

This was a bug in Nagios which we saw around Opsview 3.1. The fix has already been pushed upstream into Nagios (3.2.0, from memory).

It looks like we've applied the patch to the 3.0 branch, but as we're not maintaining that branch anymore, it's not likely to get released. To be honest, we're moving at a cracking pace to keep adding features that people want to see in Opsview and we can't afford to maintain older versions. However, if you take out a subscription with us, then we can make some exceptions :)

Or - if the box isn't particularly overloaded - how I could increase
this value?

The specific bug is that when max_concurrent_checks is reached, Nagios just reschedules all the delayed checks for the same time at the next check interval, so they pile up and hit the limit again. Our fix to Nagios was to push each check a random number of seconds ahead, to get more spread in the timings. As your Nagios doesn't have this extra logic, you can just disable the max_concurrent_checks limit by setting it to 0.

http://docs.opsview.org/doku.php?id=opsview-community:configuration_files#overrides
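
To illustrate why the random offset helps, here is a toy model in Python. It is not the Nagios source or the actual patch; the function names, the 300 second interval and the 60 second jitter window are all made up for the example. It only shows that rescheduling every delayed check for the same second recreates the pile-up, while adding a random number of seconds spreads the checks out.

```python
import random
from collections import Counter


def reschedule_delayed(now, check_interval, n_delayed, max_jitter=0, rng=None):
    """Return new run times (in seconds) for checks that were delayed.

    Hypothetical model, not Nagios code:
    - max_jitter == 0 mimics the behaviour described above, where every
      delayed check lands on the same second at the next interval.
    - max_jitter > 0 mimics the fix: each check is pushed ahead by a
      random number of seconds in [0, max_jitter].
    """
    rng = rng or random.Random()
    times = []
    for _ in range(n_delayed):
        jitter = rng.randint(0, max_jitter) if max_jitter > 0 else 0
        times.append(now + check_interval + jitter)
    return times


def peak_concurrency(run_times):
    """Largest number of checks scheduled to start in the same second."""
    return max(Counter(run_times).values())


if __name__ == "__main__":
    # 100 delayed checks, 5 minute interval (both values are assumptions).
    same_time = reschedule_delayed(0, 300, 100)
    spread = reschedule_delayed(0, 300, 100, max_jitter=60,
                                rng=random.Random(1))
    print("peak without jitter:", peak_concurrency(same_time))
    print("peak with jitter:   ", peak_concurrency(spread))
```

Without jitter all 100 checks start in the same second, which is well over a limit of 50, so the "Max concurrent service checks" message appears again on the next cycle. With jitter the same checks are distributed over the window and the peak per second is much lower.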

Ton

_______________________________________________
Opsview-users mailing list
Opsview-users@lists.opsview.org
http://lists.opsview.org/lists/listinfo/opsview-users
