A 10x increase is very extreme. Can you send the output of 
/usr/local/nagios/var/log/create_and_send_configs.debug?

I attached it as it is a bit too large to post here.

The debug output shows that scp is not what takes the most time - the 
longest scp, to node9, took 1.14 seconds. So I don't think rsync is the way 
to go.

The verification step looks like the main time consumer - node5_verify took 
12 minutes. So Nagios is taking a long time validating the configurations 
(which explains your 100% nagios cpu load).
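
To confirm that the time really goes into validation, the verify step can be 
timed on its own on the slave. The sketch below is a minimal, hypothetical 
helper (time_command is not part of Opsview); the binary and config paths in 
the usage comment are assumed defaults and may differ on your install. 
"nagios -v <config>" is the standard Nagios verify invocation.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code).

    Output is discarded; only the wall-clock duration and exit code are kept.
    """
    start = time.monotonic()
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.monotonic() - start, result.returncode


# Usage sketch (paths are assumptions, adjust to your installation):
#   elapsed, code = time_command(
#       ["/usr/local/nagios/bin/nagios", "-v", "/usr/local/nagios/etc/nagios.cfg"]
#   )
#   print(f"verify took {elapsed:.1f}s, exit code {code}")
```

If the timing here is close to the node5_verify figure in the debug file, the 
bottleneck is in Nagios parsing the configuration rather than in the transfer.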

Did you make other changes? Maybe adding service check dependencies for NRPE?

No changes made, and the dependencies add up to 20 for that slave server. But 
those are not NRPE related.
In total there are 145 service dependencies.

I had a quick look at the contacts.cfg file, and it shows me a possible source 
of the problem: a large number of host groups (163) and service groups (143) 
associated with various contacts.
I have many groups that are, e.g., about Servers, but for every site I create 
a separate group. So I have 9 hostgroups for Servers, 9 for Switches, etc.
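
For comparing counts like these between nodes, the object definitions in the 
generated files can be tallied by type. This is only a rough sketch: 
count_definitions is a hypothetical helper, and it relies solely on the 
standard Nagios "define <type> {" object syntax, not on anything specific to 
how Opsview writes its files.

```python
import re
from collections import Counter

# Matches the opening line of a Nagios object definition, e.g. "define hostgroup{".
# Commented-out definitions (lines starting with '#' or ';') are not matched.
DEFINE_RE = re.compile(r"^\s*define\s+(\w+)\s*\{", re.MULTILINE)


def count_definitions(config_text):
    """Return a Counter mapping object type -> number of definitions."""
    return Counter(DEFINE_RE.findall(config_text))


# Usage sketch (the file name is an assumption):
#   with open("hostgroups.cfg") as fh:
#       print(count_definitions(fh.read()).most_common())
```

Running it over each slave's configuration directory would show whether the 
nodes with the longest verify times are also the ones with the most groups.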

In the meantime, it is possible to increase the number of parallel jobs which 
may reduce your reload time. I've added some documentation at:
  * http://docs.opsview.com/doku.php?id=opsview-community:configuration_files

OK, I raised it to 16 (4 per CPU) and the reload is now down to 10 minutes.
I also removed some contacts, hosts and services that are no longer required.

I've attached the latest create_and_send_configs.debug file. Some nodes still 
have a long verify ...

Toni

Attachment: create_and_send_configs.debug
Description: create_and_send_configs.debug

_______________________________________________
Opsview-users mailing list
Opsview-users@lists.opsview.org
http://lists.opsview.org/lists/listinfo/opsview-users
