We have a Shinken 2.2 installation running on 5 servers.  Each server 
has 40 CPU cores and 64GB of RAM.  One server (shinken1) is 
running all of the daemons (arbiter, broker, scheduler, poller, 
receiver, reactionner).  Three servers (shinken2, shinken3, shinken4) 
run only a scheduler and poller.  The last server (shinken5) is a spare 
for all daemons.

Our current configuration (which is still being set up) has 2118 hosts 
and 19374 services, and everything is running smoothly except that the 
Thruk interface seems a bit sluggish.

However, when I change normal_check_interval from 5 to 1 and 
retry_check_interval from 2 to 1 (to match our current Nagios 
implementation), we start to have problems.  The broker daemon uses 
more and more memory and CPU, and then I see a lot of timeouts and 
configuration reassignments in the arbiter log.  I've seen a broker 
process reach 30GB of memory and 120% CPU, at which point Shinken is 
pretty much unusable.
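
For a rough sense of the extra load, here is a back-of-the-envelope 
sketch of the steady-state service check rate before and after the 
change.  It assumes every service uses the same interval, that 
interval_length is the default 60 seconds, and it ignores host checks 
and retries, so treat the numbers as an estimate only.

```python
# Rough estimate only: assumes all services share one check interval,
# interval_length = 60 seconds, and no host checks or retries.
SERVICES = 19374  # service count from our configuration

def checks_per_second(n_checks: int, interval_minutes: float) -> float:
    """Average checks per second if n_checks each run once per interval."""
    return n_checks / (interval_minutes * 60.0)

before = checks_per_second(SERVICES, 5)  # normal_check_interval 5
after = checks_per_second(SERVICES, 1)   # normal_check_interval 1

print(f"5-minute interval: {before:.1f} service checks/s")
print(f"1-minute interval: {after:.1f} service checks/s")
print(f"increase: {after / before:.1f}x")
```

By this estimate the change asks for roughly five times as many 
service checks per second, and presumably a similar increase in check 
results for the broker to process.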

Any idea what would cause this?  Is it just that I need more capacity to 
run the extra checks?  Would adding another poller and scheduler to 
shinken2-4 help?  They still seem to have plenty of CPU and RAM to spare.

_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel