We have a Shinken installation using version 2.2. We have 5 servers. Each server has 40 CPU cores and 64GB of RAM. One server (shinken1) is running all of the daemons (arbiter, broker, scheduler, poller, receiver, reactionner). Three servers (shinken2, shinken3, shinken4) run only a scheduler and poller. The last server (shinken5) is a spare for all daemons.
Our current configuration (still being set up) has 2118 hosts and 19374 services. Everything runs smoothly, except that the Thruk interface seems a bit sluggish.

However, when I change normal_check_interval from 5 to 1 and retry_check_interval from 2 to 1 (to match our current Nagios implementation), we start to have problems. The broker daemon uses more and more memory and CPU, and then I start to see a lot of timeouts and configuration reassignments in the arbiter log. I've seen a broker process reach 30GB of memory and 120% CPU, at which point Shinken is pretty much unusable.

Any idea what would cause this? Is it just that I need more capacity to run the extra checks? Would adding another poller and scheduler to shinken2-4 help? They still seem to have plenty of CPU and RAM to spare.
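
For a sense of scale, here is a rough back-of-the-envelope estimate of how the interval change affects the service check rate. It is only a sketch under some assumptions that are not stated in the post: interval units are 60 seconds (the usual interval_length default), every service is in a steady state and checked once per normal_check_interval, and host checks are left out because their interval is not given.

```python
# Rough estimate of the average service check rate before and after
# the interval change. The service count comes from the post; the
# 60-second interval unit is an assumption (the usual default).
SERVICES = 19374
SECONDS_PER_INTERVAL_UNIT = 60.0  # assumed interval_length


def checks_per_second(n_checks, interval_units):
    """Average rate if every check runs exactly once per interval."""
    return n_checks / (interval_units * SECONDS_PER_INTERVAL_UNIT)


before = checks_per_second(SERVICES, 5)  # normal_check_interval 5
after = checks_per_second(SERVICES, 1)   # normal_check_interval 1

print(f"5-minute interval: {before:.1f} service checks/s")
print(f"1-minute interval: {after:.1f} service checks/s")
print(f"increase: {after / before:.1f}x")
```

Under these assumptions the rate goes from roughly 65 to roughly 323 service checks per second, a fivefold increase. If the volume of check results reaching the single broker grows in about the same proportion (also an assumption), that would be consistent with the broker, rather than the pollers and schedulers, being the first component to show strain.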