We're still running 2.2 for now (we're on a tight schedule to get Shinken into production). We just revamped our host and service configuration, adding about 1000 hosts (we're now at 3351 hosts, 1839 hostgroups, and 30404 services). Since loading that configuration, connections from the arbiter keep timing out, so it's constantly redispatching the configuration, only for the daemon that timed out to recover. I've increased ping timeouts to 6 seconds and data timeouts to 120 seconds, but the connections still time out. I thought adding more schedulers and pollers might help, since it would cut the configuration into smaller pieces that could be processed more quickly, but the arbiter is still constantly dispatching and re-dispatching configurations.
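For reference, the timeout change looks roughly like the fragment below in each daemon definition. This is only a sketch: `timeout` and `data_timeout` are the settings I changed, but the daemon name, address, and port shown here are placeholders, not our actual values.

```
define scheduler {
    scheduler_name    scheduler-example   ; placeholder name
    address           192.0.2.10          ; placeholder address
    port              7768                ; placeholder port
    timeout           6                   ; ping timeout, in seconds
    data_timeout      120                 ; data send timeout, in seconds
}
```

The same two settings are applied to the poller definitions as well.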
We're running on 5 servers, each with 40 CPUs and 64 GB of RAM. One of the servers is the master and runs all daemons plus an extra poller and scheduler. Three other servers run two schedulers and two pollers each. The last is a spare, set up the same as the master. None of the servers is showing significant CPU, I/O, memory, or network usage. Any ideas? Would upgrading to 2.4 help?