We're still running Shinken 2.2 (we're on a tight schedule to get it
into production). We just revamped our host and service
configuration, adding about 1000 hosts (we now have 3351 hosts, 1839
hostgroups, and 30404 services). Since loading that configuration, the
arbiter's connections keep timing out, so it constantly redispatches
the configuration, only for the daemon that timed out to recover.
I've increased the ping timeout to 6 seconds and the data timeout to
120 seconds, but it still times out. I thought adding more schedulers
and pollers might help, since the configuration would be cut into
smaller pieces that could be processed more quickly, but the arbiter
is still constantly dispatching and re-dispatching configurations.

We're running on 5 servers, each with 40 CPUs and 64 GB of RAM. One of
the servers is the master and runs all daemons plus an extra poller and
scheduler. Three other servers each run two schedulers and two
pollers. The last is a spare, set up the same as the master. None
of the servers show significant CPU, I/O, memory, or network usage.
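For a rough sense of how much smaller each piece gets with this layout,
here is a minimal Python sketch of the arithmetic. It assumes the
master runs two schedulers (its normal one plus the extra), the three
other servers run two each, the spare's schedulers receive nothing
until a failover, and every scheduler has equal weight. It also
assumes an even split; as I understand it, the arbiter really groups
linked hosts into packs before dispatching, so actual shares will be
uneven.

```python
"""Rough average share of the configuration per active scheduler.

Assumptions (illustrative only, not Shinken's real dispatch logic):
- hosts and services are split evenly across active schedulers;
- all schedulers have equal weight;
- the spare server's schedulers are idle until a failover.
"""

HOSTS = 3351
SERVICES = 30404

# Hypothetical count from the layout described above:
# master (normal + extra scheduler) + three servers with two each.
ACTIVE_SCHEDULERS = 2 + 3 * 2


def per_scheduler_share(total: int, schedulers: int) -> float:
    """Average number of objects each scheduler would receive."""
    return total / schedulers


if __name__ == "__main__":
    hosts_each = per_scheduler_share(HOSTS, ACTIVE_SCHEDULERS)
    services_each = per_scheduler_share(SERVICES, ACTIVE_SCHEDULERS)
    print(f"{ACTIVE_SCHEDULERS} schedulers: "
          f"~{hosts_each:.0f} hosts, ~{services_each:.0f} services each")
```

Under those assumptions each scheduler handles a few hundred hosts and
a few thousand services, which is why splitting further might be
expected to shorten each send.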

Any ideas?  Would upgrading to 2.4 help?

_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel
