OK, things seem to be stable now. I discovered that several of the schedulers were using massive amounts of memory (over 30 GB), causing the kernel to try to kill them or their children. I restarted them, then restarted anything that showed up as a problem in the arbiter log, and it has been stable since.
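In case it helps anyone chasing the same symptom, here is a rough sketch of how one could list processes whose resident memory is above some limit. It is not part of Shinken. It assumes Linux with /proc mounted, and the function names (parse_rss_kb, processes_over) and the 30 GB threshold are only illustrative.

```python
import os


def parse_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:     2048 kB"
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def processes_over(limit_kb, proc_root="/proc"):
    """Yield (pid, name, rss_kb) for processes whose RSS exceeds limit_kb."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # process exited in the meantime, or is not readable
        rss = parse_rss_kb(text)
        if rss is not None and rss > limit_kb:
            # First line of the status file is "Name:\t<command name>"
            name = text.splitlines()[0].split(":", 1)[1].strip()
            yield int(entry), name, rss


if __name__ == "__main__":
    limit = 30 * 1024 * 1024  # hypothetical threshold: 30 GB expressed in kB
    for pid, name, rss in processes_over(limit):
        print(pid, name, rss // 1024, "MB")
```

Run periodically (from cron, say), something like this would flag a runaway daemon before the kernel's OOM killer gets involved.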
One odd thing though is that some of the daemons wouldn't die normally -- I had to use 'kill -KILL' on them.

_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel