Here's another example of what I'm seeing. In the arbiter log I'll see something like this:

[1431641122] INFO: [Shinken] [All] Trying to send configuration to poller poller-1
[1431641242] ERROR: [Shinken] Failed sending configuration for poller-1: Connexion error to http://shinken1.dc1.eharmony.com:7771/ : Operation timed out after 120001 milliseconds with 0 bytes received

And then less than a minute later:

[1431641291] INFO: [Shinken] [All] Trying to send configuration to poller poller-1
[1431641291] INFO: [Shinken] [All] Dispatch OK of configuration 1 to poller poller-1

And this poller is on the same server as the arbiter.  I see this happening sporadically for pretty much every daemon, causing the configuration to be constantly in the process of being re-dispatched.  This is especially frustrating as I'm trying to test out some new configs adding and removing hosts and services from monitoring.  If it can't finish dispatching it makes it hard to test :-/
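To put numbers on how often this happens, the arbiter log can be tallied per daemon. The following is only a rough sketch: the regular expressions are inferred from the three log lines quoted above and may not match other Shinken versions, and the function name `summarize_dispatch` and the `SAMPLE_LOG` list are made up for illustration.

```python
import re
from collections import defaultdict

# Patterns inferred from the log lines quoted in this message only;
# other Shinken releases may word these messages differently.
FAIL_RE = re.compile(
    r"^\[(\d+)\] ERROR: \[Shinken\] Failed sending configuration for ([^:]+):"
)
OK_RE = re.compile(
    r"^\[(\d+)\] INFO: \[Shinken\] \[All\] Dispatch OK of configuration (\d+) to (\S+) (\S+)"
)


def summarize_dispatch(lines):
    """Count failed and successful dispatches per daemon name.

    Also records, for each success that follows a failure, how many
    seconds passed between the failure and the successful dispatch.
    """
    stats = defaultdict(
        lambda: {"failed": 0, "ok": 0, "last_fail": None, "recovery_secs": []}
    )
    for line in lines:
        line = line.strip()
        m = FAIL_RE.match(line)
        if m:
            ts, name = int(m.group(1)), m.group(2)
            stats[name]["failed"] += 1
            stats[name]["last_fail"] = ts
            continue
        m = OK_RE.match(line)
        if m:
            ts, name = int(m.group(1)), m.group(4)
            entry = stats[name]
            entry["ok"] += 1
            if entry["last_fail"] is not None:
                entry["recovery_secs"].append(ts - entry["last_fail"])
                entry["last_fail"] = None
    return dict(stats)


# Hypothetical input: the lines quoted above, as they would appear in the log.
SAMPLE_LOG = [
    "[1431641122] INFO: [Shinken] [All] Trying to send configuration to poller poller-1",
    "[1431641242] ERROR: [Shinken] Failed sending configuration for poller-1: "
    "Connexion error to http://shinken1.dc1.eharmony.com:7771/ : "
    "Operation timed out after 120001 milliseconds with 0 bytes received",
    "[1431641291] INFO: [Shinken] [All] Trying to send configuration to poller poller-1",
    "[1431641291] INFO: [Shinken] [All] Dispatch OK of configuration 1 to poller poller-1",
]

if __name__ == "__main__":
    for daemon, entry in summarize_dispatch(SAMPLE_LOG).items():
        print(daemon, entry["failed"], entry["ok"], entry["recovery_secs"])
```

Run against a real log by passing an open file object instead of `SAMPLE_LOG`; daemons with a high `failed` count or long recovery times would be the ones to look at first.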

On 5/14/15 2:49 PM, David Good wrote:

I doubt that was the case -- I was careful to make sure everything was stopped before restarting.

And now my problems have started up again.  I may be forced to upgrade to 2.4 to see if it helps any.  Very frustrating.  If that doesn't fix it, I may be forced to fall back to nagios and gearman.  I'd hate to do that as we had promised that Shinken would scale better than Nagios.

On 5/13/15 2:50 PM, Felipe openglx wrote:
Play the lotto just in case ;)
My suspicion would be that your previous "restart" to adjust the thread pool (or other testing) didn't kill all threads, hence why you had some very unusual situations going on.
Let us know how it goes, best of luck on getting the project delivered!

Regards


On 13 May 2015 at 22:18, David Good <dg...@willingminds.com> wrote:
It was all hosts, but I just reloaded with a new config, so we'll see if my luck holds :-)


On 5/13/15 2:00 PM, Felipe openglx wrote:
I've noticed that Shinken 2 doesn't go easily with kill. I've always done "pkill -9 -f shinken-" when needing to restart them.

Glad to hear you got something working, David. All hosts or just a fraction of them?

Regards

On 13 May 2015 at 21:43, David Good <dg...@willingminds.com> wrote:


OK, things seem to be stable now.  I discovered that several of the
schedulers were using massive amounts of memory (over 30GB) causing the
kernel to try to kill them or their children.  I restarted them, then
restarted anything that showed up as a problem in the arbiter log and
since then it's been stable.

One odd thing though is that some of the daemons wouldn't die normally
-- I had to use 'kill -KILL' on them.
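A check like the one below could catch runaway daemons before the kernel starts killing them. It is a minimal sketch under a few assumptions: it parses the output of `ps -eo pid,rss,args` (RSS in KiB, as reported by Linux procps), it identifies Shinken daemons by the substring `shinken-` in the command line (the same pattern used with `pkill -f` elsewhere in this thread), and the 4 GiB default limit is an arbitrary placeholder. The function names are made up for illustration.

```python
import subprocess


def parse_ps(ps_output):
    """Yield (pid, rss_kib, command) tuples from `ps -eo pid,rss,args` output."""
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header row and anything that is not "<pid> <rss> <command>".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        yield int(parts[0]), int(parts[1]), parts[2]


def oversized_shinken(ps_output, limit_gib=4.0):
    """Return (pid, rss_gib, command) for Shinken daemons above limit_gib.

    The default limit is an arbitrary assumption, not a Shinken recommendation.
    """
    kib_per_gib = 1024 * 1024
    return [
        (pid, rss_kib / kib_per_gib, cmd)
        for pid, rss_kib, cmd in parse_ps(ps_output)
        if "shinken-" in cmd and rss_kib > limit_gib * kib_per_gib
    ]


def current_ps_output():
    """Run ps on the local machine and return its text output."""
    return subprocess.run(
        ["ps", "-eo", "pid,rss,args"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
```

Calling `oversized_shinken(current_ps_output())` from cron and alerting on a non-empty result would be one way to use it; the parsing is kept separate from the `ps` call so it can be tried on saved output first.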


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel















