Sorry, David, I'm out of ideas.
The only difference between your config and mine is the use of modules on the
poller: I don't use any.
Have you tried upgrading to 2.4, just in case? Or maybe downgrading to 2.0.3
(that's the version I'm on)?
Regards
On 14 May 2015 at 23:24, David Good <dg...@willingminds.com> wrote:
> Here's the poller.ini file I'm using:
>
> [daemon]
>
> #-- Global Configuration
> #user=shinken ; if not set then by default it's the current user.
> #group=shinken ; if not set then by default it's the current group.
> # Set to 0 if you want to make this daemon NOT run
> daemon_enabled=1
>
> # Larger configurations need more threads (default is 8?)
> daemon_thread_pool_size=50
>
> #-- Path Configuration
> # The daemon will chdir into the directory workdir when launched
> # paths variables values, if not absolute paths, are relative to workdir.
> # using default values for following config variables value:
> workdir = /var/run/shinken
> logdir = /var/log/shinken
> pidfile=%(workdir)s/pollerd.pid
>
> #-- Network configuration
> # host=0.0.0.0
> # port=7771
> # http_backend=auto
> # idontcareaboutsecurity=0
>
> #-- SSL configuration --
> use_ssl=0
> # WARNING : Put full paths for certs
> #ca_cert=/etc/shinken/certs/ca.pem
> #server_cert=/etc/shinken/certs/server.cert
> #server_key=/etc/shinken/certs/server.key
> #hard_ssl_name_check=0
>
> #-- Local log management --
> # Enabled by default to ease troubleshooting
> use_local_log=1
> local_log=%(logdir)s/pollerd.log
> # accepted log level values= DEBUG,INFO,WARNING,ERROR,CRITICAL
> log_level=INFO
> #log_level=DEBUG
>
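For what it's worth, the %(workdir)s and %(logdir)s references in that file are ordinary INI interpolation, so it is easy to confirm where the pid and log files should end up. A minimal sketch using Python 3's configparser, with the values copied from the file above. It is an assumption on my part that the daemon resolves them the same way, and the function name is made up for illustration:

```python
import configparser

# Values copied from the poller.ini quoted above (comments stripped).
SAMPLE = """
[daemon]
daemon_thread_pool_size=50
workdir = /var/run/shinken
logdir = /var/log/shinken
pidfile=%(workdir)s/pollerd.pid
local_log=%(logdir)s/pollerd.log
"""


def resolved_daemon_settings(ini_text):
    """Parse INI text and return the [daemon] section with %(name)s expanded."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Item access on a section goes through interpolation, so dict() sees final values.
    return dict(parser["daemon"])


settings = resolved_daemon_settings(SAMPLE)
print(settings["pidfile"])    # /var/run/shinken/pollerd.pid
print(settings["local_log"])  # /var/log/shinken/pollerd.log
```

If either resolved path is not writable by the user the daemon runs as, that would be worth ruling out first.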
> And here's the poller.cfg file:
>
>
> #===============================================================================
> # POLLER (S1_Poller)
> #===============================================================================
> # Description: The poller is responsible for:
> # - Active data acquisition
> # - Local passive data acquisition
> # https://shinken.readthedocs.org/en/latest/08_configobjects/poller.html
> #===============================================================================
> define poller {
> poller_name poller-1
> address shinken1.dc1.example.com
> port 7771
>
> ## Optional
> spare 0 ; 1 = is a spare, 0 = is not a spare
> manage_sub_realms 0 ; Does it take jobs from schedulers of sub-Realms?
> min_workers 0 ; Starts with N processes (0 = 1 per CPU)
> max_workers 0 ; No more than N processes (0 = 1 per CPU)
> processes_by_worker 256 ; Each worker manages N checks
> polling_interval 1 ; Get jobs from schedulers each N seconds
> timeout 3 ; Ping timeout
> data_timeout 120 ; Data send timeout
> max_check_attempts 3 ; If ping fails N or more, then the node is dead
> check_interval 60 ; Ping node every N seconds
>
> ## Interesting modules that can be used:
> # - booster-nrpe = Replaces the check_nrpe binary. Therefore it
> # enhances performances when there are lot of NRPE
> # calls.
> # - named-pipe = Allow the poller to read a nagios.cmd named pipe.
> # This permits the use of distributed check_mk checks
> # should you desire it.
> # - SnmpBooster = Snmp bulk polling module
> modules named-pipe, booster-nrpe
>
> ## Advanced Features
> #passive 0 ; For DMZ monitoring, set to 1 so the connections
> ; will be from scheduler -> poller.
>
> # Poller tags are the tags that the poller will manage. Use None as tag name to manage
> # untagged checks
> #poller_tags None
>
> # Enable https or not
> use_ssl 0
> # enable certificate/hostname check, will avoid man in the middle attacks
> hard_ssl_name_check 0
>
>
> realm All
> }
>
>
> On 5/14/15 3:13 PM, David Good wrote:
>
> Here's another example of what I'm seeing -- In the arbiter log I'll see
> something like this:
>
> [1431641122] INFO: [Shinken] [All] Trying to send configuration to poller poller-1
> [1431641242] ERROR: [Shinken] Failed sending configuration for poller-1: Connexion error to http://shinken1.dc1.example.com:7771/ : Operation timed out after 120001 milliseconds with 0 bytes received
>
>
> And then just a few seconds later:
>
> [1431641291] INFO: [Shinken] [All] Trying to send configuration to poller poller-1
> [1431641291] INFO: [Shinken] [All] Dispatch OK of configuration 1 to poller poller-1
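Reading the epoch timestamps in those entries: the failure is logged 120 seconds after the attempt, which matches both "data_timeout 120" in the poller definition and the "120001 milliseconds" in the error. So it looks as if the arbiter waited out the full data timeout rather than being refused outright, though that is only my reading. A small sketch for pulling those gaps out of a log excerpt (the helper name is made up for illustration):

```python
import re

# Matches the "[<epoch seconds>] LEVEL:" prefix seen in the arbiter log.
ENTRY = re.compile(r"^\[(\d+)\]\s+(\w+):")


def entry_gaps(lines):
    """Return the seconds elapsed between consecutive timestamped entries."""
    stamps = [int(m.group(1)) for m in map(ENTRY.match, lines) if m]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# The four entries quoted above, unwrapped.
LOG = [
    "[1431641122] INFO: [Shinken] [All] Trying to send configuration to poller poller-1",
    "[1431641242] ERROR: [Shinken] Failed sending configuration for poller-1: "
    "Connexion error to http://shinken1.dc1.example.com:7771/ : "
    "Operation timed out after 120001 milliseconds with 0 bytes received",
    "[1431641291] INFO: [Shinken] [All] Trying to send configuration to poller poller-1",
    "[1431641291] INFO: [Shinken] [All] Dispatch OK of configuration 1 to poller poller-1",
]

print(entry_gaps(LOG))  # [120, 49, 0]
```

Run over a longer stretch of the arbiter log, gaps that cluster at exactly 120 would point at the timeout being hit each time.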
>
> And this poller is on the same server as the arbiter. I see this
> happening sporadically for pretty much every daemon, causing the
> configuration to be constantly in the process of being re-dispatched. This
> is especially frustrating as I'm trying to test out some new configs adding
> and removing hosts and services from monitoring. If it can't finish
> dispatching it makes it hard to test :-/
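Since it is sporadic and affects daemons on the same host, it may help to time plain HTTP requests against each daemon independently of the arbiter. A rough sketch, assuming the daemons answer a GET on a /ping path (I believe Shinken 2.x does, but treat that as an assumption and substitute any URL the daemon serves). The 3 second default mirrors the "timeout 3" ping timeout in the poller definition:

```python
import sys
import time
import urllib.request


def probe(url, timeout=3.0, opener=urllib.request.urlopen):
    """Fetch url once and return (ok, seconds_taken, detail)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            detail = response.read(200).decode("utf-8", errors="replace")
        return True, time.monotonic() - start, detail
    except OSError as exc:
        # URLError, HTTPError, socket timeouts and refused connections
        # all derive from OSError.
        return False, time.monotonic() - start, repr(exc)


if __name__ == "__main__":
    # Usage: python probe.py http://shinken1.dc1.example.com:7771/ping [more URLs...]
    for url in sys.argv[1:]:
        ok, seconds, detail = probe(url)
        print("%s %-4s %.3fs %s" % (url, "OK" if ok else "FAIL", seconds, detail))
```

Running it in a loop while the arbiter is dispatching would show whether the daemon itself stops answering or only the configuration push stalls.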
>
> On 5/14/15 2:49 PM, David Good wrote:
>
>
> I doubt that was the case -- I was careful to make sure everything was
> stopped before restarting.
>
> And now my problems have started up again. I may be forced to upgrade to
> 2.4 to see if it helps any. Very frustrating. If that doesn't fix it, I
> may be forced to fall back to Nagios and Gearman. I'd hate to do that as
> we had promised that Shinken would scale better than Nagios.
>
> On 5/13/15 2:50 PM, Felipe openglx wrote:
>
> Play the lotto just in case ;)
> My suspicion would be that your previous "restart" to adjust the thread
> pool (or other testing) didn't kill all threads, hence why you had some
> very unusual situations going on.
> Let us know how it goes, and best of luck getting the project delivered!
>
> Regards
>
>
> On 13 May 2015 at 22:18, David Good <dg...@willingminds.com> wrote:
>
>> It was all hosts, but I just reloaded with a new config, so we'll see if
>> my luck holds :-)
>>
>>
>> On 5/13/15 2:00 PM, Felipe openglx wrote:
>>
>> I've noticed that Shinken 2 daemons don't die easily with a plain kill.
>> I've always done "pkill -9 -f shinken-" when I needed to restart them.
>>
>> Glad to hear you got something working, David. All hosts or just a
>> fraction of them?
>>
>> Regards
>>
>> On 13 May 2015 at 21:43, David Good <dg...@willingminds.com> wrote:
>>
>>>
>>>
>>> OK, things seem to be stable now. I discovered that several of the
>>> schedulers were using massive amounts of memory (over 30GB) causing the
>>> kernel to try to kill them or their children. I restarted them, then
>>> restarted anything that showed up as a problem in the arbiter log and
>>> since then it's been stable.
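To catch that memory growth before the kernel steps in, the resident size of each daemon can be read from /proc. A Linux-only sketch; the "shinken-" match string is borrowed from the pkill pattern mentioned elsewhere in this thread, and the function names are made up for illustration:

```python
import os
import re


def rss_kib(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def shinken_rss(proc_root="/proc", needle="shinken-"):
    """Return (rss_kib, pid, cmdline) for matching processes, largest first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc, nothing to report
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as handle:
                raw = handle.read().replace(b"\0", b" ")
            with open(os.path.join(base, "status")) as handle:
                rss = rss_kib(handle.read())
        except OSError:
            continue  # process exited or is not readable by this user
        cmdline = raw.decode("utf-8", errors="replace").strip()
        if needle in cmdline and rss is not None:
            results.append((rss, int(entry), cmdline))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    for rss, pid, cmdline in shinken_rss():
        print("%10d kB  pid %-7d %s" % (rss, pid, cmdline))
```

Run from cron every few minutes, this would show whether a scheduler climbs steadily or jumps after a particular event such as a configuration dispatch.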
>>>
>>> One odd thing though is that some of the daemons wouldn't die normally
>>> -- I had to use 'kill -KILL' on them.
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> One dashboard for servers and applications across Physical-Virtual-Cloud
>>> Widest out-of-the-box monitoring support with 50+ applications
>>> Performance metrics, stats and reports that give you Actionable Insights
>>> Deep dive visibility with transaction tracing using APM Insight.
>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>>> _______________________________________________
>>> Shinken-devel mailing list
>>> Shinken-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/shinken-devel
>>>
>>
>>
>>
>
>