Sorry, David, I'm out of ideas.
The only difference between your config and mine is the use of modules on the
poller: I don't use any.
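
For anyone wanting to check that kind of difference mechanically, here is a minimal sketch in Python. It is not part of Shinken: `parse_define_block` and `diff_directives` are hypothetical helper names, and the two sample blocks are trimmed-down stand-ins based on the poller.cfg quoted below, not either poster's complete file. It assumes one directive per line with optional `;` comments.

```python
def parse_define_block(text):
    """Parse the directives of a single `define ... { ... }` block into a dict."""
    directives = {}
    inside = False
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop a trailing "; comment"
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        if line.startswith("define") and line.endswith("{"):
            inside = True
            continue
        if line == "}":
            inside = False
            continue
        if inside:
            parts = line.split(None, 1)  # directive name, then the rest as value
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return directives


def diff_directives(mine, theirs):
    """Return {name: (mine_value, theirs_value)} for directives that differ."""
    names = sorted(set(mine) | set(theirs))
    return {n: (mine.get(n), theirs.get(n)) for n in names if mine.get(n) != theirs.get(n)}


# Hypothetical, shortened sample blocks (assumption: illustrative only).
without_modules = """
define poller {
    poller_name     poller-1
    port            7771
    realm   All
}
"""
with_modules = """
define poller {
    poller_name     poller-1
    port            7771
    modules     named-pipe, booster-nrpe
    realm   All
}
"""

print(diff_directives(parse_define_block(without_modules),
                      parse_define_block(with_modules)))
```

A directive missing on one side shows up with `None` for that side, so a poller defined without `modules` is easy to spot.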


Have you tried upgrading to 2.4, just in case? Or maybe downgrading to 2.0.3
(that's the version I'm running)?

Regards

On 14 May 2015 at 23:24, David Good <dg...@willingminds.com> wrote:

>  Here's the poller.ini file I'm using:
>
> [daemon]
>
> #-- Global Configuration
> #user=shinken         ; if not set then by default it's the current user.
> #group=shinken        ; if not set then by default it's the current group.
> # Set to 0 if you want to make this daemon NOT run
> daemon_enabled=1
>
> # Larger configurations need more threads (default is 8?)
> daemon_thread_pool_size=50
>
> #-- Path Configuration
> # The daemon will chdir into the directory workdir when launched
> # paths variables values, if not absolute paths, are relative to workdir.
> # using default values for following config variables value:
> workdir = /var/run/shinken
> logdir  = /var/log/shinken
> pidfile=%(workdir)s/pollerd.pid
>
> #-- Network configuration
> # host=0.0.0.0
> # port=7771
> # http_backend=auto
> # idontcareaboutsecurity=0
>
> #-- SSL configuration --
> use_ssl=0
> # WARNING : Put full paths for certs
> #ca_cert=/etc/shinken/certs/ca.pem
> #server_cert=/etc/shinken/certs/server.cert
> #server_key=/etc/shinken/certs/server.key
> #hard_ssl_name_check=0
>
> #-- Local log management --
> # Enabled by default to ease troubleshooting
> use_local_log=1
> local_log=%(logdir)s/pollerd.log
> # accepted log level values= DEBUG,INFO,WARNING,ERROR,CRITICAL
> log_level=INFO
> #log_level=DEBUG
>
>  And here's the poller.cfg file:
>
>
> #===============================================================================
> # POLLER (S1_Poller)
>
> #===============================================================================
> # Description: The poller is responsible for:
> # - Active data acquisition
> # - Local passive data acquisition
> # https://shinken.readthedocs.org/en/latest/08_configobjects/poller.html
>
> #===============================================================================
> define poller {
>     poller_name     poller-1
>     address         shinken1.dc1.example.com
>     port            7771
>
>     ## Optional
>     spare               0   ; 1 = is a spare, 0 = is not a spare
>     manage_sub_realms   0   ; Does it take jobs from schedulers of sub-Realms?
>     min_workers         0   ; Starts with N processes (0 = 1 per CPU)
>     max_workers         0   ; No more than N processes (0 = 1 per CPU)
>     processes_by_worker 256 ; Each worker manages N checks
>     polling_interval    1   ; Get jobs from schedulers each N seconds
>     timeout             3  ; Ping timeout
>     data_timeout        120 ; Data send timeout
>     max_check_attempts  3   ; If ping fails N or more, then the node is dead
>     check_interval      60  ; Ping node every N seconds
>
>     ## Interesting modules that can be used:
>     # - booster-nrpe     = Replaces the check_nrpe binary. Therefore it
>     #                     enhances performances when there are lot of NRPE
>     #                     calls.
>     # - named-pipe     = Allow the poller to read a nagios.cmd named pipe.
>     #                     This permits the use of distributed check_mk checks
>     #                     should you desire it.
>     # - SnmpBooster     = Snmp bulk polling module
>     modules     named-pipe, booster-nrpe
>
>     ## Advanced Features
>     #passive         0       ; For DMZ monitoring, set to 1 so the connections
>                              ; will be from scheduler -> poller.
>
>     # Poller tags are the tags that the poller will manage. Use None as tag name to manage
>     # untagged checks
>     #poller_tags     None
>
>     # Enable https or not
>     use_ssl              0
>     # enable certificate/hostname check, will avoid man in the middle attacks
>     hard_ssl_name_check   0
>
>
>     realm   All
> }
>
>
> On 5/14/15 3:13 PM, David Good wrote:
>
> Here's another example of what I'm seeing -- In the arbiter log I'll see
> something like this:
>
> [1431641122] INFO: [Shinken] [All] Trying to send configuration to poller poller-1
> [1431641242] ERROR: [Shinken] Failed sending configuration for poller-1: Connexion error to http://shinken1.dc1.example.com:7771/ : Operation timed out after 120001 milliseconds with 0 bytes received
>
>
> And then just a few seconds later:
>
> [1431641291] INFO: [Shinken] [All] Trying to send configuration to poller poller-1
> [1431641291] INFO: [Shinken] [All] Dispatch OK of configuration 1 to poller poller-1
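
A quick way to read the epoch timestamps in the arbiter log lines quoted above is to subtract them. The sketch below is plain Python and only does arithmetic on the three values copied from those lines; the variable names are made up for illustration. That the 120-second wait is governed by the `data_timeout 120` setting in the quoted poller.cfg is an assumption based on the matching numbers, not something the log itself states.

```python
from datetime import datetime, timezone

# Epoch timestamps copied from the arbiter log lines quoted above.
send_attempt = 1431641122  # "Trying to send configuration to poller poller-1"
send_failed = 1431641242   # "Failed sending configuration for poller-1"
retry_ok = 1431641291      # "Dispatch OK of configuration 1 to poller poller-1"

# Value of data_timeout in the quoted poller.cfg (assumed to be the relevant setting).
data_timeout = 120


def to_utc(ts):
    """Render an epoch timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


wait_before_failure = send_failed - send_attempt  # seconds spent waiting on the poller
gap_before_retry = retry_ok - send_failed         # seconds until the successful retry

for label, ts in [("attempt", send_attempt), ("failure", send_failed), ("retry ok", retry_ok)]:
    print(f"{label:9s} {to_utc(ts)}")

print(f"waited {wait_before_failure}s before the failure (data_timeout is {data_timeout}s)")
print(f"successful dispatch {gap_before_retry}s after the failure")
```

The first difference is 120 seconds, in line with the "timed out after 120001 milliseconds" message, and the successful dispatch follows 49 seconds after the failure.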
>
> And this poller is on the same server as the arbiter.  I see this
> happening sporadically for pretty much every daemon, causing the
> configuration to be constantly in the process of being re-dispatched.  This
> is especially frustrating as I'm trying to test out some new configs adding
> and removing hosts and services from monitoring.  If it can't finish
> dispatching it makes it hard to test :-/
>
> On 5/14/15 2:49 PM, David Good wrote:
>
>
> I doubt that was the case -- I was careful to make sure everything was
> stopped before restarting.
>
> And now my problems have started up again.  I may be forced to upgrade to
> 2.4 to see if it helps any.  Very frustrating.  If that doesn't fix it, I
> may be forced to fall back to Nagios and Gearman.  I'd hate to do that as
> we had promised that Shinken would scale better than Nagios.
>
> On 5/13/15 2:50 PM, Felipe openglx wrote:
>
>   Play the lotto just in case ;)
>  My suspicion would be that your previous "restart" to adjust the thread
> pool (or other testing) didn't kill all threads, hence why you had some
> very unusual situations going on.
>  Let us know how it goes, best luck on getting the project delivered!
>
>  Regards
>
>
> On 13 May 2015 at 22:18, David Good <dg...@willingminds.com> wrote:
>
>>  It was all hosts, but I just reloaded with a new config, so we'll see if
>> my luck holds :-)
>>
>>
>> On 5/13/15 2:00 PM, Felipe openglx wrote:
>>
>>  I've noticed that Shinken 2 doesn't go easily with kill. I've always
>> done "pkill -9 -f shinken-" when needing to restart them.
>>
>> Glad to hear you got something working, David. All hosts or just a
>> fraction of them?
>>
>>  Regards
>>
>> On 13 May 2015 at 21:43, David Good <dg...@willingminds.com> wrote:
>>
>>>
>>>
>>> OK, things seem to be stable now.  I discovered that several of the
>>> schedulers were using massive amounts of memory (over 30GB) causing the
>>> kernel to try to kill them or their children.  I restarted them, then
>>> restarted anything that showed up as a problem in the arbiter log and
>>> since then it's been stable.
>>>
>>> One odd thing though is that some of the daemons wouldn't die normally
>>> -- I had to use 'kill -KILL' on them.
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> One dashboard for servers and applications across Physical-Virtual-Cloud
>>> Widest out-of-the-box monitoring support with 50+ applications
>>> Performance metrics, stats and reports that give you Actionable Insights
>>> Deep dive visibility with transaction tracing using APM Insight.
>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>>> _______________________________________________
>>> Shinken-devel mailing list
>>> Shinken-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/shinken-devel
>>>
