Excerpts from David Lang's message of 2014-12-18 19:03:54 +0100: > On Wed, 17 Dec 2014, Marc Fournier wrote: > > > Excerpts from David Lang's message of 2014-12-16 20:41:49 +0100: > >> On Tue, 16 Dec 2014, Marc Fournier wrote: > >> > >>> Hello, > >>> > >>> openldap writes its logs using syslog(3). Three log messages are emitted > >>> for each ldap search operation. > >>> > >>> On a given openldap setup, I reach about 2500 operations per second. The > >>> simple fact of shutting rsyslog down icreases this number up to 5000 > >>> (logs are lost obviously). > >>> > >>> Writing to a local file, to /dev/null, or to a remote server over UDP > >>> doesn't significantly change this number. I tried tweaking various > >>> settings related to the main queue but didn't see any huge difference. > >>> > >>> ipmstats shows exactly the same numbers for "imuxsock: submitted" than > >>> for "main Q: enqueued". All the discarded/full/failed/etc counters have > >>> a value of 0. > >>> > >>> This is rsyslog-8.6.0-2.el6.x86_64 from the adiscon RPM repo (but had > >>> similar problems with version 5.8.10-9.el6_6 which comes with el6). > >>> > >>> Here's the relevant bits of my config: > >>> > >>> module(load="impstats" log.syslog="off" log.file="/var/log/impstats.log" > >>> resetCounters="off" interval="5") > >>> module(load="imuxsock" SysSock.RateLimit.Interval="0" > >>> SysSock.Annotate="off" SysSock.FlowControl="off" > >>> SysSock.IgnoreTimestamp="off") > >>> > >>> $MaxMessageSize 2k > >>> > >>> $MainMsgQueueDequeueBatchSize 1024 > >>> $MainMsgQueueType LinkedList > >>> $MainMsgQueueSize 100000 > >>> $MainMsgQueueWorkerThreads 1 > >>> > >>> $RepeatedMsgReduction off > >>> $SystemLogRateLimitInterval 0 > >>> > >>> $ActionQueueType LinkedList > >>> $ActionQueueSize 100000 > >>> $ActionQueueDiscardMark 95000 > >>> $ActionQueueDiscardSeverity 0 > >>> $ActionQueueTimeoutEnqueue 0 > >>> $ActionQueueDequeueSlowdown 1000 > >>> $ActionQueueWorkerThreads 2 > >>> $ActionQueueDequeueBatchSize 128 > >>> $ActionResumeRetryCount -1 > >>> > >>> #local4.* -/var/log/ldap/ldap.log > >>> #local4.* ~ > >>> local4.* -/dev/null > >>> #local4.* @localhost > >>> > >>> So I'm wondering which one of rsyslog or /dev/log is actually the > >>> bottleneck here and what can I do about it ? > >> > >> /dev/log is just a socket to talk to rsyslog > >> > >> One major problem here, why do you have ActionQueueDequeueSlowdown > >> configured? > >> that deliberatly pauses before trying to process the next messages. > >> > >> you really shouldn't need to have an action queue for writing to a local > >> file, > >> you should remove all of the actionqueue parameters (and if you do decide > >> that > >> you want a queue, use the new action() syntax, it makes it much clearer > >> what's > >> happening) > > > > Ok, right my config is a mess I agree ! Been iterating over it since > > back when I used 5.8.10. > > > > So I started over with a brand new config. The best results I got so far > > was with these settings: > > > > module(load="impstats" log.syslog="off" log.file="/var/log/impstats.log" > > resetCounters="off" interval="5") > > module(load="imuxsock" SysSock.RateLimit.Interval="0" > > SysSock.Annotate="off" SysSock.FlowControl="off" > > SysSock.IgnoreTimestamp="off") > > > > $MaxMessageSize 2k > > > > $MainMsgQueueDequeueBatchSize 4096 > > $MainMsgQueueType LinkedList > > $MainMsgQueueSize 250000 > > $MainMsgQueueWorkerThreads 4 > > $MainMsgQueueWorkerThreadMinimumMessages 100000 > > > > $RepeatedMsgReduction off > > $SystemLogRateLimitInterval 0 > > > > *.* ~ > > > > The numbers are slightly better now: ~3500 ops/s vs 2500 yesterday. But > > still a significant difference with the max I reach when running with > > rsyslogd stopped (~5000 ops/s). > > > > Is this an expected compromise ? Or are there possible improvements I > > can do ? > > I'm not seeing anything obviously wrong here, does impstats show anything > interesting (errors, etc)
No, no errors. The discard/full counters also stay at 0. > I would actually reduce the worker threads down to 1 (it's counter-intuitive, > but having too many threads in rsyslog can actually slow things down because > of > the overhead involved with locking and unlocking the queues) Ok, tried that. No significant change with the issue I'm facing though. Meanwhile I improved my stress-test setup, and upgraded the target machine to a quad-core (I was suspecting competition for cpu time between openldap and rsyslog on the previous single-core machine). The good news is that I can reproduce the problem much more reliably. The bad news is that numbers are even worse and that I have proof that the bottleneck lies in rsyslog. Here are my latest observations: - no rsyslog running: 15k to 17k ldap operations/second - rsyslog with above config, except for MainMsgQueueWorkerThreads back to 1: down to 7.5k ops/s - when monitoring openldap's threads state, a surge of threads in "pending" state as soon as rsyslog is started. - a trivial syslog-ng config, reading from /dev/log and sending everything to /dev/null, results in only about 10% ops/s decrease. - also the pending threads count stay close to zero when using syslog-ng. To recap, the starting point of this issue was that openldap temporarily stopped responding when a sudden surge of requests came in (ie: critical service disruption for users). And I found out this problem didn't happen when logging was disabled. As the performance numbers we can read on the web about rsyslog are an order of magnitude bigger *but* usually relate to the imtcp/imudp modules, I'm wondering if there possibly was a problem with the imuxsock module (FWIW "in:imuxsock" is constantly in the top 3 of "top -H" during my tests) ? Also no idea what value this has, but strace shows that rsyslog spends significantly more time in system calls than syslog-ng, in my setup: # timeout 10 strace -c -fp $(pidof rsyslogd) Process 27446 attached with 4 threads - interrupt to quit Process 27446 detached Process 27447 detached Process 27448 detached Process 27449 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 94.85 2.981049 49 61079 select 4.39 0.137935 0 287362 55920 futex 0.68 0.021477 0 61077 recvmsg 0.07 0.002301 0 61084 gettimeofday 0.00 0.000000 0 6 writev 0.00 0.000000 0 2 getrusage ------ ----------- ----------- --------- --------- ---------------- 100.00 3.142762 470610 55920 total # timeout 10 strace -c -fp $(pgrep -f ^syslog-ng) Process 29477 attached - interrupt to quit Process 29477 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 41.39 0.001228 0 359614 gettimeofday 23.12 0.000686 0 1774 1 read 15.67 0.000465 0 105768 lseek 15.57 0.000462 0 105767 write 4.25 0.000126 0 21153 poll 0.00 0.000000 0 114 stat 0.00 0.000000 0 1 restart_syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.002967 594191 1 total Thanks a lot for your guidance so far ! Cheers, Marc _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

