Excerpts from David Lang's message of 2014-12-18 19:03:54 +0100:
> On Wed, 17 Dec 2014, Marc Fournier wrote:
> 
> > Excerpts from David Lang's message of 2014-12-16 20:41:49 +0100:
> >> On Tue, 16 Dec 2014, Marc Fournier wrote:
> >>
> >>> Hello,
> >>>
> >>> openldap writes its logs using syslog(3). Three log messages are emitted
> >>> for each ldap search operation.
> >>>
> >>> On a given openldap setup, I reach about 2500 operations per second. The
> >>> simple fact of shutting rsyslog down icreases this number up to 5000
> >>> (logs are lost obviously).
> >>>
> >>> Writing to a local file, to /dev/null, or to a remote server over UDP
> >>> doesn't significantly change this number. I tried tweaking various
> >>> settings related to the main queue but didn't see any huge difference.
> >>>
> >>> ipmstats shows exactly the same numbers for "imuxsock: submitted" than
> >>> for "main Q: enqueued". All the discarded/full/failed/etc counters have
> >>> a value of 0.
> >>>
> >>> This is rsyslog-8.6.0-2.el6.x86_64 from the adiscon RPM repo (but had
> >>> similar problems with version 5.8.10-9.el6_6 which comes with el6).
> >>>
> >>> Here's the relevant bits of my config:
> >>>
> >>> module(load="impstats" log.syslog="off" log.file="/var/log/impstats.log" 
> >>> resetCounters="off" interval="5")
> >>> module(load="imuxsock" SysSock.RateLimit.Interval="0" 
> >>> SysSock.Annotate="off" SysSock.FlowControl="off" 
> >>> SysSock.IgnoreTimestamp="off")
> >>>
> >>> $MaxMessageSize 2k
> >>>
> >>> $MainMsgQueueDequeueBatchSize 1024
> >>> $MainMsgQueueType LinkedList
> >>> $MainMsgQueueSize 100000
> >>> $MainMsgQueueWorkerThreads 1
> >>>
> >>> $RepeatedMsgReduction off
> >>> $SystemLogRateLimitInterval 0
> >>>
> >>> $ActionQueueType LinkedList
> >>> $ActionQueueSize 100000
> >>> $ActionQueueDiscardMark 95000
> >>> $ActionQueueDiscardSeverity 0
> >>> $ActionQueueTimeoutEnqueue 0
> >>> $ActionQueueDequeueSlowdown 1000
> >>> $ActionQueueWorkerThreads 2
> >>> $ActionQueueDequeueBatchSize 128
> >>> $ActionResumeRetryCount -1
> >>>
> >>> #local4.*  -/var/log/ldap/ldap.log
> >>> #local4.*  ~
> >>> local4.*  -/dev/null
> >>> #local4.*  @localhost
> >>>
> >>> So I'm wondering which one of rsyslog or /dev/log is actually the
> >>> bottleneck here and what can I do about it ?
> >>
> >> /dev/log is just a socket to talk to rsyslog
> >>
> >> One major problem here, why do you have ActionQueueDequeueSlowdown 
> >> configured?
> >> that deliberatly pauses before trying to process the next messages.
> >>
> >> you really shouldn't need to have an action queue for writing to a local 
> >> file,
> >> you should remove all of the actionqueue parameters (and if you do decide 
> >> that
> >> you want a queue, use the new action() syntax, it makes it much clearer 
> >> what's
> >> happening)
> >
> > Ok, right my config is a mess I agree ! Been iterating over it since
> > back when I used 5.8.10.
> >
> > So I started over with a brand new config. The best results I got so far
> > was with these settings:
> >
> > module(load="impstats" log.syslog="off" log.file="/var/log/impstats.log" 
> > resetCounters="off" interval="5")
> > module(load="imuxsock" SysSock.RateLimit.Interval="0" 
> > SysSock.Annotate="off" SysSock.FlowControl="off" 
> > SysSock.IgnoreTimestamp="off")
> >
> > $MaxMessageSize 2k
> >
> > $MainMsgQueueDequeueBatchSize 4096
> > $MainMsgQueueType LinkedList
> > $MainMsgQueueSize 250000
> > $MainMsgQueueWorkerThreads 4
> > $MainMsgQueueWorkerThreadMinimumMessages 100000
> >
> > $RepeatedMsgReduction off
> > $SystemLogRateLimitInterval 0
> >
> > *.*  ~
> >
> > The numbers are slightly better now: ~3500 ops/s vs 2500 yesterday. But
> > still a significant difference with the max I reach when running with
> > rsyslogd stopped (~5000 ops/s).
> >
> > Is this an expected compromise ? Or are there possible improvements I
> > can do ?
> 
> I'm not seeing anything obviously wrong here, does impstats show anything 
> interesting (errors, etc)

No, no errors. The discard/full counters also stay at 0.

> I would actually reduce the worker threads down to 1 (it's counter-intuitive, 
> but having too many threads in rsyslog can actually slow things down because 
> of 
> the overhead involved with locking and unlocking the queues)

Ok, tried that. No significant change with the issue I'm facing though.

Meanwhile I improved my stress-test setup, and upgraded the target
machine to a quad-core (I was suspecting competition for cpu time
between openldap and rsyslog on the previous single-core machine).

The good news is that I can reproduce the problem much more reliably.
The bad news is that numbers are even worse and that I have proof that
the bottleneck lies in rsyslog. Here are my latest observations:

 - no rsyslog running: 15k to 17k ldap operations/second
 - rsyslog with above config, except for MainMsgQueueWorkerThreads back
   to 1: down to 7.5k ops/s
 - when monitoring openldap's threads state, a surge of threads in
   "pending" state as soon as rsyslog is started.
 - a trivial syslog-ng config, reading from /dev/log and sending
   everything to /dev/null, results in only about 10% ops/s decrease.
 - also the pending threads count stay close to zero when using
   syslog-ng.

To recap, the starting point of this issue was that openldap temporarily
stopped responding when a sudden surge of requests came in (ie: critical
service disruption for users). And I found out this problem didn't
happen when logging was disabled.

As the performance numbers we can read on the web about rsyslog are an
order of magnitude bigger *but* usually relate to the imtcp/imudp
modules, I'm wondering if there possibly was a problem with the imuxsock
module (FWIW "in:imuxsock" is constantly in the top 3 of "top -H" during
my tests) ?

Also no idea what value this has, but strace shows that rsyslog spends
significantly more time in system calls than syslog-ng, in my setup:

# timeout 10 strace -c -fp $(pidof rsyslogd)
Process 27446 attached with 4 threads - interrupt to quit
Process 27446 detached
Process 27447 detached
Process 27448 detached
Process 27449 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.85    2.981049          49     61079           select
  4.39    0.137935           0    287362     55920 futex
  0.68    0.021477           0     61077           recvmsg
  0.07    0.002301           0     61084           gettimeofday
  0.00    0.000000           0         6           writev
  0.00    0.000000           0         2           getrusage
------ ----------- ----------- --------- --------- ----------------
100.00    3.142762                470610     55920 total

# timeout 10 strace -c -fp $(pgrep -f ^syslog-ng)
Process 29477 attached - interrupt to quit
Process 29477 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 41.39    0.001228           0    359614           gettimeofday
 23.12    0.000686           0      1774         1 read
 15.67    0.000465           0    105768           lseek
 15.57    0.000462           0    105767           write
  4.25    0.000126           0     21153           poll
  0.00    0.000000           0       114           stat
  0.00    0.000000           0         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.002967                594191         1 total


Thanks a lot for your guidance so far !

Cheers,
Marc
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to