On Fri, 19 Dec 2014, Marc Fournier wrote:

Excerpts from David Lang's message of 2014-12-18 19:03:54 +0100:
On Wed, 17 Dec 2014, Marc Fournier wrote:

Excerpts from David Lang's message of 2014-12-16 20:41:49 +0100:
On Tue, 16 Dec 2014, Marc Fournier wrote:

Hello,

openldap writes its logs using syslog(3). Three log messages are emitted
for each ldap search operation.

On a given openldap setup, I reach about 2500 operations per second. The
simple fact of shutting rsyslog down icreases this number up to 5000
(logs are lost obviously).

Writing to a local file, to /dev/null, or to a remote server over UDP
doesn't significantly change this number. I tried tweaking various
settings related to the main queue but didn't see any huge difference.

ipmstats shows exactly the same numbers for "imuxsock: submitted" than
for "main Q: enqueued". All the discarded/full/failed/etc counters have
a value of 0.

This is rsyslog-8.6.0-2.el6.x86_64 from the adiscon RPM repo (but had
similar problems with version 5.8.10-9.el6_6 which comes with el6).


Ok, right my config is a mess I agree ! Been iterating over it since
back when I used 5.8.10.

So I started over with a brand new config. The best results I got so far
was with these settings:

module(load="impstats" log.syslog="off" log.file="/var/log/impstats.log" 
resetCounters="off" interval="5")
module(load="imuxsock" SysSock.RateLimit.Interval="0" SysSock.Annotate="off" 
SysSock.FlowControl="off" SysSock.IgnoreTimestamp="off")

$MaxMessageSize 2k

$MainMsgQueueDequeueBatchSize 4096
$MainMsgQueueType LinkedList
$MainMsgQueueSize 250000
$MainMsgQueueWorkerThreads 4
$MainMsgQueueWorkerThreadMinimumMessages 100000
$RepeatedMsgReduction off
$SystemLogRateLimitInterval 0

*.*  ~

The numbers are slightly better now: ~3500 ops/s vs 2500 yesterday. But
still a significant difference with the max I reach when running with
rsyslogd stopped (~5000 ops/s).

Is this an expected compromise ? Or are there possible improvements I
can do ?

I'm not seeing anything obviously wrong here, does impstats show anything
interesting (errors, etc)

No, no errors. The discard/full counters also stay at 0.

I would actually reduce the worker threads down to 1 (it's counter-intuitive,
but having too many threads in rsyslog can actually slow things down because of
the overhead involved with locking and unlocking the queues)

Ok, tried that. No significant change with the issue I'm facing though.

Meanwhile I improved my stress-test setup, and upgraded the target
machine to a quad-core (I was suspecting competition for cpu time
between openldap and rsyslog on the previous single-core machine).

The good news is that I can reproduce the problem much more reliably.
The bad news is that numbers are even worse and that I have proof that
the bottleneck lies in rsyslog. Here are my latest observations:

- no rsyslog running: 15k to 17k ldap operations/second
- rsyslog with above config, except for MainMsgQueueWorkerThreads back
  to 1: down to 7.5k ops/s

wihtout rsyslog running, there shouldn't even be a /dev/log to write to, so any logging should be a noop.

are you running systemd by any chance?

let's simplify the config more, try making it just

module(load="impstats" log.syslog="off" log.file="/var/log/impstats.log" 
resetCounters="off" interval="5")
module(load="imuxsock" SysSock.RateLimit.Interval="0")
$MaxMessageSize 2k
$MainMsgQueueDequeueBatchSize 4096
$RepeatedMsgReduction off

*.*  ~

The next step I would try is to change the parser settings to avoid wasting effort trying to parse the message, but I would be a bit surprised if that was the problem here.

The fact that rsyslog is spending so much time in select puzzles me (the rest makes sense).

Rainer is going to need to take a look at this, but we are about to loose him for Christmas vacation (if we haven't already)

David Lang

- when monitoring openldap's threads state, a surge of threads in
  "pending" state as soon as rsyslog is started.
- a trivial syslog-ng config, reading from /dev/log and sending
  everything to /dev/null, results in only about 10% ops/s decrease.
- also the pending threads count stay close to zero when using
  syslog-ng.

To recap, the starting point of this issue was that openldap temporarily
stopped responding when a sudden surge of requests came in (ie: critical
service disruption for users). And I found out this problem didn't
happen when logging was disabled.

As the performance numbers we can read on the web about rsyslog are an
order of magnitude bigger *but* usually relate to the imtcp/imudp
modules, I'm wondering if there possibly was a problem with the imuxsock
module (FWIW "in:imuxsock" is constantly in the top 3 of "top -H" during
my tests) ?

Also no idea what value this has, but strace shows that rsyslog spends
significantly more time in system calls than syslog-ng, in my setup:

# timeout 10 strace -c -fp $(pidof rsyslogd)
Process 27446 attached with 4 threads - interrupt to quit
Process 27446 detached
Process 27447 detached
Process 27448 detached
Process 27449 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
94.85    2.981049          49     61079           select
 4.39    0.137935           0    287362     55920 futex
 0.68    0.021477           0     61077           recvmsg
 0.07    0.002301           0     61084           gettimeofday
 0.00    0.000000           0         6           writev
 0.00    0.000000           0         2           getrusage
------ ----------- ----------- --------- --------- ----------------
100.00    3.142762                470610     55920 total

# timeout 10 strace -c -fp $(pgrep -f ^syslog-ng)
Process 29477 attached - interrupt to quit
Process 29477 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
41.39    0.001228           0    359614           gettimeofday
23.12    0.000686           0      1774         1 read
15.67    0.000465           0    105768           lseek
15.57    0.000462           0    105767           write
 4.25    0.000126           0     21153           poll
 0.00    0.000000           0       114           stat
 0.00    0.000000           0         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.002967                594191         1 total


Thanks a lot for your guidance so far !

Cheers,
Marc
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to