On Fri, 18 Mar 2016, Rainer Gerhards wrote:

2016-03-17 22:15 GMT+01:00 David Lang <[email protected]>:
I have been running 8.17 from the repo combined with a copy of liblognorm
2.0.0 that I compiled during the 8.16 timeframe. Since the upgrade to 8.17
I've been getting a few coredumps. Since I enabled async writing to
dynamically generated files, rsyslog has been using 1500+ threads (up from
~17 threads before this change) and has repeatedly run a 32G box out of
memory (OOM).

Just a side note: async writing is very resource-intensive. Among
other things, it requires one background thread per file. It should be
turned on only if there is sufficiently slow processing to offload, e.g.
zip writing with larger I/O buffers. If the I/O buffers are small, there
is also probably a lot of lock contention between foreground action
processing and the background writer.
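To make that cost model concrete, here is a minimal Python sketch, assuming a simplified buffered writer that hands full buffers to a background thread. The class `AsyncFileWriter` and the `simulate` helper are hypothetical; this is not rsyslog's omfile code. It only shows the two effects described above: one background thread per open file, and a number of foreground/background handoffs (each a lock acquisition on the shared queue) of roughly total bytes divided by the buffer size.

```python
import queue
import threading


class AsyncFileWriter:
    """Hypothetical model of an async file writer (NOT rsyslog's omfile code).

    The foreground fills a buffer; each full buffer is handed to a dedicated
    background thread that would compress and write it.
    """

    def __init__(self, name, buf_size):
        self.name = name
        self.buf_size = buf_size
        self.handoffs = 0        # foreground/background synchronizations
        self.bytes_written = 0   # updated only by the background thread
        self._buf = bytearray()
        self._q = queue.Queue()  # put()/get() take an internal lock
        # One background thread per open file: 1500 dynafiles -> 1500 threads.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            chunk = self._q.get()
            if chunk is None:    # sentinel sent by close()
                return
            self.bytes_written += len(chunk)  # stand-in for zip + write()

    def _handoff(self, chunk):
        self._q.put(bytes(chunk))  # copy, so the buffer can be reused
        self.handoffs += 1

    def write(self, data):
        self._buf += data
        # Hand off every full buffer; smaller buffers mean more handoffs.
        while len(self._buf) >= self.buf_size:
            self._handoff(self._buf[:self.buf_size])
            del self._buf[:self.buf_size]

    def close(self):
        if self._buf:
            self._handoff(self._buf)
            self._buf.clear()
        self._q.put(None)
        self._thread.join()


def simulate(n_files, msgs_per_file, msg_size, buf_size):
    """Return (extra threads started, total handoffs) for one run."""
    before = threading.active_count()
    writers = [AsyncFileWriter(f"file{i}.log", buf_size) for i in range(n_files)]
    extra_threads = threading.active_count() - before
    for w in writers:
        for _ in range(msgs_per_file):
            w.write(b"x" * msg_size)
    for w in writers:
        w.close()
    return extra_threads, sum(w.handoffs for w in writers)


if __name__ == "__main__":
    for buf in (4096, 65536):
        threads, handoffs = simulate(50, 100, 200, buf)
        print(f"buffer={buf}: {threads} writer threads, {handoffs} handoffs")
```

With these made-up numbers (50 files, 100 messages of 200 bytes each), a 4096-byte buffer gives 250 handoffs versus 50 with a 65536-byte buffer, while the thread count stays at one per file either way.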

In this case I am doing a zip -9 of the files. I only tried enabling async because the worker for this queue was showing the highest CPU utilization in top. For now, I have split it into two queues, one per action, and combined with async disabled, I'm back to almost the speed I was getting with the pre-8.17 build.
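Why splitting one queue into one queue per action can help is easiest to see with a back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement: every message is processed by every action, a single worker runs its queue's actions one after another, the two workers get separate cores, and the per-message costs passed to the hypothetical `max_rate` helper are invented numbers.

```python
def max_rate(costs_ms, split):
    """Idealized peak throughput (messages/second) for a set of actions.

    costs_ms: hypothetical per-message CPU cost of each action, in milliseconds.
    split:    False = one queue whose single worker runs every action in turn;
              True  = one queue (and worker) per action, running in parallel.
    """
    if split:
        bottleneck = max(costs_ms)  # the slowest action's worker limits the rate
    else:
        bottleneck = sum(costs_ms)  # every message pays for all actions serially
    return 1000.0 / bottleneck


costs = [0.5, 0.2]  # e.g. a zip -9 file action and a cheaper second action
print(f"one queue:  {max_rate(costs, split=False):.0f} msg/s")
print(f"two queues: {max_rate(costs, split=True):.0f} msg/s")
```

The model ignores queue overhead, memory bandwidth and I/O; its only point is that a single worker is bounded by the sum of all action costs, while per-action queues are bounded by the slowest action alone.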


Figuring that there is a reasonable chance that these problems are due to
the mixing of versions, I compiled from today's git tree and deployed that
to a server that's receiving a flood of logs from queues that are flushing
to it. When I deployed the new version, the throughput dropped noticeably
(roughly a one-third drop, from ~300K messages/min to ~200K messages/min).
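For reference, a trivial Python check of the drop, using only the two per-minute rates quoted above (the `throughput_drop` helper is just for illustration):

```python
def throughput_drop(before_per_min, after_per_min):
    """Return (percent drop, before msg/s, after msg/s) for two per-minute rates."""
    drop_pct = (before_per_min - after_per_min) / before_per_min * 100
    return drop_pct, before_per_min / 60, after_per_min / 60


drop, before_s, after_s = throughput_drop(300_000, 200_000)
print(f"{drop:.1f}% drop: {before_s:.0f} -> {after_s:.0f} msg/s")
```

This prints `33.3% drop: 5000 -> 3333 msg/s`, i.e. about a third of the throughput lost.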

One thing that comes to mind is the change of hash function. In
8.17, we use a different hash function inside libfastjson. This has
proven to be much faster in experiments we carried out, but maybe
that's not the case for you. You can comment out line 1640 of
tools/rsyslogd.c

https://github.com/rsyslog/rsyslog/blob/master/tools/rsyslogd.c#L1640

and see what happens.

I'll give that a try, but may not get a chance before Monday.

By the way, the libfastjson version number in the git tree is still 0.9.2; you missed bumping it to 0.9.3 after the release.

I need to go through the rest of the thread in more detail. However, I
have reviewed the ChangeLog, and neither it nor my memory points to
any changes in 8.17 that would explain what you see (also referring to
your later mail). So we probably need to track things down. A good
start would be if you could run tests on a dedicated test system.
Running under valgrind to find the potential leak and doing a git
bisect to find a potential culprit sound like good next steps to me.
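git bisect is a binary search over the commits between a known-good and a known-bad build (driven with `git bisect start`, `git bisect good` and `git bisect bad`), so the number of builds to test grows only logarithmically with the size of the range. A minimal Python sketch of the idea, assuming the regression is introduced at a single commit and stays (the same assumption git bisect makes); the commit list and the `is_bad` predicate are hypothetical stand-ins for building and load-testing each revision:

```python
import math


def first_bad(commits, is_bad):
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and that once
    a commit is bad every later one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1               # in real life: build, deploy, measure
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


# Hypothetical history: 201 commits from the good build to the bad one,
# with the regression introduced at commit c137.
commits = [f"c{i}" for i in range(201)]
culprit, tested = first_bad(commits, lambda c: int(c[1:]) >= 137)
print(culprit, tested, math.ceil(math.log2(len(commits) - 1)))
```

This prints `c137 8 8`: eight builds to test instead of up to 200. When a leak takes about an hour to show up, each step is expensive, which is why the logarithmic step count matters.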

Well, right now this box is fair game. The old version got corrupted, so there are almost 2 months of logs to replay into it from archived queues. There is a duplicate of it that has been up handling the same traffic all along.

So, to summarize:

Running pre-8.17 with a pre-8.17 liblognorm was working, with possibly some rare coredumps. Async was turned on during this time.

Upgrading to 8.17 (but keeping the old liblognorm, with a symlink ln -s liblognorm.so.4 liblognorm.so.2) with no config changes caused multiple coredumps and a slow memory leak.

Upgrading to current git solved the coredumps, but the memory leak was killing rsyslog about hourly, and there was a significant performance hit.

Turning off async solved the memory leak.

I just saw that the box reverted to the libfastjson from the Adiscon repo rather than my compiled copy from git, so when I restart things in about half an hour, it will be running against that.

I'm about to turn in for the night, but tomorrow I'll look over any test scenarios and see about testing them over the weekend or early next week. I do have rather extensive stats gathering on this system, so if we want to look back in time at what was happening, I can probably do so.

David Lang

Rainer

This is with no config changes, just changing the binary packages.

This is with a rather complex ruleset (11 queues, 75+ actions, at least 4
mmnormalize calls, one with a 1500-line ruleset).

Since pushing the new version to this machine, there have been no coredumps
and no OOM (not definitive given that it's only been about 4 hours, but
highly suggestive).

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
LIKE THAT.