On Fri, 18 Mar 2016, Rainer Gerhards wrote:
2016-03-17 22:15 GMT+01:00 David Lang <[email protected]>:
I have been running 8.17 from the repo combined with a copy of liblognorm
2.0.0 that I compiled during the 8.16 timeframe. Since the upgrade to 8.17
I've been getting a few coredumps. Since I enabled async writing to
dynamically generated files, rsyslog is using 1500+ threads (up from ~17
threads prior to this change) and has been running a 32G box OOM repeatedly.
Just a side note: async writing is very resource-intensive. Among
other things, it requires one background thread per file. It should be
turned on only if processing is sufficiently slow, e.g. zip writing with
larger I/O buffers. If the I/O buffers are small, there is probably also
a lot of lock contention between foreground action processing and the
background writer.
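
To check whether the thread count really tracks the number of open dynamic files, one option on Linux is to read the "Threads:" field from /proc/<pid>/status while toggling async writing. The following is only a minimal sketch: the function names are made up for illustration, and the pid of rsyslogd has to be supplied by the caller.

```python
import re


def thread_count(status_text: str) -> int:
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' field found in status text")
    return int(match.group(1))


def read_thread_count(pid: int) -> int:
    """Read the current thread count of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return thread_count(fh.read())
```

Sampling read_thread_count() periodically and comparing it with the number of files rsyslog currently has open should show whether the jump from ~17 to 1500+ threads matches the one-thread-per-file behaviour described above.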
In this case I am doing a zip -9 of the files. I only tried doing this because
the worker for this queue was showing as the highest CPU utilization in top. For
now, I split it into two queues, one per action, and combined with async
disabled, I'm back to almost the speed I was getting with the 8.17 pre.
Figuring that there is a reasonable chance that these problems are due to
the mixing of versions, I compiled from today's git tree and deployed that
to a server that's receiving a flood of logs from queues that are flushing
to it. When I deployed the new version, the throughput dropped noticeably
(~30% drop from handling ~300K messages/min to handling 200K messages/min)
One thing that comes to mind is the change of hash function. In
8.17, we use a different hash function inside libfastjson. This has
proven to be much faster in the experiments we carried out, but maybe
that is not the case for you. You can comment out line 1640 here
https://github.com/rsyslog/rsyslog/blob/master/tools/rsyslogd.c#L1640
and see what happens.
I'll give that a try, but may not get a chance before Monday.
By the way, the libfastjson version number is still 0.9.2 in the git tree; you
missed bumping it to 0.9.3 after the release.
I need to go through the rest of the thread in more detail. However, I
have reviewed the ChangeLog, and neither it nor my memory points to
any changes in 8.17 that would explain what you see (also referring to
your later mail). So we probably need to track things down. A good
start would be if you could run tests on a dedicated test system.
Running under valgrind to look for the potential leak and doing a git
bisect to find a potential culprit sound like good next steps to me.
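
For readers who have not used it: git bisect is a binary search over the commit history between a known-good and a known-bad revision. The idea can be sketched in Python as below. This is an illustration of the search only, not of git's implementation; the commit names and the is_bad predicate (standing in for "build this revision and check whether it leaks or slows down") are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an ordered history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]
```

Each round halves the range, so a few hundred commits between two releases need fewer than ten build-and-test cycles.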
Well, right now, this box is fair game. The old version got corrupted, so
there is almost 2 months of logs to replay into it from archives of queues.
There is a duplicate of it that has been up handling the same traffic all along.
So, to summarize:
- Running pre-8.17 with a pre-8.17 liblognorm was working, with possibly some
  rare coredumps. Async was turned on during this time.
- Upgrading to 8.17 (but keeping the old liblognorm, with a symlink ln -s
  liblognorm.so.4 liblognorm.so.2) with no config changes caused multiple
  coredumps and a slow memory leak.
- Upgrading to current git solved the coredumps, but the memory leak was
  killing rsyslog about hourly. There was also a significant performance hit.
- Turning off async solved the memory leak.
I just saw the box reverted to the libfastjson that is in the adiscon repo
rather than my compiled copy from git, so when I restart things in about a half
hour, it will be running against that.
I'm about to turn in for the night, but tomorrow I'll look over any test
scenarios and see about trying to test them over the weekend or early next
week. I do have rather extensive stats gathering on this system, so if we want
to look back in time at what was happening, I can probably do so.
David Lang
Rainer
This is with no config changes, just changing the binary packages.
This is with a rather complex ruleset (11 queues, 75+ actions, at least 4
mmnormalize calls, one with a 1500 line ruleset)
Since pushing the new version to this machine, no coredumps and no OOM (not
definitive given that it's only been about 4 hours, but highly suggestive)
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
LIKE THAT.