Since this is all "only just" a test setup, I have not yet gotten
around to creating an optimized configuration.
I am aware that the default queue sizes are overkill, but the
problems I encountered here happened on an idle system.
Though I am not familiar with the internals, I would at least be
surprised if a handful of messages
(created using 'logger' in the shell) consumed all the memory.
The segfault was actually caused by a compiler optimization: turning
off -fguess-branch-probability fixed it right away.
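In case someone hits something similar: with GCC you can disable just
this pass via its -fno- counterpart while keeping the rest of the
optimization level. A sketch of my workaround (the exact configure
invocation depends on your toolchain):

```shell
# Keep -O2 but disable only the branch-probability guessing pass;
# adjust flags and the configure call for your own (cross-)toolchain.
export CFLAGS="-O2 -fno-guess-branch-probability"
./configure
make
```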
Unfortunately I don't have valgrind available, but I am able to run
rsyslog under a remote debugger, and what I saw didn't make much sense
to me (in my experience with this platform, that usually points to
either a multithreading or a compiler issue).
The segfault always occurred at rsyslogd.c:1749, when trying to call
the function janitorRun().
Note: The mere _attempt_ to just call the function at this point would
cause the segfault, not something inside janitorRun().
Though this may seem weird, it's not the first compiler issue I have
encountered on this platform, and not the first one that causes
crashes, either...
So once I turned off this particular optimization (found by trial and
error), rsyslog actually ran quite well, everything seemed to be
working fine.
Remote and local logging worked, and memory consumption wasn't too
high (I forget the exact figure, but I had ~38 MB of RAM left).
However, when I stopped the remote syslog server (Winsyslog), I saw
rsyslog taking 100% CPU.
This stopped when omrelp was able to reconnect.
While rsyslog was looping, it neither crashed nor took up all system
memory (memory consumption stayed flat),
which is when I started the debugger again to find out where it was
spending all that time.
This eventually led me to doTransaction() inside action.c, where I
found what I described in my previous mails:
it loops around and does not stop until omrelp is able to reconnect.
I then fired up an Ubuntu 16.04 VM (at that time I was naively
thinking it would ship a current version),
because I thought to myself: "No way, this must be happening on a PC
too".
The rsyslogd running in the VM didn't have the issue, but I quickly
found out that it was running rsyslog 8.16.
With the knowledge of where it was looping, I then started to look at
the history of the rsyslog source
and found the change I mentioned before.
This is how far I have dug until this point.
To both of you: Thanks for the help and suggestions, I really
appreciate it :-)
Regards,
Andreas
So, I can actually reproduce this on a PC.
I've built everything manually and run into the exact same issue:
as long as the remote end is unavailable, rsyslog hangs in a loop,
consuming an entire CPU core.
The log output and configuration are the same as what I posted before.
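For completeness, the relevant part is just a plain omrelp forwarding
action; a minimal illustrative equivalent (target/port are
placeholders, not my actual values) would be:

```
module(load="omrelp")

# target/port are placeholders; the resume settings are the generic
# action parameters that are supposed to throttle reconnect attempts
action(type="omrelp" target="192.0.2.1" port="2514"
       action.resumeRetryCount="-1" action.resumeInterval="30")
```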
Regards,
Andreas
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.