Since this is all "only just" a test setup, I have not yet gotten into creating an optimized configuration. I am aware that the default queue sizes are overkill; however, the problems I encountered here were happening on an idle system. Though I am not familiar with the internals, I would at least be surprised if a handful of messages
(created using 'logger' in the shell) consumed all the memory.
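
For illustration, the 'logger' calls I used amount to roughly the following tiny C program (the program name, message text and count are just placeholders), so rsyslogd's memory consumption can be watched from another shell while a known, small number of messages arrives:

    /* Rough equivalent of the 'logger' invocations mentioned above: hand a
     * small, fixed number of messages to the local syslog socket so that
     * rsyslogd's memory usage can be observed from another shell.
     * Program name, message text and count are made up for illustration. */
    #include <syslog.h>

    int main(void)
    {
        openlog("memtest", LOG_PID, LOG_USER);
        for (int i = 0; i < 50; ++i)
            syslog(LOG_INFO, "test message %d", i);
        closelog();
        return 0;
    }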

The segfault was actually caused by an optimization: turning off -fguess-branch-probability fixed it right away. Unfortunately I don't have valgrind available, but I can run rsyslog under a remote debugger, and what I saw there didn't make much sense to me (in my experience with this platform, that usually points to either a multithreading or a compiler issue).

The segfault always occurred at rsyslogd.c:1749, when trying to call janitorRun(). Note: the mere _attempt_ to call the function at this point caused the segfault, not something inside janitorRun().

Though this may seem weird, it's not the first compiler issue I have encountered on this platform,
and not the first one that caused crashes, either...

So once I turned off this particular optimization (found by trial and error), rsyslog actually ran quite well; everything seemed to be working fine. Remote and local logging worked, and memory consumption wasn't too high (I forget the exact figure, but I had ~38 MBytes of RAM left).

However, when I stopped the remote syslog server (Winsyslog), I saw rsyslog taking 100% CPU.
This stopped when omrelp was able to reconnect.

While rsyslog was looping, it neither crashed nor took up all system memory (memory consumption stayed the same), so that is when I started the debugger again to find out where it was spending all that time.

This eventually led me to doTransaction() inside action.c, where I found what I had written about in my previous mails: it loops around and does not stop until omrelp is able to reconnect.
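
To make clear what kind of pattern I mean (this is NOT rsyslog's actual doTransaction() code, just a minimal standalone sketch of a retry loop that retries immediately instead of suspending or backing off; all names are made up):

    /* Minimal sketch of the looping pattern described above -- NOT the real
     * doTransaction() code. It shows why one CPU core is fully busy while
     * memory consumption stays flat: the loop retries immediately instead of
     * suspending the action or sleeping between attempts. */
    #include <stdbool.h>

    /* hypothetical stand-in for "try to deliver the batch via omrelp" */
    static bool try_deliver(void)
    {
        return false; /* remote end unreachable -> every attempt fails */
    }

    int main(void)
    {
        for (;;) {
            if (try_deliver())
                break; /* remote end came back -> loop ends */
            /* no sleep()/backoff and no suspension of the action here,
             * so the loop spins and burns a full core until the
             * connection can be re-established */
        }
        return 0;
    }

The difference between spinning like this and suspending the action until the next retry interval is exactly the 100% CPU behaviour I described above.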

I then fired up an Ubuntu 16.04 VM (at that time naively thinking it would ship a current version), because I thought to myself: "No way, this must be happening on a PC too".

The rsyslogd running in the VM didn't have the issue, but I quickly found out that it was running rsyslogd 8.16. With the knowledge of where it was looping, I then started to look at the history of the rsyslog source
and found the change I mentioned before.

This is how far I have dug up to this point.

To both of you: Thanks for the help and suggestions, I really appreciate it :-)

Regards,
Andreas

So, I can actually reproduce this on a PC.
I've built everything manually and run into the exact same issue:

As long as the remote end is unavailable, rsyslog hangs in a loop, consuming an entire CPU core.
The log output and configuration are the same as what I posted before.


Regards,
Andreas
