Hi Rainer,
no need for apology, as I have also been away, and only now am able to
write a follow-up on this.
I will try the latest rsyslog (8.7.0) as you suggested, and will also
try to get more info out of valgrind if I can.
Thanks and regards,
Koral
------------------------------------------------------------------------
Sorry for the late reply, the busy holiday period and then break took it's
tool...
On Sat, Dec 6, 2014 at 12:03 AM, Koral Ilgun <koral.ilgun at calix.com
<http://lists.adiscon.net/mailman/listinfo/rsyslog>> wrote:
/ Hello,
/>/ posting for the first time in this forum, I hope I am providing enough
/>/ details.
/>/
/>/ I have been using rsyslog v7.4.10 in our embedded system for over a year
/>/ now and only recently started noticing a memory leak. Less than 1Mb / day.
/>/ In the process of debugging this issue I have also tried v7.6.1, v8.4.2 and
/>/ the most recent v8.6.0, all of which exhibited the same memory leak.
/>/ I am now able to speed up this memory leak (multiple Mbs/day) under
/>/ certain conditions, which I will try to describe here.
/>/
/>/ I am using over 50 outchannels in this setup, which are all setup similar
/>/ to this:
/>/
/>/ $template logfmt,"%timegenerated:::date-rfc3339% [%syslogpriority%]
/>/ %syslogtag% %msg%\n"
/>/ $outchannel sshd_log, /var/log/sshd.log, 500000,
/>/ /usr/sbin/logrotate /etc/logrotate.conf
/>/ if $programname == 'sshd' then :omfile:$sshd_log;logfmt
/>/
/>/
/>/ The log rotations are setup this way in /etc/logrotate.conf:
/>/
/>/ /var/log/sshd.log {
/>/ rotate 2
/>/ size=400k
/>/ postrotate
/>/ /etc/init.d/rsyslog.sh reload > /dev/null 2>&1 || true
/>/ endscript
/>/ }
/>/
/>/ I then generate lots of syslog messages (~200/sec) from various
/>/ applications to stress the system.
/>/ The initial memory leak I observed was due to the rate-limiting
/>/ configuration (after lots of trial and error in rsyslog.conf):
/>/
/>/ module(load="imuxsock" SysSock.RateLimit.Interval="2"
/>/ SysSock.RateLimit.Burst="200" SysSock.RateLimit.Severity="5"
/>/ SysSock.UsePIDFromSystem="on")
/>/
/>/ When I eliminated the rate limiting from the config, part of the memory
/>/ leak has disappeared, but not all.
/>/
/>/ I also have a remote-host configured for forwarding to a syslog server,
/>/ but removing that configuration did not make any difference.
/>/
/>/ I then tried reducing the maxsize for the 50+ outchannels from 500k to 50k
/>/ and generated again ~200/sec messages (total) in an attempt to speed up the
/>/ memory leak. This increased the speed of memory loss. ~1Mb/hour.
/>/
/>/ I then removed the maxsize from the outchannels and called the logrotate
/>/ program from a cron job every 5 minutes. This eliminated the memory leak.
/>/ Ran it overnight with no memory loss. So, the problem seems to be related
/>/ to (the frequency of) calling logrotate when maxsize is reached, and
/>/ possibly related to the fact that logrotate in turn HUPs rsyslog upon
/>/ rotating log files. If I just run rsyslog.sh reload (which results in HUP),
/>/ I don't see a memory leak either. It has to be rsyslog calling the maxsize
/>/ action (logrotate).
/>/
/>/ I ran valgrind on rsyslog with some success (logrotate's HUP doesn't seem
/>/ to be passed down to rsyslog from valgrind). Through valgrind, I noticed at
/>/ least two leaks (based on 8.6.0 sources):
/>/
/>/ 1. 15k leak at ochConstruct during initialization:
/>/
/>/ ==18966== 15,891 (32 direct, 15,859 indirect) bytes in 1 blocks are
/>/ definitely lost in loss record 88 of 88
/>/ ==18966== at 0xFF88B9C: calloc (in /usr/lib/valgrind/vgpreload_
/>/ memcheck-ppc32-linux.so)
/>/ ==18966== by 0x1005F9EF: ochConstruct (outchannel.c:47)
/>/ ==18966== by 0x1005FAAF: ochAddLine (outchannel.c:207)
/>/ ==18966== by 0x1002B4AF: doNameLine (conf.c:190)
/>/ ==18966== by 0x1005857B: processCfSysLineCommand (cfsysline.c:760)
/>/ ==18966== by 0x1002B63B: cfsysline (conf.c:233)
/>/ ==18966== by 0x1002F08F: cnfDoCfsysline (rsconf.c:453)
/>/ ==18966== by 0x1001DE5B: yylex (lexer.l:251)
/>/ ==18966== by 0x1001B0AF: yyparse (grammar.c:1536)
/>/ ==18966== by 0x1002E593: load (rsconf.c:1278)
/>/ ==18966== by 0x1000D117: initAll (rsyslogd.c:930)
/>/ ==18966== by 0x1000A6A3: main (rsyslogd.c:1318)
/>/
/>/ The pOch pointer doesn't seem to be registered anywhere to be freed later.
/>/ This is not a huge issue, because it is only done once during
/>/ initialization, not a continuous leak like the other case(s).
/>/
/
Yeah, that's cosmetic. Haven't fixed it for now.
/ 2. The next one is in tools/omfile.c in prepareFile:
/>/
/>/ ==16419== 280 bytes in 7 blocks are definitely lost in loss record 81 of 91
/>/ ==16419== at 0xFF86274: malloc (in /usr/lib/valgrind/vgpreload_
/>/ memcheck-ppc32-linux.so)
/>/ ==16419== by 0x40934FF: strdup (strdup.c:42)
/>/ ==16419== by 0x10013D2F: prepareFile (unicode-helper.h:46)
/>/ ==16419== by 0x10014A23: commitTransaction (omfile.c:831)
/>/ ==16419== by 0x1005B687: actionCommit (action.c:1088)
/>/ ==16419== by 0x1005C9A3: actionCommitAllDirect (action.c:1256)
/>/ ==16419== by 0x10054A03: processBatch (ruleset.c:484)
/>/ ==16419== by 0x1000B7BF: msgConsumer (rsyslogd.c:342)
/>/ ==16419== by 0x10050347: ConsumerReg (queue.c:1897)
/>/ ==16419== by 0x1004C3D3: wtiWorker (wti.c:334)
/>/ ==16419== by 0x1004B077: wtpWorker (wtp.c:390)
/>/ ==16419== by 0xFE17BE7: start_thread (pthread_create.c:314)
/>/
/>/ The strdup'd pszSizeLimitCmd (which translates to "/usr/bin/logrotate
/>/ /etc/logrotate.conf") doesn't seem to be freed later. I am not sure if this
/>/ is a major leak or not. But, it appears that a free for pszSizeLimitCmd is
/>/ missing in runtime/stream.c: CODESTARTobjDestruct routine.
/>/
/>/
/Yup, and it is not only cosmetic, but not big either (as you say). In any
case, I have now fixed it. The fix will be part of the 8.7.0 release next
Tuesday.
/ Neither of the above findings explain the rate and amount of memory loss I
/>/ am experiencing, so I think some of the codepath is not being executed when
/>/ rsyslog is run in the context of valgrind (e.g. the HUP case I mentioned
/>/ earlier).
/>/
/>/ I also have impstats enabled and monitoring the maxrss from there. The
/>/ queue sizes are *not* growing.
/>/
/>/
/I have also reviewed the code in question, but I don't see anything that
looks like it could become a memory leak. Also, valgrind usually is quite
good in reporting those leaks. I usually also get the HUP signal when
running under valgrind (as well as others), so you might want to check your
setup (including valgrind versions, maybe that matters -- but I use it for
many years now).
So in short: I have no idea where the remaining problem you see stems back
from. It would probably be good if you could re-check with 8.7.0 when it is
out (or git master-candidate branch if you would like to check immediately).
Again, my apologies for the long silence.
Rainer
/ I will be running mtrace (and maybe coverity) on this setup as well, and
/>/ will post further findings. If anyone else knows about any rsyslog memory
/>/ leaks due to frequent log rotations, please post to this forum.
/>/
/>/ Thanks and regards,
/>/ Koral
/>/
/
------------------------------------------------------------------------
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.