Hi Rainer,
no need for apology, as I have also been away, and only now am able to write a follow-up on this. I will try the latest rsyslog (8.7.0) as you suggested, and will also try to get more info out of valgrind if I can.
Thanks and regards,

Koral

------------------------------------------------------------------------

Sorry for the late reply, the busy holiday period and then break took it's
tool...

On Sat, Dec 6, 2014 at 12:03 AM, Koral Ilgun <koral.ilgun at calix.com  
<http://lists.adiscon.net/mailman/listinfo/rsyslog>> wrote:

/  Hello,
/>/  posting for the first time in this forum, I hope I am providing enough
/>/  details.
/>/
/>/  I have been using rsyslog v7.4.10 in our embedded system for over a year
/>/  now and only recently started noticing a memory leak. Less than 1Mb / day.
/>/  In the process of debugging this issue I have also tried v7.6.1, v8.4.2 and
/>/  the most recent v8.6.0, all of which exhibited the same memory leak.
/>/  I am now able to speed up this memory leak (multiple Mbs/day) under
/>/  certain conditions, which I will try to describe here.
/>/
/>/  I am using over 50 outchannels in this setup, which are all setup similar
/>/  to this:
/>/
/>/  $template logfmt,"%timegenerated:::date-rfc3339% [%syslogpriority%]
/>/  %syslogtag% %msg%\n"
/>/  $outchannel sshd_log, /var/log/sshd.log,             500000,
/>/  /usr/sbin/logrotate /etc/logrotate.conf
/>/  if $programname == 'sshd'               then :omfile:$sshd_log;logfmt
/>/
/>/
/>/  The log rotations are setup this way in /etc/logrotate.conf:
/>/
/>/  /var/log/sshd.log {
/>/      rotate 2
/>/      size=400k
/>/      postrotate
/>/      /etc/init.d/rsyslog.sh reload > /dev/null 2>&1 || true
/>/      endscript
/>/  }
/>/
/>/  I then generate lots of syslog messages (~200/sec) from various
/>/  applications to stress the system.
/>/  The initial memory leak I observed was due to the rate-limiting
/>/  configuration (after lots of trial and error in rsyslog.conf):
/>/
/>/  module(load="imuxsock" SysSock.RateLimit.Interval="2"
/>/  SysSock.RateLimit.Burst="200" SysSock.RateLimit.Severity="5"
/>/  SysSock.UsePIDFromSystem="on")
/>/
/>/  When I eliminated the rate limiting from the config, part of the memory
/>/  leak has disappeared, but not all.
/>/
/>/  I also have a remote-host configured for forwarding to a syslog server,
/>/  but removing that configuration did not make any difference.
/>/
/>/  I then tried reducing the maxsize for the 50+ outchannels from 500k to 50k
/>/  and generated again ~200/sec messages (total) in an attempt to speed up the
/>/  memory leak. This increased the speed of memory loss. ~1Mb/hour.
/>/
/>/  I then removed the maxsize from the outchannels and called the logrotate
/>/  program from a cron job every 5 minutes. This eliminated the memory leak.
/>/  Ran it overnight with no memory loss. So, the problem seems to be related
/>/  to (the frequency of) calling logrotate when maxsize is reached, and
/>/  possibly related to the fact that logrotate in turn HUPs rsyslog upon
/>/  rotating log files. If I just run rsyslog.sh reload (which results in HUP),
/>/  I don't see a memory leak either. It has to be rsyslog calling the maxsize
/>/  action (logrotate).
/>/
/>/  I ran valgrind on rsyslog with some success (logrotate's HUP doesn't seem
/>/  to be passed down to rsyslog from valgrind). Through valgrind, I noticed at
/>/  least two leaks (based on 8.6.0 sources):
/>/
/>/  1. 15k leak at ochConstruct during initialization:
/>/
/>/  ==18966== 15,891 (32 direct, 15,859 indirect) bytes in 1 blocks are
/>/  definitely lost in loss record 88 of 88
/>/  ==18966==    at 0xFF88B9C: calloc (in /usr/lib/valgrind/vgpreload_
/>/  memcheck-ppc32-linux.so)
/>/  ==18966==    by 0x1005F9EF: ochConstruct (outchannel.c:47)
/>/  ==18966==    by 0x1005FAAF: ochAddLine (outchannel.c:207)
/>/  ==18966==    by 0x1002B4AF: doNameLine (conf.c:190)
/>/  ==18966==    by 0x1005857B: processCfSysLineCommand (cfsysline.c:760)
/>/  ==18966==    by 0x1002B63B: cfsysline (conf.c:233)
/>/  ==18966==    by 0x1002F08F: cnfDoCfsysline (rsconf.c:453)
/>/  ==18966==    by 0x1001DE5B: yylex (lexer.l:251)
/>/  ==18966==    by 0x1001B0AF: yyparse (grammar.c:1536)
/>/  ==18966==    by 0x1002E593: load (rsconf.c:1278)
/>/  ==18966==    by 0x1000D117: initAll (rsyslogd.c:930)
/>/  ==18966==    by 0x1000A6A3: main (rsyslogd.c:1318)
/>/
/>/  The pOch pointer doesn't seem to be registered anywhere to be freed later.
/>/  This is not a huge issue, because it is only done once during
/>/  initialization, not a continuous leak like the other case(s).
/>/
/
Yeah, that's cosmetic. Haven't fixed it for now.


/  2. The next one is in tools/omfile.c in prepareFile:
/>/
/>/  ==16419== 280 bytes in 7 blocks are definitely lost in loss record 81 of 91
/>/  ==16419==    at 0xFF86274: malloc (in /usr/lib/valgrind/vgpreload_
/>/  memcheck-ppc32-linux.so)
/>/  ==16419==    by 0x40934FF: strdup (strdup.c:42)
/>/  ==16419==    by 0x10013D2F: prepareFile (unicode-helper.h:46)
/>/  ==16419==    by 0x10014A23: commitTransaction (omfile.c:831)
/>/  ==16419==    by 0x1005B687: actionCommit (action.c:1088)
/>/  ==16419==    by 0x1005C9A3: actionCommitAllDirect (action.c:1256)
/>/  ==16419==    by 0x10054A03: processBatch (ruleset.c:484)
/>/  ==16419==    by 0x1000B7BF: msgConsumer (rsyslogd.c:342)
/>/  ==16419==    by 0x10050347: ConsumerReg (queue.c:1897)
/>/  ==16419==    by 0x1004C3D3: wtiWorker (wti.c:334)
/>/  ==16419==    by 0x1004B077: wtpWorker (wtp.c:390)
/>/  ==16419==    by 0xFE17BE7: start_thread (pthread_create.c:314)
/>/
/>/  The strdup'd pszSizeLimitCmd (which translates to "/usr/bin/logrotate
/>/  /etc/logrotate.conf") doesn't seem to be freed later. I am not sure if this
/>/  is a major leak or not. But, it appears that a free for pszSizeLimitCmd is
/>/  missing in runtime/stream.c: CODESTARTobjDestruct routine.
/>/
/>/
/Yup, and it is not only cosmetic, but not big either (as you say). In any
case, I have now fixed it. The fix will be part of the 8.7.0 release next
Tuesday.


/  Neither of the above findings explain the rate and amount of memory loss I
/>/  am experiencing, so I think some of the codepath is not being executed when
/>/  rsyslog is run in the context of valgrind (e.g. the HUP case I mentioned
/>/  earlier).
/>/
/>/  I also have impstats enabled and monitoring the maxrss from there. The
/>/  queue sizes are *not* growing.
/>/
/>/
/I have also reviewed the code in question, but I don't see anything that
looks like it could become a memory leak. Also, valgrind usually is quite
good in reporting those leaks. I usually also get the HUP signal when
running under valgrind (as well as others), so you might want to check your
setup (including valgrind versions, maybe that matters -- but I use it for
many years now).

So in short: I have no idea where the remaining problem you see stems back
from. It would probably be good if you could re-check with 8.7.0 when it is
out (or git master-candidate branch if you would like to check immediately).

Again, my apologies for the long silence.
Rainer


/  I will be running mtrace (and maybe coverity) on this setup as well, and
/>/  will post further findings. If anyone else knows about any rsyslog memory
/>/  leaks due to frequent log rotations, please post to this forum.
/>/
/>/  Thanks and regards,
/>/  Koral
/>/
/

------------------------------------------------------------------------



_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to