And here's the output with the leak-check=full option: > 3049.206693904:766b700: end prop repl, pRes='2012', len 4 > 3049.206849430:766b700: prop repl 4, pRes='05', len -1 > 3049.206965668:766b700: end prop repl, pRes='05', len 2 > 3049.207120712:766b700: prop repl 4, pRes='03', len -1 > 3049.207236643:766b700: end prop repl, pRes='03', len 2 > 3049.207385859:766b700: prop repl 4, pRes='10.191.19.27', len -1 > 3049.207503212:766b700: end prop repl, pRes='10.191.19.27', len 12 > 3049.207621820:766b700: prop repl 4, pRes='authpriv', len -1 > 3049.207735189:766b700: end prop repl, pRes='authpriv', len 8 > 3049.207903492:766b700: prop repl 4, pRes='2012', len -1 > 3049.208020776:766b700: end prop repl, pRes='2012', len 4 > 3049.208175477:766b700: prop repl 4, pRes='05', len -1 > 3049.208291580:766b700: end prop repl, pRes='05', len 2 > 3049.208445428:766b700: prop repl 4, pRes='03', len -1 > 3049.208561306:766b700: end prop repl, pRes='03', len 2 > 3049.208709151:766b700: prop repl 4, pRes='204.236.165.198', len -1 > 3049.208826659:766b700: end prop repl, pRes='204.236.165.198', len 15 > 3049.208946453:766b700: prop repl 4, pRes='local2', len -1 > 3049.225906852:766b700: end prop repl, pRes='local2', len 6 > 3049.226074826:766b700: prop repl 4, pRes='2012', len -1 > 3049.226191980:766b700: end prop repl, pRes='2012', len 4 > 3049.226348188:766b700: prop repl 4, pRes='05', len -1 > 3049.226464604:766b700: end prop repl, pRes='05', len 2 > 3049.226625159:766b700: prop repl 4, pRes='03', len -1 > 3049.226741423:766b700: end prop repl, pRes='03', len 2 > 3049.226909093:766b700: prop repl 4, pRes='50.18.75.156', len -1 > 3049.227027296:766b700: end prop repl, pRes='50.18.75.156', len 12 > 3049.227146531:766b700: prop repl 4, pRes='local2', len -1 > 3049.227260506:766b700: end prop repl, pRes='local2', len 6 > 3049.227415344:766b700: prop repl 4, pRes='2012', len -1 > 3049.227531737:766b700: end prop repl, pRes='2012', len 4 > 3049.227686657:766b700: prop repl 4, pRes='05', len -1 > 3049.227823771:766b700: end prop repl, pRes='05', len 2 > 3049.227986981:766b700: prop repl 4, pRes='03', len -1 > 3049.228103631:766b700: end prop repl, pRes='03', len 2 > 3049.228232819:766b700: XXXXX: tryDoAction 0x5a2e6d0, pnElem 5, nElem 5 > 3049.228352738:766b700: Action 0x5a2e6d0 transitioned to state: itx > 3049.228462772:766b700: entering actionCalldoAction(), state: itx > 3049.228570909:766b700: file to log to: FILENAME > 3049.228697210:766b700: write to stream, pData->pStrm 0xb1a18e0, lenBuf 223 > 3049.228822219:766b700: action 0x5a2e6d0 call returned -2121 > 3049.228938493:766b700: entering actionCalldoAction(), state: itx > 3049.246724313:766b700: file to log to: FILENAME > 3049.246865159:766b700: write to stream, pData->pStrm 0xb399440, lenBuf 111 > 3049.246990810:766b700: action 0x5a2e6d0 call returned -2121 > 3049.247106361:766b700: entering actionCalldoAction(), state: itx > 3049.247214322:766b700: file to log to: FILENAME > 3049.247340965:766b700: write to stream, pData->pStrm 0xb399bc0, lenBuf 112 > 3049.247462568:766b700: action 0x5a2e6d0 call returned -2121 > 3049.247574716:766b700: entering actionCalldoAction(), state: itx > 3049.247682310:766b700: file to log to: FILENAME > 3049.247823611:766b700: write to stream, pData->pStrm 0x5bccd90, lenBuf 164 > 3049.247949863:766b700: action 0x5a2e6d0 call returned -2121 > 3049.248062303:766b700: entering actionCalldoAction(), state: itx > 3049.248169734:766b700: file to log to: FILENAME > 3049.248296637:766b700: Removed entry 2 for file > '/mnt/logs/50.112.37.82/authpriv-2012-05-03.log' from dynaCache. > 3049.248427757:766b700: strm 0xaa7a1a0: file 13(authpriv-2012-05-03.log) > closing > 3049.248569534:766b700: strm 0xaa7a1a0: file 13(authpriv-2012-05-03.log) > flush, buflen 0 (no need to flush) > 3049.248848350:766b700: file stream local2-2012-05-03.log params: flush > interval 0, async write 0 > 3049.248994574:766b700: Added new entry 2 for file cache, file > '/mnt/logs/50.18.75.156/local2-2012-05-03.log'. > 3049.249244122:766b700: write to stream, pData->pStrm 0x5c0cab0, lenBuf 368 > 3049.266993352:766b700: action 0x5a2e6d0 call returned -2121 > 3049.267156843:766b700: strm 0x5c0cab0: file -1(local2-2012-05-03.log) flush, > buflen 368 > 3049.267340310:766b700: file '/mnt/logs/50.18.75.156/local2-2012-05-03.log' > opened as #13 with mode 416 > 3049.267484698:766b700: strm 0x5c0cab0: opened file > '/mnt/logs/50.18.75.156/local2-2012-05-03.log' for WRITE as 13 > 3049.267638704:766b700: strm 0x5c0cab0: file 13 write wrote 368 bytes > 3049.267788358:766b700: Action 0x5a2e6d0 transitioned to state: rdy > 3049.267923620:766b700: ruleset: get iRet 0 from rule.ProcessMsg() > 3049.268036343:766b700: ruleset.ProcessMsg() returns 0 > 3049.268158427:766b700: regular consumer finished, iret=0, szlog 1 sz phys 6 > 3049.268362726:766b700: we deleted 5 objects and enqueued 0 objects > 3049.268483154:766b700: delete batch from store, new sizes: log 1, phys 1 > 3049.268633137:766b700: msg parser: flags 30, from > 'ec2-204-236-179-181.us-west-1.compute.amazonaws.com', msg > '<150>2012-05-03T21:33:27.973443+00:00 prod-ts-uswest1-9-i-23' > 3049.268751986:766b700: parse using parser list 0x5a02e40 (the default list). > 3049.268908367:766b700: Parser 'rsyslog.rfc5424' returned -2160 > 3049.269094815:766b700: Message will now be parsed by the legacy syslog > parser (one size fits all... ;)). > 3049.269318477:766b700: MsgSetTAG in: len 7, pszBuf: apache: > 3049.269516203:766b700: MsgSetTAG exit: pMsg->iLenTAG 7, pMsg->TAG.szBuf: > apache: > 3049.287378551:766b700: Parser 'rsyslog.rfc3164' returned 0 > 3049.287498785:766b700: processBatch: batch of 1 elements must be processed > 3049.287607743:766b700: Processing next rule > 3049.287719767:766b700: testing filter, f_pmask 255 > 3049.287843919:766b700: Processing next action > 3049.287968743:766b700: Called action(NotAllMark), processing batch[0] via > 'builtin-file' > 3049.288078927:766b700: Called action(Batch), logging to builtin-file > 3049.288248978:766b700: prop repl 4, pRes='204.236.179.181', len -1 > 3049.288367943:766b700: end prop repl, pRes='204.236.179.181', len 15 > 3049.288488240:766b700: prop repl 4, pRes='local2', len -1 > 3049.288602119:766b700: end prop repl, pRes='local2', len 6 > 3049.288758247:766b700: prop repl 4, pRes='2012', len -1 > 3049.288874567:766b700: end prop repl, pRes='2012', len 4 > 3049.289168469:766b700: prop repl 4, pRes='05', len -1 > 3049.289365269:766b700: end prop repl, pRes='05', len 2 > 3049.289601218:766b700: prop repl 4, pRes='03', len -1 > 3049.289795968:766b700: end prop repl, pRes='03', len 2 > 3049.290002018:766b700: XXXXX: tryDoAction 0x5a2e6d0, pnElem 1, nElem 1 > 3049.307849612:766b700: Action 0x5a2e6d0 transitioned to state: itx > 3049.307969368:766b700: entering actionCalldoAction(), state: itx > 3049.308078202:766b700: file to log to: FILENAME > 3049.308205615:766b700: Removed entry 6 for file > '/mnt/logs/184.169.236.243/authpriv-2012-05-03.log' from dynaCache. > 3049.308330100:766b700: strm 0xaa8daa0: file 15(authpriv-2012-05-03.log) > closing > 3049.308472984:766b700: strm 0xaa8daa0: file 15(authpriv-2012-05-03.log) > flush, buflen 120 > 3049.308623920:766b700: strm 0xaa8daa0: file 15 write wrote 120 bytes > 3049.308874666:766b700: file stream local2-2012-05-03.log params: flush > interval 0, async write 0 > 3049.309152842:766b700: Added new entry 6 for file cache, file > '/mnt/logs/204.236.179.181/local2-2012-05-03.log'. > 3049.309358728:766b700: write to stream, pData->pStrm 0x5c0d6d0, lenBuf 578 > 3049.309567506:766b700: action 0x5a2e6d0 call returned -2121 > 3049.309798574:766b700: strm 0x5c0d6d0: file -1(local2-2012-05-03.log) flush, > buflen 578 > 3049.310075699:766b700: file > '/mnt/logs/204.236.179.181/local2-2012-05-03.log' opened as #15 with mode 416 > 3049.310294286:766b700: strm 0x5c0d6d0: opened file > '/mnt/logs/204.236.179.181/local2-2012-05-03.log' for WRITE as 15 > 3049.310566768:766b700: strm 0x5c0d6d0: file 15 write wrote 578 bytes > 3049.310804697:766b700: Action 0x5a2e6d0 transitioned to state: rdy > 3049.319945343:766b700: ruleset: get iRet 0 from rule.ProcessMsg() > 3049.320061350:766b700: ruleset.ProcessMsg() returns 0 > 3049.320180243:766b700: regular consumer finished, iret=0, szlog 0 sz phys 1 > 3049.320324960:766b700: we deleted 1 objects and enqueued 0 objects > 3049.320442164:766b700: delete batch from store, new sizes: log 0, phys 0 > 3049.320565162:766b700: regular consumer finished, iret=4, szlog 0 sz phys 0 > 3049.320675938:766b700: main Q:Reg/w0: worker IDLE, waiting for work. > 3049.473251254:966f700: XXXXXX: doRetry: iRet 0, pNsd->bAbortConn 0 > 3049.473510223:966f700: hasRcvInBuffer on nsd 0x5bc3510: pszRcvBuf (nil), > lenRcvBuf 0 > 3049.473700422:966f700: GnuTLS requested retry of 1 operation - executing > 3049.725691130:966f700: XXXXXX: doRetry: iRet 0, pNsd->bAbortConn 0 > 3049.726059044:966f700: hasRcvInBuffer on nsd 0x5d9b260: pszRcvBuf (nil), > lenRcvBuf 0 > 3049.726266390:966f700: hasRcvInBuffer on nsd 0x5d9b5c0: pszRcvBuf (nil), > lenRcvBuf 0 > 3049.726464907:966f700: hasRcvInBuffer on nsd 0x5d9c430: pszRcvBuf 0xaa6fec0, > lenRcvBuf -1 > 3049.726663962:966f700: hasRcvInBuffer on nsd 0x5dba7f0: pszRcvBuf 0xaa82e20, > lenRcvBuf -1 > 3049.726868320:966f700: hasRcvInBuffer on nsd 0xace90a0: pszRcvBuf 0xb186510, > lenRcvBuf -1 > 3049.727069330:966f700: hasRcvInBuffer on nsd 0xad0f980: pszRcvBuf 0xb18d5e0, > lenRcvBuf -1 > 3049.727267081:966f700: hasRcvInBuffer on nsd 0xad2e750: pszRcvBuf 0xb1b6670, > lenRcvBuf -1 > 3049.727465128:966f700: hasRcvInBuffer on nsd 0xb167bc0: pszRcvBuf 0xb384ec0, > lenRcvBuf -1 > 3049.727672278:966f700: hasRcvInBuffer on nsd 0xb194b80: pszRcvBuf 0x5bb35c0, > lenRcvBuf -1 > 3049.727874922:966f700: hasRcvInBuffer on nsd 0xb363940: pszRcvBuf 0x5ba79e0, > lenRcvBuf -1 > 3049.728073101:966f700: hasRcvInBuffer on nsd 0xb38c3b0: pszRcvBuf 0x5be4040, > lenRcvBuf -1 > 3049.728270670:966f700: hasRcvInBuffer on nsd 0x5b91600: pszRcvBuf (nil), > lenRcvBuf 0 > 3049.728468483:966f700: hasRcvInBuffer on nsd 0x5bc3510: pszRcvBuf (nil), > lenRcvBuf 0 > 3049.728665936:966f700: --------<NSDSEL_PTCP> calling select, active fds (max > 25): 9 10 11 12 14 16 18 19 20 21 22 23 25 > 3049.730337867:966f700: hasRcvInBuffer on nsd 0x5d9b260: pszRcvBuf (nil), > lenRcvBuf 0 > 3049.730531724:966f700: New connect on NSD 0x5d9b4c0. > ==3607== Thread 3: > ==3607== Jump to the invalid address stated on the next line > ==3607== at 0x0: ??? > ==3607== by 0x6215C3C: MsgSetInputName (in /usr/lib/rsyslog/imuxsock.so) > ==3607== by 0x6210BD8: ??? (in /usr/lib/rsyslog/imuxsock.so) > ==3607== by 0x6211392: ??? (in /usr/lib/rsyslog/imuxsock.so) > ==3607== by 0x43B809: ??? (in /usr/sbin/rsyslogd) > ==3607== by 0x504A9C9: start_thread (pthread_create.c:300) > ==3607== by 0x7E6C6FF: ??? > ==3607== Address 0x0 is not stack'd, malloc'd or (recently) free'd > ==3607== > ==3607== > ==3607== Process terminating with default action of signal 11 (SIGSEGV) > ==3607== Bad permissions for mapped region at address 0x0 > ==3607== at 0x0: ??? > ==3607== by 0x6215C3C: MsgSetInputName (in /usr/lib/rsyslog/imuxsock.so) > ==3607== by 0x6210BD8: ??? (in /usr/lib/rsyslog/imuxsock.so) > ==3607== by 0x6211392: ??? (in /usr/lib/rsyslog/imuxsock.so) > ==3607== by 0x43B809: ??? (in /usr/sbin/rsyslogd) > ==3607== by 0x504A9C9: start_thread (pthread_create.c:300) > ==3607== by 0x7E6C6FF: ??? > ==3607== > ==3607== HEAP SUMMARY: > ==3607== in use at exit: 681,871 bytes in 3,582 blocks > ==3607== total heap usage: 153,960 allocs, 150,378 frees, 12,862,358 bytes > allocated > ==3607== > ==3607== Thread 1: > ==3607== 272 bytes in 1 blocks are possibly lost in loss record 1,129 of 1,226 > ==3607== at 0x4C267CC: calloc (vg_replace_malloc.c:467) > ==3607== by 0x4012495: _dl_allocate_tls (dl-tls.c:300) > ==3607== by 0x504B728: pthread_create@@GLIBC_2.2.5 (allocatestack.c:561) > ==3607== by 0x42BDEA: wtpAdviseMaxWorkers (in /usr/sbin/rsyslogd) > ==3607== by 0x432D22: qqueueEnqObj (in /usr/sbin/rsyslogd) > ==3607== by 0x40CE5C: submitMsg (in /usr/sbin/rsyslogd) > ==3607== by 0x64563D3: ??? (in /usr/lib/rsyslog/imklog.so) > ==3607== by 0x6456591: imklogLogIntMsg (in /usr/lib/rsyslog/imklog.so) > ==3607== by 0x6456713: klogWillRun (in /usr/lib/rsyslog/imklog.so) > ==3607== by 0x40DB80: ??? (in /usr/sbin/rsyslogd) > ==3607== by 0x40E14D: ??? (in /usr/sbin/rsyslogd) > ==3607== by 0x40F8A6: realMain (in /usr/sbin/rsyslogd) > ==3607== > ==3607== 1,088 bytes in 4 blocks are possibly lost in loss record 1,176 of > 1,226 > ==3607== at 0x4C267CC: calloc (vg_replace_malloc.c:467) > ==3607== by 0x4012495: _dl_allocate_tls (dl-tls.c:300) > ==3607== by 0x504B728: pthread_create@@GLIBC_2.2.5 (allocatestack.c:561) > ==3607== by 0x43B769: thrdCreate (in /usr/sbin/rsyslogd) > ==3607== by 0x40E20E: ??? (in /usr/sbin/rsyslogd) > ==3607== by 0x40F8A6: realMain (in /usr/sbin/rsyslogd) > ==3607== by 0x568BC4C: (below main) (libc-start.c:226) > ==3607== > ==3607== LEAK SUMMARY: > ==3607== definitely lost: 0 bytes in 0 blocks > ==3607== indirectly lost: 0 bytes in 0 blocks > ==3607== possibly lost: 1,360 bytes in 5 blocks > ==3607== still reachable: 680,511 bytes in 3,577 blocks > ==3607== suppressed: 0 bytes in 0 blocks > ==3607== Reachable blocks (those to which a pointer was found) are not shown. > ==3607== To see them, rerun with: --leak-check=full --show-reachable=yes > ==3607== > ==3607== For counts of detected and suppressed errors, rerun with: -v > ==3607== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 43 from 7)
On May 3, 2012, at 1:02 PM, Rainer Gerhards wrote: > Can you run it under valgrind? > > Rainer > > Matt Wise <[email protected]> hat geschrieben:We're using Rsyslog in our > inftrastructure on a bunch of Ubuntu 10.04 hosts. Due to memory and CPU leaks > in the 4.xx rsyslog code (we're using TLS on everything), we've had to > upgrade to the 5.xx series of code. > > Our clients are running the Evax-built Rsyslog 5.6.3 and they generally seem > to work fine. Our rsyslog 'receiver' hosts were running 5.6.3, but we ran > into a serious memory leak that caused Rsyslog to bloat after about a day. We > have upgraded them to hand-built Rsyslog 5.8.10 code (and also tried 5.9.3 at > one point). This worked fine for about 2 days.. and now we are getting near > non-stop Seg Faults on the receiver host running Rsyslog. > > I've tried everything I can think of, but we still have the problem.. here > are our configs: > > /etc/rsyslog.d/custom.conf: >> $ModLoad imtcp # load TCP listener >> $DefaultNetstreamDriverCAFile /etc/rsyslog.d/certs/nextdoor/logd-ca.pem >> $DefaultNetstreamDriverCertFile /etc/rsyslog.d/certs/nextdoor/nextdoor.pem >> $DefaultNetstreamDriverKeyFile /etc/rsyslog.d/certs/nextdoor/nextdoor.key >> $DefaultNetstreamDriver gtls >> $InputTCPServerStreamDriverMode 1 >> $InputTCPServerStreamDriverAuthMode x509/certvalid >> $CreateDirs on >> $template >> FILENAME,"/mnt/logs/%fromhost-ip%/%syslogfacility-text%-%$year%-%$month%-%$day%.log" >> *.* ?FILENAME >> # Create an additional socket in postfix's chroot in order not to break >> # mail logging when rsyslog is restarted. If the directory is missing, >> # rsyslog will silently skip creating the socket. >> $AddUnixListenSocket /var/spool/postfix/dev/log > > > /etc/rsyslog.conf >> $ModLoad imuxsock # provides support for local system logging >> $ModLoad imklog # provides kernel logging support (previously done by >> rklogd) >> $ModLoad immark # provides --MARK-- message capability >> $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat >> $EscapeControlCharactersOnReceive off >> $RepeatedMsgReduction off >> $FileOwner sysadm >> $FileGroup adm >> $FileCreateMode 0640 >> $DirOwner sysadm >> $DirGroup adm >> $DirCreateMode 0755 >> $Umask 0022 >> $PreserveFQDN on >> $IncludeConfig /etc/rsyslog.d/*.conf > > >> rsyslogd 5.8.10, compiled with: >> FEATURE_REGEXP: Yes >> FEATURE_LARGEFILE: No >> GSSAPI Kerberos 5 support: Yes >> FEATURE_DEBUG (debug build, slow code): No >> 32bit Atomic operations supported: Yes >> 64bit Atomic operations supported: Yes >> Runtime Instrumentation (slow code): No > > >> 2.6.32-317-ec2 #36-Ubuntu SMP Fri Jul 8 18:12:30 UTC 2011 x86_64 GNU/Linux > > > And here's the log output of the failure.. >> 8901.295674722:7f04c029a700: New connect on NSD 0x1db0ee0 >> . >> >> 8901.359429713:7f04c1a9d700: Message from UNIX socket: #3 >> 8901.359450981:7f04c1a9d700: XXX: pre CM loop, length of control message 32 >> 8901.359461759:7f04c1a9d700: XXX: in CM loop, 1, 2 >> 8901.359471828:7f04c1a9d700: XXX: got credentials pid 24640 >> 8901.359481718:7f04c1a9d700: XXX: post CM loop >> 8901.359493088:7f04c1a9d700: imuxsock: no ratelimiter for pid 24640, >> creating one >> >> Segmentation fault > > The failure looks the same every single time... some message comes in, the > 'CM Loop' comments start, the 'rate limit' comment comes, and then the > segfault. > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards

