To report that coredump and message corruption disappeared after the application of my patch; I guess it's fixed.
Thanks, Kaiwang 2012/1/5 Kaiwang Chen <[email protected]>: > Hello, > > I check ChangeLog and found 5.8.4 fixed some bug related to race > condition as discussed in > http://bugzilla.adiscon.com/show_bug.cgi?id=275 > > with patch protecting modification in the getMSGID(): > http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=d66f7b95e081ecb356cc230685c5b7c90f8590b4 > > > I guess the situation is almost identical to what in my case: > corruption is always at the APPNAME field and there is, I believe, a > typo in line 1955 from getAPPNAME() effectively ruins the mutex > proctection: > 1947 /* rgerhards, 2005-11-24 > 1948 */ > 1949 char *getAPPNAME(msg_t *pM, sbool bLockMutex) > 1950 { > 1951 uchar *pszRet; > 1952 > 1953 assert(pM != NULL); > 1954 if(bLockMutex == LOCK_MUTEX) > 1955 MsgUnlock(pM); /* should be MsgLock(pM) */ > 1956 prepareAPPNAME(pM, MUTEX_ALREADY_LOCKED); > 1957 if(pM->pCSAPPNAME == NULL) > 1958 pszRet = UCHAR_CONSTANT(""); > 1959 else > 1960 pszRet = rsCStrGetSzStrNoNULL(pM->pCSAPPNAME); > 1961 if(bLockMutex == LOCK_MUTEX) > 1962 MsgUnlock(pM); > 1963 return (char*)pszRet; > 1964 } > > Similar cases were spotted in getPROCID(), getStructuredData() and > getProgramName(). Hopefully the following patch against 5.8.6 tarball > should solve the problem, I will report back later. > --- rsyslog-5.8.6/runtime/msg.c.orig 2011-10-21 17:53:02.000000000 +0800 > +++ rsyslog-5.8.6/runtime/msg.c 2012-01-05 21:54:26.594666510 +0800 > @@ -1609,7 +1609,7 @@ > > ISOBJ_TYPE_assert(pM, msg); > if(bLockMutex == LOCK_MUTEX) > - MsgUnlock(pM); > + MsgLock(pM); > preparePROCID(pM, MUTEX_ALREADY_LOCKED); > if(pM->pCSPROCID == NULL) > pszRet = UCHAR_CONSTANT(""); > @@ -1846,7 +1846,7 @@ > { > uchar *pszRet; > > - MsgUnlock(pM); > + MsgLock(pM); > if(pM->pCSStrucData == NULL) > pszRet = UCHAR_CONSTANT("-"); > else > @@ -1894,7 +1894,7 @@ > uchar *pszRet; > > if(bLockMutex == LOCK_MUTEX) > - MsgUnlock(pM); > + MsgLock(pM); > prepareProgramName(pM, MUTEX_ALREADY_LOCKED); > if(pM->pCSProgName == NULL) > pszRet = UCHAR_CONSTANT(""); > @@ -1952,7 +1952,7 @@ > > assert(pM != NULL); > if(bLockMutex == LOCK_MUTEX) > - MsgUnlock(pM); > + MsgLock(pM); > prepareAPPNAME(pM, MUTEX_ALREADY_LOCKED); > if(pM->pCSAPPNAME == NULL) > pszRet = UCHAR_CONSTANT(""); > > > By the way, many crashes led to the convergence, for example > //// db1 core.26581 signal 11, Segmentation fault. > (gdb) p *pThis > $1 = { > pBuf = 0x7f7f5c000cd0 "sshd\177\177", > pszBuf = 0x7f7f5c000dd0 "sshd", > iBufSize = 4, > iStrLen = 4 > } > (gdb) where > #0 0x000000000041e6a9 in rsCStrGetSzStr (pThis=0x131cb10) at stringbuf.c:334 > #1 0x000000000041b1bf in getAPPNAME (pM=0x131c140, bLockMutex=1 > '\001') at msg.c:1960 > #2 0x000000000041c0f8 in MsgGetProp (pMsg=0x131c140, pTpe=0x12ef350, > propID=115 's', pPropLen=0x43fb8f00, > pbMustBeFreed=0xfefefefefefefeff) at msg.c:2429 > #3 0x000000000043be88 in tplToString (pTpl=0x12eeca0, pMsg=0x131c140, > ppBuf=0x12f7910, pLenBuf=0x12f7930) at ../template.c:119 > #4 0x0000000000438498 in prepareBatch (pAction=0x12f72d0, > pBatch=0x12f7898, pbShutdownImmediate=<value optimized out>) > at ../action.c:738 > #5 processBatchMain (pAction=0x12f72d0, pBatch=0x12f7898, > pbShutdownImmediate=<value optimized out>) at ../action.c:1163 > #6 0x000000000043190d in ConsumerReg (pThis=0x12f73f0, > pWti=0x12f7870) at queue.c:1705 > #7 0x000000000042b0bd in wtiWorker (pThis=0x12f7870) at wti.c:315 > #8 0x000000000042abd8 in wtpWorker (arg=0x12f7870) at wtp.c:387 > #9 0x00000031d5a06367 in __nptl_setxid () from /lib64/libpthread.so.0 > #10 0x0000000000000000 in ?? () > > > > Thanks, > Kaiwang > > 2011/12/31 Kaiwang Chen <[email protected]>: >> Even worse case >> <6>1 2011-12-31T12:04:05.036812+08:00 squid6 kernel - - imklog 5.8.6, >> log source = /proc/kmsg started. >> ...several SYN flooding entries in the same second... >> *** glibc detected *** /sbin/rsyslogd: free(): invalid next size >> (fast): 0x00007fafb8000e00 *** >> *** glibc detected *** /sbin/rsyslogd: malloc(): memory corruption: >> 0x00007fafb8000e20 *** >> >> And possibly hours later the host was not responsive to ssh >> connection requests, and even TTY login attempts failed... >> Interestingly the debug log in another file continues a little beyond: >> >> Debug line with all properties: >> FROMHOST: 'squid6', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid6', PRI: 47, >> syslogtag 'rsyslogd-pstats:', programname: 'rsyslogd-pstats', >> APP-NAME: 'rsyslogd-pstats', PROCID: '', MSGID: '-', >> TIMESTAMP: 'Dec 31 12:24:05', STRUCTURED-DATA: '-', >> msg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 ' >> escaped msg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 ' >> inputname: impstats rawmsg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 >> ' >> >> Debug line with all properties: >> FROMHOST: 'squid6', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid6', PRI: 47, >> syslogtag 'rsyslogd-pstats:', programname: 'rsyslogd-pstats', >> APP-NAME: 'rsyslogd-pstats', PROCID: '', MSGID: '-', >> TIMESTAMP: 'Dec 31 12:24:05', STRUCTURED-DATA: '-', >> msg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 ' >> escaped msg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 ' >> inputname: impstats rawmsg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 ' >> >> Fortunately the squid service on that host was not badly affected. So >> I did graceful failover and reboot the host, everything became OK. >> Although there was no direct timestamp associated, I guess the problem >> occured soon after rsyslogd started. I also spotted several corrupted >> messages on other hosts when rsyslogd started. They are same version: >> >> # rsyslogd -v >> rsyslogd 5.8.6, compiled with: >> FEATURE_REGEXP: Yes >> FEATURE_LARGEFILE: No >> GSSAPI Kerberos 5 support: Yes >> FEATURE_DEBUG (debug build, slow code): No >> 32bit Atomic operations supported: Yes >> 64bit Atomic operations supported: Yes >> Runtime Instrumentation (slow code): No >> >> >> Thanks, >> Kaiwang >> >> 2011/12/31 Kaiwang Chen <[email protected]>: >>> Also spotted corrupted messages from imuxsock: >>> >>> Debug line with all properties: >>> FROMHOST: 'squid9', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid9', PRI: 47, >>> syslogtag 'rsyslogd-pstats:', programname: '<88>+?<99>2 >>> <88>+?<99>rssyslogd-pstats', APP-NAME: '', PROCID: '', MSGID: '-', >>> TIMESTAMP: 'Dec 30 21:46:14', STRUCTURED-DATA: '-', >>> msg: 'imuxsock: submitted=429 ratelimit.discarded=0 >>> ratelimit.numratelimiters=213 ' >>> escaped msg: 'imuxsock: submitted=429 ratelimit.discarded=0 >>> ratelimit.numratelimiters=213 ' >>> inputname: impstats rawmsg: 'imuxsock: submitted=429 >>> ratelimit.discarded=0 ratelimit.numratelimiters=213 ' >>> >>> >>> Thanks, >>> Kaiwang >>> >>> 2011/12/27 Kaiwang Chen <[email protected]>: >>>> 2011/12/25 <[email protected]>: >>>>> On Fri, 23 Dec 2011, Kaiwang Chen wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> I found rsyslogd occasionally produced corrupted log entries like >>>>>> >>>>>> <6>1 2011-12-23T23:03:18.089938+08:00 db1 <D0>#001 D^kernel - - >>>>>> device eth0 entered promiscuous mode >>>>>> >>>>>> I believe that problem appeared in earlier versions including 5.8.2. >>>>>> Looks like corruption never occur before hostname field. Is it a >>>>>> reported bug? Any clue? >>>>> >>>>> >>>>> the question is if this problem is in the raw message being sent to you, >>>>> or >>>>> is it something added by the rsyslog processing. >>>>> >>>>> I would suggest adding something along the following line. >>>>> >>>>> :rawmesg, :contains, "#001" /var/log/badmessages:RSYSLOG_DEBUG >>>>> >>>>> this will look for any messages with the hex 01 character in them and spit >>>>> out all the info that you can use about the message in to the file >>>>> /var/log/badmessages >>>>> >>>>> run this for a little while and look to see what the raw message that is >>>>> being received over the wire looks like. If it's bad, then you need to >>>>> look >>>>> at the sender. If the raw message looks sane, but rsyslog isn't handling >>>>> it >>>>> right, then we can troubleshoot from there. >>>> >>>> Looks like it's related to tcp transmission. On the original server >>>> that get input from unix socket, it is OK; while on the centeral log >>>> server, the raw message is already corrupted. >>>> >>>> ===== on central log server >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Connection from >>>> UDP: [172.25.0.230]:53547 >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Received SNMP >>>> packet(s) from UDP: [172.25.0.230]:53547 >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 2#032#025<D5>1 - - >>>> Connection from UDP: [172.25.0.230]:53547 >>>> >>>> Debug line with all properties: >>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', >>>> PRI: 30, >>>> syslogtag 'snmpd[24254]', programname: 'snmpd', APP-NAME: 'snmpd', >>>> PROCID: '24254', MSGID: '-', >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >>>> msg: ' Connection from UDP: [172.25.0.230]:53547' >>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547' >>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd >>>> 24254 - - Connection from UDP: [172.25.0.230]:53547' >>>> >>>> Debug line with all properties: >>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', >>>> PRI: 30, >>>> syslogtag 'snmpd[24254]', programname: 'snmpd', APP-NAME: 'snmpd', >>>> PROCID: '24254', MSGID: '-', >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >>>> msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >>>> escaped msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd >>>> 24254 - - Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >>>> >>>> Debug line with all properties: >>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', >>>> PRI: 30, >>>> syslogtag 'snmpd[2#032#025<D5>1]', programname: 'snmpd', APP-NAME: >>>> 'snmpd', PROCID: '2#032#025<D5>1', MSGID: '-', >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >>>> msg: ' Connection from UDP: [172.25.0.230]:53547' >>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547' >>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd >>>> 2#032#025<D5>1 - - Connection from UDP: [172.25.0.230]:53547' >>>> >>>> ===== on the orginal host (*.* @@(o)172.25.0.230:514) >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Connection from >>>> UDP: [172.25.0.230]:53547 >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Received SNMP >>>> packet(s) from UDP: [172.25.0.230]:53547 >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Connection from >>>> UDP: [172.25.0.230]:53547 >>>> >>>> Debug line with all properties: >>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30, >>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd', >>>> PROCID: '24254', MSGID: '-', >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >>>> msg: ' Connection from UDP: [172.25.0.230]:53547' >>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547' >>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]: >>>> Connection from UDP: [172.25.0.230]:53547' >>>> >>>> Debug line with all properties: >>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30, >>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd', >>>> PROCID: '24254', MSGID: '-', >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >>>> msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >>>> escaped msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]: >>>> Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >>>> >>>> Debug line with all properties: >>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30, >>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd', >>>> PROCID: '24254', MSGID: '-', >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >>>> msg: ' Connection from UDP: [172.25.0.230]:53547' >>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547' >>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]: >>>> Connection from UDP: [172.25.0.230]:53547' >>>> >>>> >>>> Thanks, >>>> Kaiwang >>>>> >>>>> David Lang >>>>> >>>>> > rsyslogd 5.8.6, compiled with: >>>>>> >>>>>> FEATURE_REGEXP: Yes >>>>>> FEATURE_LARGEFILE: No >>>>>> GSSAPI Kerberos 5 support: Yes >>>>>> FEATURE_DEBUG (debug build, slow code): No >>>>>> 32bit Atomic operations supported: Yes >>>>>> 64bit Atomic operations supported: Yes >>>>>> Runtime Instrumentation (slow code): No >>>>>> >>>>>> /etc/rsyslog.conf >>>>>> $ActionFileDefaultTemplate RSYSLOG_SyslogProtocol23Format >>>>>> $ActionForwardDefaultTemplate RSYSLOG_SyslogProtocol23Format >>>>>> >>>>>> $ModLoad imklog >>>>>> $ModLoad imuxsock >>>>>> $ModLoad impstats >>>>>> >>>>>> $SystemLogSocketIgnoreMsgTimestamp off >>>>>> $SystemLogUsePIDFromSystem on >>>>>> >>>>>> $PStatInterval 600 >>>>>> $PStatSeverity 7 >>>>>> >>>>>> $WorkDirectory /var/spool/rsyslog >>>>>> >>>>>> $MainMsgQueueSaveOnShutdown on >>>>>> $MainMsgQueueFileName mq >>>>>> $MainMsgQueueMaxFileSize 5m >>>>>> >>>>>> $ActionQueueType LinkedList >>>>>> $ActionQueueSaveOnShutdown on >>>>>> $ActionQueueFileName dbq >>>>>> $ActionQueueMaxFileSize 10m >>>>>> $ActionResumeRetryCount -1 >>>>>> *.* @@(o)10.2.3.4 >>>>>> >>>>>> >>>>>> # Log all kernel messages to the console. >>>>>> # Logging much else clutters up the screen. >>>>>> #kern.* /dev/console >>>>>> >>>>>> # Log anything (except mail) of level info or higher. >>>>>> # Don't log private authentication messages! >>>>>> *.info;mail.none;authpriv.none;cron.none /var/log/messages >>>>>> >>>>>> # The authpriv file has restricted access. >>>>>> authpriv.* /var/log/secure >>>>>> >>>>>> # Log all the mail messages in one place. >>>>>> mail.* -/var/log/maillog >>>>>> >>>>>> >>>>>> # Log cron stuff >>>>>> cron.* /var/log/cron >>>>>> >>>>>> # Everybody gets emergency messages >>>>>> *.emerg * >>>>>> >>>>>> # Save news errors of level crit and higher in a special file. >>>>>> uucp,news.crit /var/log/spooler >>>>>> >>>>>> # Save boot messages also to boot.log >>>>>> local7.* /var/log/boot.log >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Kaiwang >>>>>> _______________________________________________ >>>>>> rsyslog mailing list >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>> http://www.rsyslog.com/professional-services/ >>>>>> >>>>> _______________________________________________ >>>>> rsyslog mailing list >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>> http://www.rsyslog.com/professional-services/ _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/

