To report that coredump and message corruption disappeared after the
application of my patch; I guess it's fixed.

Thanks,
Kaiwang

2012/1/5 Kaiwang Chen <[email protected]>:
> Hello,
>
> I check ChangeLog and found 5.8.4 fixed some bug related to race
> condition as discussed in
> http://bugzilla.adiscon.com/show_bug.cgi?id=275
>
> with patch protecting modification in the getMSGID():
> http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=d66f7b95e081ecb356cc230685c5b7c90f8590b4
>
>
> I guess the situation is almost identical to what in my case:
> corruption is always at the APPNAME field and there is, I believe, a
> typo in line 1955 from getAPPNAME() effectively ruins the mutex
> proctection:
> 1947 /* rgerhards, 2005-11-24
> 1948  */
> 1949 char *getAPPNAME(msg_t *pM, sbool bLockMutex)
> 1950 {
> 1951         uchar *pszRet;
> 1952
> 1953         assert(pM != NULL);
> 1954         if(bLockMutex == LOCK_MUTEX)
> 1955                 MsgUnlock(pM);  /* should be MsgLock(pM) */
> 1956         prepareAPPNAME(pM, MUTEX_ALREADY_LOCKED);
> 1957         if(pM->pCSAPPNAME == NULL)
> 1958                 pszRet = UCHAR_CONSTANT("");
> 1959         else
> 1960                 pszRet = rsCStrGetSzStrNoNULL(pM->pCSAPPNAME);
> 1961         if(bLockMutex == LOCK_MUTEX)
> 1962                 MsgUnlock(pM);
> 1963         return (char*)pszRet;
> 1964 }
>
> Similar cases were spotted in getPROCID(), getStructuredData() and
> getProgramName().  Hopefully the following patch against 5.8.6 tarball
> should solve the problem, I will report back later.
> --- rsyslog-5.8.6/runtime/msg.c.orig    2011-10-21 17:53:02.000000000 +0800
> +++ rsyslog-5.8.6/runtime/msg.c 2012-01-05 21:54:26.594666510 +0800
> @@ -1609,7 +1609,7 @@
>
>        ISOBJ_TYPE_assert(pM, msg);
>        if(bLockMutex == LOCK_MUTEX)
> -               MsgUnlock(pM);
> +               MsgLock(pM);
>        preparePROCID(pM, MUTEX_ALREADY_LOCKED);
>        if(pM->pCSPROCID == NULL)
>                pszRet = UCHAR_CONSTANT("");
> @@ -1846,7 +1846,7 @@
>  {
>        uchar *pszRet;
>
> -       MsgUnlock(pM);
> +       MsgLock(pM);
>        if(pM->pCSStrucData == NULL)
>                pszRet = UCHAR_CONSTANT("-");
>        else
> @@ -1894,7 +1894,7 @@
>        uchar *pszRet;
>
>        if(bLockMutex == LOCK_MUTEX)
> -               MsgUnlock(pM);
> +               MsgLock(pM);
>        prepareProgramName(pM, MUTEX_ALREADY_LOCKED);
>        if(pM->pCSProgName == NULL)
>                pszRet = UCHAR_CONSTANT("");
> @@ -1952,7 +1952,7 @@
>
>        assert(pM != NULL);
>        if(bLockMutex == LOCK_MUTEX)
> -               MsgUnlock(pM);
> +               MsgLock(pM);
>        prepareAPPNAME(pM, MUTEX_ALREADY_LOCKED);
>        if(pM->pCSAPPNAME == NULL)
>                pszRet = UCHAR_CONSTANT("");
>
>
> By the way, many crashes led to the convergence, for example
> //// db1 core.26581 signal 11, Segmentation fault.
> (gdb) p *pThis
> $1 = {
>  pBuf = 0x7f7f5c000cd0 "sshd\177\177",
>  pszBuf = 0x7f7f5c000dd0 "sshd",
>  iBufSize = 4,
>  iStrLen = 4
> }
> (gdb) where
> #0  0x000000000041e6a9 in rsCStrGetSzStr (pThis=0x131cb10) at stringbuf.c:334
> #1  0x000000000041b1bf in getAPPNAME (pM=0x131c140, bLockMutex=1
> '\001') at msg.c:1960
> #2  0x000000000041c0f8 in MsgGetProp (pMsg=0x131c140, pTpe=0x12ef350,
> propID=115 's', pPropLen=0x43fb8f00,
>    pbMustBeFreed=0xfefefefefefefeff) at msg.c:2429
> #3  0x000000000043be88 in tplToString (pTpl=0x12eeca0, pMsg=0x131c140,
> ppBuf=0x12f7910, pLenBuf=0x12f7930) at ../template.c:119
> #4  0x0000000000438498 in prepareBatch (pAction=0x12f72d0,
> pBatch=0x12f7898, pbShutdownImmediate=<value optimized out>)
>    at ../action.c:738
> #5  processBatchMain (pAction=0x12f72d0, pBatch=0x12f7898,
> pbShutdownImmediate=<value optimized out>) at ../action.c:1163
> #6  0x000000000043190d in ConsumerReg (pThis=0x12f73f0,
> pWti=0x12f7870) at queue.c:1705
> #7  0x000000000042b0bd in wtiWorker (pThis=0x12f7870) at wti.c:315
> #8  0x000000000042abd8 in wtpWorker (arg=0x12f7870) at wtp.c:387
> #9  0x00000031d5a06367 in __nptl_setxid () from /lib64/libpthread.so.0
> #10 0x0000000000000000 in ?? ()
>
>
>
> Thanks,
> Kaiwang
>
> 2011/12/31 Kaiwang Chen <[email protected]>:
>> Even worse case
>> <6>1 2011-12-31T12:04:05.036812+08:00 squid6 kernel  - - imklog 5.8.6,
>> log source = /proc/kmsg started.
>> ...several SYN flooding entries in the same second...
>> *** glibc detected *** /sbin/rsyslogd: free(): invalid next size
>> (fast): 0x00007fafb8000e00 ***
>> *** glibc detected *** /sbin/rsyslogd: malloc(): memory corruption:
>> 0x00007fafb8000e20 ***
>>
>>  And possibly hours later the host was not responsive to ssh
>> connection requests, and even TTY login attempts failed...
>> Interestingly the debug log in another file continues a little beyond:
>>
>> Debug line with all properties:
>> FROMHOST: 'squid6', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid6', PRI: 47,
>> syslogtag 'rsyslogd-pstats:', programname: 'rsyslogd-pstats',
>> APP-NAME: 'rsyslogd-pstats', PROCID: '', MSGID: '-',
>> TIMESTAMP: 'Dec 31 12:24:05', STRUCTURED-DATA: '-',
>> msg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 '
>> escaped msg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 '
>> inputname: impstats rawmsg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 
>> '
>>
>> Debug line with all properties:
>> FROMHOST: 'squid6', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid6', PRI: 47,
>> syslogtag 'rsyslogd-pstats:', programname: 'rsyslogd-pstats',
>> APP-NAME: 'rsyslogd-pstats', PROCID: '', MSGID: '-',
>> TIMESTAMP: 'Dec 31 12:24:05', STRUCTURED-DATA: '-',
>> msg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 '
>> escaped msg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 '
>> inputname: impstats rawmsg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 '
>>
>> Fortunately the squid service on that host was not badly affected. So
>> I did graceful failover and reboot the host, everything became OK.
>> Although there was no direct timestamp associated, I guess the problem
>> occured soon after rsyslogd started. I also spotted several corrupted
>> messages on other hosts when rsyslogd started. They are same version:
>>
>> # rsyslogd -v
>> rsyslogd 5.8.6, compiled with:
>>        FEATURE_REGEXP:                         Yes
>>        FEATURE_LARGEFILE:                      No
>>        GSSAPI Kerberos 5 support:              Yes
>>        FEATURE_DEBUG (debug build, slow code): No
>>        32bit Atomic operations supported:      Yes
>>        64bit Atomic operations supported:      Yes
>>        Runtime Instrumentation (slow code):    No
>>
>>
>> Thanks,
>> Kaiwang
>>
>> 2011/12/31 Kaiwang Chen <[email protected]>:
>>> Also spotted corrupted messages from imuxsock:
>>>
>>> Debug line with all properties:
>>> FROMHOST: 'squid9', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid9', PRI: 47,
>>> syslogtag 'rsyslogd-pstats:', programname: '<88>+?<99>2
>>> <88>+?<99>rssyslogd-pstats', APP-NAME: '', PROCID: '', MSGID: '-',
>>> TIMESTAMP: 'Dec 30 21:46:14', STRUCTURED-DATA: '-',
>>> msg: 'imuxsock: submitted=429 ratelimit.discarded=0
>>> ratelimit.numratelimiters=213 '
>>> escaped msg: 'imuxsock: submitted=429 ratelimit.discarded=0
>>> ratelimit.numratelimiters=213 '
>>> inputname: impstats rawmsg: 'imuxsock: submitted=429
>>> ratelimit.discarded=0 ratelimit.numratelimiters=213 '
>>>
>>>
>>> Thanks,
>>> Kaiwang
>>>
>>> 2011/12/27 Kaiwang Chen <[email protected]>:
>>>> 2011/12/25  <[email protected]>:
>>>>> On Fri, 23 Dec 2011, Kaiwang Chen wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I found rsyslogd occasionally produced corrupted log entries like
>>>>>>
>>>>>> <6>1 2011-12-23T23:03:18.089938+08:00 db1 <D0>#001 D^kernel  - -
>>>>>> device eth0 entered promiscuous mode
>>>>>>
>>>>>> I believe that problem appeared in earlier versions including 5.8.2.
>>>>>> Looks like corruption never occur before hostname field. Is it a
>>>>>> reported bug? Any clue?
>>>>>
>>>>>
>>>>> the question is if this problem is in the raw message being sent to you, 
>>>>> or
>>>>> is it something added by the rsyslog processing.
>>>>>
>>>>> I would suggest adding something along the following line.
>>>>>
>>>>> :rawmesg, :contains, "#001"  /var/log/badmessages:RSYSLOG_DEBUG
>>>>>
>>>>> this will look for any messages with the hex 01 character in them and spit
>>>>> out all the info that you can use about the message in to the file
>>>>> /var/log/badmessages
>>>>>
>>>>> run this for a little while and look to see what the raw message that is
>>>>> being received over the wire looks like. If it's bad, then you need to 
>>>>> look
>>>>> at the sender. If the raw message looks sane, but rsyslog isn't handling 
>>>>> it
>>>>> right, then we can troubleshoot from there.
>>>>
>>>> Looks like it's related to tcp transmission. On the original server
>>>> that get input from unix socket, it is OK; while on the centeral log
>>>> server, the raw message is already corrupted.
>>>>
>>>> ===== on central log server
>>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Connection from
>>>> UDP: [172.25.0.230]:53547
>>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Received SNMP
>>>> packet(s) from UDP: [172.25.0.230]:53547
>>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 2#032#025<D5>1 - -
>>>> Connection from UDP: [172.25.0.230]:53547
>>>>
>>>> Debug line with all properties:
>>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', 
>>>> PRI: 30,
>>>> syslogtag 'snmpd[24254]', programname: 'snmpd', APP-NAME: 'snmpd',
>>>> PROCID: '24254', MSGID: '-',
>>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>>> msg: ' Connection from UDP: [172.25.0.230]:53547'
>>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547'
>>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd
>>>> 24254 - -  Connection from UDP: [172.25.0.230]:53547'
>>>>
>>>> Debug line with all properties:
>>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', 
>>>> PRI: 30,
>>>> syslogtag 'snmpd[24254]', programname: 'snmpd', APP-NAME: 'snmpd',
>>>> PROCID: '24254', MSGID: '-',
>>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>>> msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>>> escaped msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd
>>>> 24254 - -  Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>>>
>>>> Debug line with all properties:
>>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', 
>>>> PRI: 30,
>>>> syslogtag 'snmpd[2#032#025<D5>1]', programname: 'snmpd', APP-NAME:
>>>> 'snmpd', PROCID: '2#032#025<D5>1', MSGID: '-',
>>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>>> msg: ' Connection from UDP: [172.25.0.230]:53547'
>>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547'
>>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd
>>>> 2#032#025<D5>1 - -  Connection from UDP: [172.25.0.230]:53547'
>>>>
>>>> ===== on the orginal host  (*.*       @@(o)172.25.0.230:514)
>>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Connection from
>>>> UDP: [172.25.0.230]:53547
>>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Received SNMP
>>>> packet(s) from UDP: [172.25.0.230]:53547
>>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Connection from
>>>> UDP: [172.25.0.230]:53547
>>>>
>>>> Debug line with all properties:
>>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30,
>>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd',
>>>> PROCID: '24254', MSGID: '-',
>>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>>> msg: ' Connection from UDP: [172.25.0.230]:53547'
>>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547'
>>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]:
>>>> Connection from UDP: [172.25.0.230]:53547'
>>>>
>>>> Debug line with all properties:
>>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30,
>>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd',
>>>> PROCID: '24254', MSGID: '-',
>>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>>> msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>>> escaped msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]:
>>>> Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>>>
>>>> Debug line with all properties:
>>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30,
>>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd',
>>>> PROCID: '24254', MSGID: '-',
>>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>>> msg: ' Connection from UDP: [172.25.0.230]:53547'
>>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547'
>>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]:
>>>> Connection from UDP: [172.25.0.230]:53547'
>>>>
>>>>
>>>> Thanks,
>>>> Kaiwang
>>>>>
>>>>> David Lang
>>>>>
>>>>>  > rsyslogd 5.8.6, compiled with:
>>>>>>
>>>>>>       FEATURE_REGEXP:                         Yes
>>>>>>       FEATURE_LARGEFILE:                      No
>>>>>>       GSSAPI Kerberos 5 support:              Yes
>>>>>>       FEATURE_DEBUG (debug build, slow code): No
>>>>>>       32bit Atomic operations supported:      Yes
>>>>>>       64bit Atomic operations supported:      Yes
>>>>>>       Runtime Instrumentation (slow code):    No
>>>>>>
>>>>>> /etc/rsyslog.conf
>>>>>> $ActionFileDefaultTemplate RSYSLOG_SyslogProtocol23Format
>>>>>> $ActionForwardDefaultTemplate RSYSLOG_SyslogProtocol23Format
>>>>>>
>>>>>> $ModLoad imklog
>>>>>> $ModLoad imuxsock
>>>>>> $ModLoad impstats
>>>>>>
>>>>>> $SystemLogSocketIgnoreMsgTimestamp  off
>>>>>> $SystemLogUsePIDFromSystem on
>>>>>>
>>>>>> $PStatInterval 600
>>>>>> $PStatSeverity 7
>>>>>>
>>>>>> $WorkDirectory /var/spool/rsyslog
>>>>>>
>>>>>> $MainMsgQueueSaveOnShutdown on
>>>>>> $MainMsgQueueFileName mq
>>>>>> $MainMsgQueueMaxFileSize 5m
>>>>>>
>>>>>> $ActionQueueType LinkedList
>>>>>> $ActionQueueSaveOnShutdown on
>>>>>> $ActionQueueFileName dbq
>>>>>> $ActionQueueMaxFileSize 10m
>>>>>> $ActionResumeRetryCount -1
>>>>>> *.*       @@(o)10.2.3.4
>>>>>>
>>>>>>
>>>>>> # Log all kernel messages to the console.
>>>>>> # Logging much else clutters up the screen.
>>>>>> #kern.*                                                 /dev/console
>>>>>>
>>>>>> # Log anything (except mail) of level info or higher.
>>>>>> # Don't log private authentication messages!
>>>>>> *.info;mail.none;authpriv.none;cron.none                /var/log/messages
>>>>>>
>>>>>> # The authpriv file has restricted access.
>>>>>> authpriv.*                                              /var/log/secure
>>>>>>
>>>>>> # Log all the mail messages in one place.
>>>>>> mail.*                                                  -/var/log/maillog
>>>>>>
>>>>>>
>>>>>> # Log cron stuff
>>>>>> cron.*                                                  /var/log/cron
>>>>>>
>>>>>> # Everybody gets emergency messages
>>>>>> *.emerg                                                 *
>>>>>>
>>>>>> # Save news errors of level crit and higher in a special file.
>>>>>> uucp,news.crit                                          /var/log/spooler
>>>>>>
>>>>>> # Save boot messages also to boot.log
>>>>>> local7.*                                                /var/log/boot.log
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Kaiwang
>>>>>> _______________________________________________
>>>>>> rsyslog mailing list
>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>
>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com/professional-services/
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/

Reply via email to