I've seen a moderate number of crashes in this area myself.  I think
this is one of those long-standing bugs that is hiding somewhere -
it's visible in 4.6/4.7 as well, and I don't think it's necessarily an
issue tied to RHEL.

-Aaron

2011/2/17 Rainer Gerhards <[email protected]>:
> Hi Dražen,
>
> I dug this problem report out and tried to reproduce it. I tried both my
> usual platform, Fedora, and CentOS 5.5 (64 bit). Unfortunately I did not
> run into any trouble. Of course, I do not have the program you use, so that
> may be a difference. I tested with a small program that just read stdin and
> threw everything it read away. I also tested with writing to a file instead
> of omprog. I tested on a quad core system and sent 10 million messages.
>
> Can you confirm that you still have some trouble with this scenario?
>
> Rainer
>
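For anyone trying to reproduce this, the kind of sink Rainer describes (a small program that reads stdin and throws everything away) might look like the sketch below. This is not his actual test program; the function name `drain` and the chunk size are made up for illustration.

```python
"""Discard everything read from a stream (hypothetical stand-in for an omprog binary)."""


def drain(stream, chunk_size=65536):
    """Read `stream` until EOF, discard the data, and return the byte count."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means EOF: the writer closed the pipe
            return total
        total += len(chunk)
```

To use it as an omprog target, the script would end with `import sys` and a call to `drain(sys.stdin.buffer)`, reading raw bytes so that odd encodings in log messages cannot raise decode errors.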
>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of Dražen Kacar
>> Sent: Friday, November 12, 2010 1:38 PM
>> To: [email protected]
>> Subject: [rsyslog] SIGSEGV because of double free in msgDestruct
>>
>> Hello.
>>
>> I have rsyslog 5.6.0 on CentOS 5.5 with a slightly complex configuration
>> and it's crashing. The complete configuration file is attached. The crash
>> is perfectly reproducible and it happens very soon after the data starts
>> arriving. The program was started with:
>>
>> rsyslogd -c5 -x -f rsyslog-datasink.conf
>>
>> I have two queues in order to have two thread pools. The input queue just
>> takes the message from the UDP or TCP socket and uses omruleset to pass it
>> to the output queue. The output queue then uses omprog to pass the message
>> to the external program. Omprog blocks when the pipe to the external
>> program is full, so I wanted to have unblocked threads to accept incoming
>> messages (which will mostly use UDP). Hence, the configuration has two
>> queues.
>>
>> It's possible that I made some error in the configuration and rsyslogd is
>> crashing because I'm doing something that I wasn't supposed to do, but it
>> didn't detect the faulty configuration early on.
>>
>> The whole thing works fine when I have only one queue (created with
>> $Ruleset) and input and omprog modules on it. But I'd really like to use
>> two thread pools. It should be possible to reproduce this problem with cat
>> as the omprog binary, although I haven't tried.
>>
>> One curiosity (probably unrelated to the problem): $GenerateConfigGraph at
>> the end of the config file creates a picture which has only the main
>> queue, but the queues I configured with $Ruleset directives are not on the
>> picture.
>>
>> The below is from gdb. The process was started from gdb, so there's no
>> call to sigsegvHdlr(), which can be seen in the core file when I start
>> rsyslogd on its own.
>>
>> (gdb) info threads
>> * 8 Thread 0xb4debb90 (LWP 11149)  ConsumerReg (pThis=0x80b7988,
>>     pWti=0x80b7cb8) at queue.c:1679
>>   7 Thread 0xb57ecb90 (LWP 11148)  0x00d46402 in __kernel_vsyscall ()
>>   6 Thread 0xb61edb90 (LWP 11147)  msgDestruct (ppThis=0xb61ed1d4) at msg.c:790
>>   5 Thread 0xb6beeb90 (LWP 11146)  0x00d46402 in __kernel_vsyscall ()
>>   4 Thread 0xb75efb90 (LWP 11145)  0x00d46402 in __kernel_vsyscall ()
>>   3 Thread 0xb7ff0b90 (LWP 11144)  0x00d46402 in __kernel_vsyscall ()
>>   2 Thread 0xb7ff1ac0 (LWP 11111)  0x00d46402 in __kernel_vsyscall ()
>> (gdb) bt
>> #0  0x00d46402 in __kernel_vsyscall ()
>> #1  0x00b2f040 in raise () from /lib/i686/nosegneg/libc.so.6
>> #2  0x00b30a21 in abort () from /lib/i686/nosegneg/libc.so.6
>> #3  0x00b67e3b in __libc_message () from /lib/i686/nosegneg/libc.so.6
>> #4  0x00b70758 in free () from /lib/i686/nosegneg/libc.so.6
>> #5  0x080612ee in msgDestruct (ppThis=0xb4deb1d4) at msg.c:816
>> #6  0x08079e35 in DeleteProcessedBatch (pThis=0x80b7988, pBatch=0x80b7cd0)
>>     at queue.c:1404
>> #7  0x0807a3b9 in DequeueConsumableElements (pThis=0x80b7988, pWti=0x80b7cb8)
>>     at queue.c:1441
>> #8  DequeueConsumable (pThis=0x80b7988, pWti=0x80b7cb8) at queue.c:1489
>> #9  0x0807a5d7 in DequeueForConsumer (pThis=0x80b7988, pWti=0x80b7cb8)
>>     at queue.c:1626
>> #10 ConsumerReg (pThis=0x80b7988, pWti=0x80b7cb8) at queue.c:1679
>> #11 0x0807350e in wtiWorker (pThis=0x80b7cb8) at wti.c:315
>> #12 0x08072e1f in wtpWorker (arg=0x80b7cb8) at wtp.c:381
>> #13 0x00c9b869 in start_thread () from /lib/i686/nosegneg/libpthread.so.0
>> #14 0x00bd9e9e in clone () from /lib/i686/nosegneg/libc.so.6
>>
>> The crash happens in msgDestruct() when it tries to free
>> pThis->rcvFrom.pfrominet. Valgrind says it's a double free problem.
>>
>> The queue mutex used by DequeueForConsumer seems to be properly locked by
>> thread 8. From stack frame 10:
>>
>> (gdb) p *pThis->mut
>> $261 = {__data = {__lock = 2, __count = 0, __owner = 11149, __kind = 0,
>>     __nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
>>   __size = "\002\000\000\000\000\000\000\000\215+\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}
>>
>> The value for __lock is curious. It's usually 1 for locked or 0 for
>> unlocked, but it might have something to do with gdb. It's 1 in the core
>> files. pThis->mutThrdMgmt is unlocked.
>>
>> I've checked omruleset code and it does a proper deep copy, as far as I
>> can tell. All the code in msg.c also seems fine. So I don't know what's
>> happening.
>>
>> --
>>  .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
>> (_  \ /  _)   ceremonial.
>>      |
>>      |        [email protected]
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
>
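Since the report above turns on whether the message copy is deep or shallow, here is a toy model of that failure class: two message objects that share one allocation, each releasing it on destruction. It is only a sketch and not a diagnosis. `ToyHeap`, `ToyMsg` and `destroy_pair` are invented names, nothing here is rsyslog code, and the report says the omruleset copy looked deep.

```python
class DoubleFree(Exception):
    """Raised when a handle is released twice (glibc would abort instead)."""


class ToyHeap:
    """Tracks live handles so a second free of the same handle is detected."""

    def __init__(self):
        self._live = set()
        self._next = 0

    def alloc(self):
        self._next += 1
        self._live.add(self._next)
        return self._next

    def free(self, handle):
        if handle not in self._live:
            raise DoubleFree(f"handle {handle} already freed")
        self._live.remove(handle)


class ToyMsg:
    """Hypothetical message owning one heap handle (think rcvFrom.pfrominet)."""

    def __init__(self, heap, handle=None):
        self.heap = heap
        self.handle = heap.alloc() if handle is None else handle

    def shallow_copy(self):
        return ToyMsg(self.heap, self.handle)  # both copies share one handle

    def deep_copy(self):
        return ToyMsg(self.heap)  # the copy gets its own handle

    def destruct(self):
        self.heap.free(self.handle)


def destroy_pair(copy_method):
    """Copy a message with the named method, destroy both, report the outcome."""
    heap = ToyHeap()
    original = ToyMsg(heap)
    duplicate = getattr(original, copy_method)()
    original.destruct()
    try:
        duplicate.destruct()
    except DoubleFree:
        return "double free"
    return "ok"
```

Calling `destroy_pair("shallow_copy")` hits the second free of the shared handle, while `destroy_pair("deep_copy")` destroys both messages cleanly. The same outcome would follow from one message being destructed by two worker threads, which the model does not try to cover.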