I've seen a moderate number of crashes in this area myself. I think this is one of those long-standing bugs that is hiding somewhere - it's visible in 4.6/4.7 as well, and I don't think it's necessarily an issue tied to RHEL.
-Aaron

2011/2/17 Rainer Gerhards <[email protected]>:
> Hi Dražen,
>
> I dug this problem report out and tried to reproduce. I both tried my usual
> platform under Fedora as well as CentOS 5.5 (64 bit). Unfortunately I did not
> run into any trouble. Of course, I do not have the program you use, so this
> may be a difference. I tested with a small program that just read stdin and
> threw everything it read away. I also tested with writing to a file instead
> of omprog. I tested on a quad core system and sent 10 million messages.
>
> Can you confirm that you still have some trouble with this scenario?
>
> Rainer
>
>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of Dražen Kacar
>> Sent: Friday, November 12, 2010 1:38 PM
>> To: [email protected]
>> Subject: [rsyslog] SIGSEGV because of double free in msgDestruct
>>
>> Hello.
>>
>> I have rsyslog 5.6.0 on CentOS 5.5 with a slightly complex configuration
>> and it's crashing. The complete configuration file is attached. The crash
>> is perfectly reproducible and it happens very soon after the data starts
>> arriving. The program was started with:
>>
>>     rsyslogd -c5 -x -f rsyslog-datasink.conf
>>
>> I have two queues in order to have two thread pools. Input queue just
>> takes the message from UDP or TCP socket and uses omruleset to pass it to
>> the output queue. The output queue then uses omprog to pass the message to
>> the external program. Omprog blocks when the pipe to the external program
>> is full, so I wanted to have unblocked threads to accept incoming messages
>> (which will mostly use UDP). Hence, the configuration has two queues.
>>
>> It's possible that I made some error in the configuration and rsyslogd is
>> crashing because I'm doing something that I wasn't supposed to do, but it
>> didn't detect the faulty configuration early on.
>>
>> The whole thing works fine when I have only one queue (created with
>> $Ruleset) and input and omprog modules on it. But I'd really like to use
>> two thread pools. It should be possible to reproduce this problem with cat
>> as the omprog binary, although I haven't tried.
>>
>> One curiosity (probably unrelated to the problem): $GenerateConfigGraph at
>> the end of the config file creates a picture which has only the main
>> queue, but the queues I configured with $Ruleset directives are not on the
>> picture.
>>
>> The below is from gdb. The process was started from gdb, so there's no
>> call to sigsegvHdlr(), which can be seen in the core file when I start
>> rsyslogd on its own.
>>
>> (gdb) info threads
>> * 8 Thread 0xb4debb90 (LWP 11149)  ConsumerReg (pThis=0x80b7988,
>>     pWti=0x80b7cb8) at queue.c:1679
>>   7 Thread 0xb57ecb90 (LWP 11148)  0x00d46402 in __kernel_vsyscall ()
>>   6 Thread 0xb61edb90 (LWP 11147)  msgDestruct (ppThis=0xb61ed1d4) at
>>     msg.c:790
>>   5 Thread 0xb6beeb90 (LWP 11146)  0x00d46402 in __kernel_vsyscall ()
>>   4 Thread 0xb75efb90 (LWP 11145)  0x00d46402 in __kernel_vsyscall ()
>>   3 Thread 0xb7ff0b90 (LWP 11144)  0x00d46402 in __kernel_vsyscall ()
>>   2 Thread 0xb7ff1ac0 (LWP 11111)  0x00d46402 in __kernel_vsyscall ()
>> (gdb) bt
>> #0  0x00d46402 in __kernel_vsyscall ()
>> #1  0x00b2f040 in raise () from /lib/i686/nosegneg/libc.so.6
>> #2  0x00b30a21 in abort () from /lib/i686/nosegneg/libc.so.6
>> #3  0x00b67e3b in __libc_message () from /lib/i686/nosegneg/libc.so.6
>> #4  0x00b70758 in free () from /lib/i686/nosegneg/libc.so.6
>> #5  0x080612ee in msgDestruct (ppThis=0xb4deb1d4) at msg.c:816
>> #6  0x08079e35 in DeleteProcessedBatch (pThis=0x80b7988, pBatch=0x80b7cd0)
>>     at queue.c:1404
>> #7  0x0807a3b9 in DequeueConsumableElements (pThis=0x80b7988, pWti=0x80b7cb8)
>>     at queue.c:1441
>> #8  DequeueConsumable (pThis=0x80b7988, pWti=0x80b7cb8) at queue.c:1489
>> #9  0x0807a5d7 in DequeueForConsumer (pThis=0x80b7988, pWti=0x80b7cb8)
>>     at queue.c:1626
>> #10 ConsumerReg (pThis=0x80b7988, pWti=0x80b7cb8) at queue.c:1679
>> #11 0x0807350e in wtiWorker (pThis=0x80b7cb8) at wti.c:315
>> #12 0x08072e1f in wtpWorker (arg=0x80b7cb8) at wtp.c:381
>> #13 0x00c9b869 in start_thread () from /lib/i686/nosegneg/libpthread.so.0
>> #14 0x00bd9e9e in clone () from /lib/i686/nosegneg/libc.so.6
>>
>> The crash happens in msgDestruct() when it tries to free
>> pThis->rcvFrom.pfrominet. Valgrind says it's a double free problem.
>>
>> The queue mutex used by DequeueForConsumer seems to be properly locked by
>> thread 8. From stack frame 10:
>>
>> (gdb) p *pThis->mut
>> $261 = {__data = {__lock = 2, __count = 0, __owner = 11149, __kind = 0,
>>     __nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
>>   __size = "\002\000\000\000\000\000\000\000\215+\000\000\000\000\000\000\001\000\000\000\000\000\000",
>>   __align = 2}
>>
>> The value for __lock is curious. It's usually 1 for locked or 0 for
>> unlocked, but it might have something to do with gdb. It's 1 in the core
>> files. pThis->mutThrdMgmt is unlocked.
>>
>> I've checked omruleset code and it does a proper deep copy, as far as I
>> can tell. All the code in msg.c also seems fine. So I don't know what's
>> happening.
>>
>> --
>>  .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
>> (_  \ /  _)   ceremonial.
>>      |
>>      |        [email protected]
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
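
For anyone trying to reproduce this without the original external program: a stand-in for the omprog binary, along the lines of the "read stdin and throw everything away" program Rainer describes, might look like the sketch below. It is written in Python only for brevity. The names drain and chunk_size are made up for this illustration and come from neither rsyslog nor either poster's setup, and it is not known whether the actual test program behaved exactly like this.

```python
import sys


def drain(stream, chunk_size=65536):
    """Read `stream` until EOF, discard the data, return the byte count.

    `drain` and `chunk_size` are hypothetical names for this sketch.
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals EOF
            break
        total += len(chunk)
    return total


if __name__ == "__main__":
    # omprog feeds messages to the external program through a pipe on stdin.
    count = drain(sys.stdin.buffer)
    # Report on stderr only, so nothing is ever written to stdout.
    print(f"discarded {count} bytes", file=sys.stderr)
```

Because it never stops reading, a sink like this keeps the pipe from filling up, so it exercises the two-queue path without the blocking behaviour described in the original report. A sink that reads slowly would be closer to the reported setup.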

