patch is now merged :)

On 02/16/2011 01:17 PM, Dražen Kačar wrote:
Hello.

I have rsyslog 5.6.2 (+ patches for blocking FIFO write and setting thread scheduling class) on CentOS 5.5 (64-bit) and I am seeing a number of crashes. Since 2011-02-02 there were 27 SIGSEGVs and 35 SIGABRTs on one of the machines in the cluster. SIGABRTs are generated by glibc:

*** glibc detected *** /opt/bulb/sbin/rsyslogd: double free or corruption (fasttop): 0x00002aaab02bc4c0 ***

SIGSEGVs are the usual NULL pointer accesses. I didn't check all core files, but the ones I checked had that condition.

I decided to run rsyslog through Sun's Data Race analyzer[1] and it found a few problems. The tool is free and it runs under Linux as well, but it brings Sun's compiler, which doesn't handle all of gcc's extensions, so I had to change the code to make it compile. The patch is attached. It adds members to empty structs in a few places.

Since that compiler doesn't have gcc atomic access builtins, config.h contains this:

/* Define if compiler provides atomic builtins */
/* #undef HAVE_ATOMIC_BUILTINS */

/* Define if compiler provides 64 bit atomic builtins */
/* #undef HAVE_ATOMIC_BUILTINS_64BIT */

My test was receiving 4 lines via UDP and writing them to a file and a FIFO. It was as simple as I could make it. Thread scheduling class was not set.
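To illustrate what leaving HAVE_ATOMIC_BUILTINS undefined means in practice, here is a minimal sketch of the general pattern: use the gcc builtin when it is available and fall back to a mutex otherwise. This is not rsyslog's actual code. COUNTER_INC, counter, counter_mut, worker and run_workers are hypothetical names made up for the example; only the HAVE_ATOMIC_BUILTINS macro comes from the config.h excerpt above.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper macro, not taken from rsyslog: increment a shared
 * counter atomically if the compiler offers the gcc builtin, otherwise
 * serialize the increment with a mutex. */
#ifdef HAVE_ATOMIC_BUILTINS
#define COUNTER_INC(p, mut) ((void) __sync_fetch_and_add((p), 1))
#else
#define COUNTER_INC(p, mut) \
    do { \
        pthread_mutex_lock(mut); \
        ++*(p); \
        pthread_mutex_unlock(mut); \
    } while (0)
#endif

static long counter = 0;
static pthread_mutex_t counter_mut = PTHREAD_MUTEX_INITIALIZER;

/* Each worker bumps the shared counter 100000 times. */
static void *worker(void *arg)
{
    int i;
    (void) arg;
    for (i = 0; i < 100000; i++)
        COUNTER_INC(&counter, &counter_mut);
    return NULL;
}

/* Start four workers, wait for all of them, return the final count. */
long run_workers(void)
{
    pthread_t t[4];
    int i, rc;

    counter = 0;
    for (i = 0; i < 4; i++) {
        rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (i = 0; i < 4; i++) {
        rc = pthread_join(t[i], NULL);
        assert(rc == 0);
    }
    return counter;
}
```

With either branch the increments are synchronized, so a race detector should have nothing to report for this counter. A plain ++counter without the mutex would be the kind of unprotected access such a tool flags.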
The tool found the following problems:

Total Races: 4 Experiment: exp1.er

Race #1, Vaddr: 0x13909168
    Access 1: Read,  GetNxt + 0x0000008A, line 346 in "modules.c"
    Access 2: Write, addModToList + 0x00000131, line 326 in "modules.c"
    Total Callstack Traces: 1

Race #2, Vaddr: (Multiple Addresses)
    Access 1: Read,  wtpShutdownAll + 0x00000371, line 247 in "wtp.c"
    Access 2: Write, wtpWrkrExecCleanup + 0x000000F2, line 310 in "wtp.c"
    Total Callstack Traces: 2

Race #3, Vaddr: (Multiple Addresses)
    Access 1: Read,  thrdDestruct + 0x00000058, line 76 in "threads.c"
    Access 2: Write, thrdStarter + 0x000001A2, line 197 in "threads.c"
    Total Callstack Traces: 1

Race #4, Vaddr: 0x1394764c
    Access 1: Read,  processSocket + 0x000000FE, line 314 in "imudp.c"
    Access 2: Write, thrdTerminateNonCancel + 0x000000CC, line 100 in "threads.c"
    Total Callstack Traces: 1

What it found really are unprotected memory accesses (i.e. bugs), but all of them are in insignificant places:

race #1 - module loading
race #2 - shutdown all workers
race #3 - thread destructor (this one might be responsible for something)
race #4 - thread termination on SIGTTIN

My production system is a bit more complicated than that. It has UDP and TCP receivers and a few more threads created than the test system. I suppose I could test some more and try to find errors in other places, but before I do I'd like to know if anyone else has used tools of this kind on rsyslog and, if so, what the results were.

[1] http://download.oracle.com/docs/cd/E19205-01/821-2124/index.html

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

