> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of Dražen Kacar
> Sent: Wednesday, February 16, 2011 1:17 PM
> To: [email protected]
> Subject: [rsyslog] Race conditions and crashes
> 
> Hello.
> 
> I have rsyslog 5.6.2 (+ patches for blocking FIFO write and setting
> thread
> scheduling class) on CentOS 5.5 (64-bit) and I have a number of
> crashes.
> Since 2011-02-02 there were 27 SIGSEGVs and 35 SIGABRTs on one of the
> machines in the cluster.
> 
> SIGABRTs are generated by glibc:
> 
> *** glibc detected *** /opt/bulb/sbin/rsyslogd: double free or
> corruption
> (fasttop): 0x00002aaab02bc4c0 ***
> 
> SIGSEGVs are the usual NULL pointer accesses. I didn't check all core
> files, but the ones I checked had that condition.
> 
> I decided to run rsyslog through Sun's Data Race analyzer[1] and it
> found
> a few problems. The tool is free and it runs under Linux as well, but
> it
> brings Sun's compiler which doesn't handle all of gcc extensions, so I
> had
> to change the code to make it compile. The patch is attached. It adds
> members to empty structs in a few places.

I will see that I add that :) Since I use gcc on Solaris, this seems to have
slipped my attention ;)
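
For readers who have not run into this: gcc accepts a struct with no members as
an extension, while ISO C requires at least one member, so stricter compilers
reject it. A minimal sketch of the kind of change described, assuming a
placeholder member is all that is needed (the names example_if_s and
example_if_size are hypothetical and not taken from the attached patch):

```c
#include <stddef.h>

/* Hypothetical interface struct that has no real members yet.
 * Writing `struct example_if_s { };` compiles with gcc (extension),
 * but is not valid ISO C, so other compilers may refuse it. */
struct example_if_s {
    char dummy; /* placeholder member so the struct is valid ISO C */
};

/* Returns the size of the placeholder struct (never zero in ISO C). */
size_t example_if_size(void)
{
    return sizeof(struct example_if_s);
}
```

The placeholder costs at least one byte per instance, which should not matter
for interface-style structs that exist only once.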

> 
> Since that compiler doesn't have gcc atomic access builtins, config.h
> contains this:
> 
> /* Define if compiler provides atomic builtins */
> /* #undef HAVE_ATOMIC_BUILTINS */
> 
> /* Define if compiler provides 64 bit atomic builtins */
> /* #undef HAVE_ATOMIC_BUILTINS_64BIT */
> 
> My test was receiving 4 lines via UDP and writing them to a file and a
> FIFO.
> It was as simple as I could make it. Thread scheduling class was not
> set.

Did you experience any problems without the analyzer in this setting? As I
said, I am searching for this bug, but so far we have been unable to reproduce
it (I even got some help from Florian, but so far to no avail...).

> 
> The tool found the following problems:
> 
> Total Races:  4 Experiment:  exp1.er
> 
> Race #1, Vaddr: 0x13909168
>       Access 1: Read,  GetNxt + 0x0000008A,
>                        line 346 in "modules.c"
>       Access 2: Write, addModToList + 0x00000131,
>                        line 326 in "modules.c"
>   Total Callstack Traces: 1
> 
> Race #2, Vaddr: (Multiple Addresses)
>       Access 1: Read,  wtpShutdownAll + 0x00000371,
>                        line 247 in "wtp.c"
>       Access 2: Write, wtpWrkrExecCleanup + 0x000000F2,
>                        line 310 in "wtp.c"
>   Total Callstack Traces: 2
> 
> Race #3, Vaddr: (Multiple Addresses)
>       Access 1: Read,  thrdDestruct + 0x00000058,
>                        line 76 in "threads.c"
>       Access 2: Write, thrdStarter + 0x000001A2,
>                        line 197 in "threads.c"
>   Total Callstack Traces: 1
> 
> Race #4, Vaddr: 0x1394764c
>       Access 1: Read,  processSocket + 0x000000FE,
>                        line 314 in "imudp.c"
>       Access 2: Write, thrdTerminateNonCancel + 0x000000CC,
>                        line 100 in "threads.c"
>   Total Callstack Traces: 1
> 
> 
> What it found really are unprotected memory accesses (i.e. bugs), but
> all
> of them are in insignificant places:
> 
> race #1 - module loading
This is known and really not an issue.

> race #2 - shutdown all workers
> race #3 - thread destructor (this one might be responsible for
> something)
I think they are OK as well, but I will check. Maybe just the atomic emulation
is missing. It may also be that this is a case where it really doesn't matter
if dual reads are necessary.
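
To illustrate what "atomic emulation" means here: when HAVE_ATOMIC_BUILTINS is
not defined (as in the config.h quoted above), every access that would
otherwise use a gcc builtin has to be guarded by a mutex instead. A rough
sketch under that assumption follows; the helper names example_atomic_inc and
example_atomic_fetch are made up for this example and are not rsyslog's actual
macros.

```c
#include <pthread.h>

/* Increment *val and return the new value.
 * With builtins: one atomic instruction, the mutex is unused.
 * Without builtins: the same effect, emulated under a mutex. */
int example_atomic_inc(int *val, pthread_mutex_t *mut)
{
#ifdef HAVE_ATOMIC_BUILTINS
    (void)mut;
    return __sync_add_and_fetch(val, 1);
#else
    int r;
    pthread_mutex_lock(mut);
    r = ++(*val);
    pthread_mutex_unlock(mut);
    return r;
#endif
}

/* Read *val. Without builtins, the read must take the same mutex as
 * the writers, otherwise a race detector will (rightly) flag it. */
int example_atomic_fetch(int *val, pthread_mutex_t *mut)
{
#ifdef HAVE_ATOMIC_BUILTINS
    (void)mut;
    return __sync_fetch_and_add(val, 0);
#else
    int r;
    pthread_mutex_lock(mut);
    r = *val;
    pthread_mutex_unlock(mut);
    return r;
#endif
}
```

If a code path uses the builtin variant for writes but a plain read (or the
other way round) in the non-builtin build, that would show up exactly as a
read/write race like the ones reported.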

> race #4 - thread termination on SIGTTIN
Sounds interesting, I will check.
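
The general pattern behind a report like race #4 is one thread writing a stop
flag while an input thread reads it in its receive loop. The sketch below
assumes that is what happens here, which I have not verified against the
source; struct example_thrd and all function names are hypothetical and do not
mirror the real code in threads.c or imudp.c. It only shows one way to make
such a flag safe to share.

```c
#include <pthread.h>

/* Hypothetical thread descriptor, for illustration only. */
struct example_thrd {
    pthread_mutex_t mut;
    int shall_stop;           /* written by the terminating thread */
    unsigned long iterations; /* touched only by the worker until join */
};

/* Read the stop flag under the mutex so it cannot race the write. */
static int example_shall_stop(struct example_thrd *t)
{
    int r;
    pthread_mutex_lock(&t->mut);
    r = t->shall_stop;
    pthread_mutex_unlock(&t->mut);
    return r;
}

/* Worker loop standing in for an input thread such as a UDP receiver. */
static void *example_worker(void *arg)
{
    struct example_thrd *t = arg;
    while (!example_shall_stop(t))
        t->iterations++;
    return NULL;
}

/* Start the worker, request termination, wait for it to exit.
 * Returns 0 on success, -1 if setup failed. */
int example_run(void)
{
    struct example_thrd t;
    pthread_t tid;

    t.shall_stop = 0;
    t.iterations = 0;
    if (pthread_mutex_init(&t.mut, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, example_worker, &t) != 0) {
        pthread_mutex_destroy(&t.mut);
        return -1;
    }

    pthread_mutex_lock(&t.mut);
    t.shall_stop = 1;         /* the write that must not be unprotected */
    pthread_mutex_unlock(&t.mut);

    pthread_join(tid, NULL);
    pthread_mutex_destroy(&t.mut);
    return 0;
}
```

Whether the real code needs a mutex, an atomic access, or nothing at all
depends on what the flag protects, so take this as a description of the
pattern and not as a proposed patch.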

And I think my initial answer was only partly correct. I assumed the tool was
something like the clang static analyzer. I use valgrind tools very frequently,
and there are two thread error detectors, drd and helgrind. Both have pros
and cons, and I regularly use both. Unfortunately, some kinds of races do not
manifest in valgrind. In any case, I'd suggest you also give it a try if you
don't know the tool. It is excellent and has given rsyslog's code quality a
real boost when I found it (Thanks to Peter and others for making me aware of
it!).

Rainer
> 
> 
> My production system is a bit more complicated than that. It has UDP
> and
> TCP receivers and a few more threads created than the test system.
> I suppose I could test some more and try to find errors in other
> places,
> but before I do I'd like to know if anyone else used tools of this kind
> on
> rsyslog. And if so, what the results were.
> 
> [1] http://download.oracle.com/docs/cd/E19205-01/821-2124/index.html
> 
> --
>  .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
> (_  \ /  _)   ceremonial.
>      |
>      |        [email protected]
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com