Hello.

I am running rsyslog 5.6.2 (plus patches for blocking FIFO writes and for
setting the thread scheduling class) on CentOS 5.5 (64-bit), and it crashes
regularly. Since 2011-02-02 there have been 27 SIGSEGVs and 35 SIGABRTs on
one of the machines in the cluster.

SIGABRTs are generated by glibc:

*** glibc detected *** /opt/bulb/sbin/rsyslogd: double free or corruption
(fasttop): 0x00002aaab02bc4c0 ***

The SIGSEGVs are the usual NULL pointer dereferences. I didn't check every
core file, but all the ones I did check showed that condition.

I decided to run rsyslog through Sun's Data Race analyzer[1] and it found
a few problems. The tool is free and it runs under Linux as well, but it
comes with Sun's compiler, which doesn't handle all of gcc's extensions, so
I had to change the code to make it compile. The patch is attached. It adds
a dummy member to empty structs in a few places.
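
For readers unfamiliar with the issue: a struct with no members is accepted
by gcc as an extension, but ISO C expects at least one member, which is why
a stricter compiler can reject it. A minimal sketch of the workaround used
in the patch follows. The names example_s, example_t and example_size are
made up for illustration and do not appear in rsyslog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, modelled on the structs touched by the patch.
 * Without the dummy member this declaration relies on a gcc extension. */
typedef struct example_s {
    char dummy; /* placeholder so the struct has at least one member */
} example_t;

/* Returns the size of the padded struct (at least 1 byte). */
size_t example_size(void)
{
    return sizeof(example_t);
}
```

One side effect to keep in mind: the struct is no longer zero-sized, so
anything that depends on its size sees a different value than with gcc's
empty struct.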

Since that compiler doesn't have gcc atomic access builtins, config.h
contains this:

/* Define if compiler provides atomic builtins */
/* #undef HAVE_ATOMIC_BUILTINS */

/* Define if compiler provides 64 bit atomic builtins */
/* #undef HAVE_ATOMIC_BUILTINS_64BIT */
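
To illustrate why this setting matters for a race analysis: code that would
normally use an atomic builtin needs some other form of protection when the
builtin is missing, otherwise the access shows up as a race. The sketch
below is only an illustration of that pattern, assuming a mutex as the
fallback. COUNTER_INC, counter, counter_mut and run_counter_demo are
hypothetical names and not rsyslog's own macros or variables.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical macro and variable names; this is not rsyslog's code. */
#ifdef HAVE_ATOMIC_BUILTINS
/* gcc builtin available: atomic add, the mutex argument is ignored */
#define COUNTER_INC(ptr, mut) \
    ((void) (mut), (void) __sync_add_and_fetch((ptr), 1))
#else
/* no builtins (e.g. Sun's compiler): serialize the update with a mutex */
#define COUNTER_INC(ptr, mut) \
    do { \
        pthread_mutex_lock(mut); \
        ++*(ptr); \
        pthread_mutex_unlock(mut); \
    } while (0)
#endif

#define NUM_THREADS 4
#define INCS_PER_THREAD 100000

static int counter;
static pthread_mutex_t counter_mut = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter INCS_PER_THREAD times. */
static void *counter_worker(void *arg)
{
    int i;

    (void) arg;
    for (i = 0; i < INCS_PER_THREAD; i++)
        COUNTER_INC(&counter, &counter_mut);
    return NULL;
}

/* Runs NUM_THREADS workers and returns the final counter value. */
int run_counter_demo(void)
{
    pthread_t thr[NUM_THREADS];
    int i, rc;

    counter = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        rc = pthread_create(&thr[i], NULL, counter_worker, NULL);
        assert(rc == 0);
    }
    for (i = 0; i < NUM_THREADS; i++) {
        rc = pthread_join(thr[i], NULL);
        assert(rc == 0);
    }
    return counter;
}
```

With either branch of the #ifdef the final value equals NUM_THREADS *
INCS_PER_THREAD. With a plain unprotected ++ it could come out lower, and
that kind of access is what the analyzer flags.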

My test was receiving 4 lines via UDP and writing them to a file and a FIFO.
It was as simple as I could make it. Thread scheduling class was not set.

The tool found the following problems:

Total Races:  4 Experiment:  exp1.er

Race #1, Vaddr: 0x13909168
      Access 1: Read,  GetNxt + 0x0000008A, 
                       line 346 in "modules.c"
      Access 2: Write, addModToList + 0x00000131, 
                       line 326 in "modules.c"
  Total Callstack Traces: 1

Race #2, Vaddr: (Multiple Addresses)
      Access 1: Read,  wtpShutdownAll + 0x00000371, 
                       line 247 in "wtp.c"
      Access 2: Write, wtpWrkrExecCleanup + 0x000000F2, 
                       line 310 in "wtp.c"
  Total Callstack Traces: 2

Race #3, Vaddr: (Multiple Addresses)
      Access 1: Read,  thrdDestruct + 0x00000058, 
                       line 76 in "threads.c"
      Access 2: Write, thrdStarter + 0x000001A2, 
                       line 197 in "threads.c"
  Total Callstack Traces: 1

Race #4, Vaddr: 0x1394764c
      Access 1: Read,  processSocket + 0x000000FE, 
                       line 314 in "imudp.c"
      Access 2: Write, thrdTerminateNonCancel + 0x000000CC, 
                       line 100 in "threads.c"
  Total Callstack Traces: 1


What it found really are unprotected memory accesses (i.e. bugs), but all
of them are in insignificant places:

race #1 - module loading
race #2 - shutdown all workers
race #3 - thread destructor (this one might be responsible for something)
race #4 - thread termination on SIGTTIN
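
Race #4 has the familiar shape of a termination flag that one thread writes
while another thread reads it without any synchronization. The following is
a minimal sketch of that pattern with the flag protected by a mutex. It is
not taken from threads.c or imudp.c, and worker_ctl_t, ctl_should_stop,
ctl_request_stop and run_stop_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical control block shared by a controller and one worker. */
typedef struct {
    pthread_mutex_t mut;
    int should_stop;   /* written by the controller, read by the worker */
    long iterations;   /* touched by the worker only */
} worker_ctl_t;

/* Read the flag under the lock, so the read cannot race with the write. */
static int ctl_should_stop(worker_ctl_t *c)
{
    int v;

    pthread_mutex_lock(&c->mut);
    v = c->should_stop;
    pthread_mutex_unlock(&c->mut);
    return v;
}

/* Set the flag under the same lock. */
static void ctl_request_stop(worker_ctl_t *c)
{
    pthread_mutex_lock(&c->mut);
    c->should_stop = 1;
    pthread_mutex_unlock(&c->mut);
}

/* Worker loop: keep going until the controller asks for termination. */
static void *worker_main(void *arg)
{
    worker_ctl_t *c = arg;

    while (!ctl_should_stop(c))
        c->iterations++;
    return NULL;
}

/* Starts a worker, asks it to stop, waits for it; returns the flag. */
int run_stop_demo(void)
{
    worker_ctl_t c;
    pthread_t thr;
    int rc, stopped;

    rc = pthread_mutex_init(&c.mut, NULL);
    assert(rc == 0);
    c.should_stop = 0;
    c.iterations = 0;

    rc = pthread_create(&thr, NULL, worker_main, &c);
    assert(rc == 0);
    ctl_request_stop(&c);
    rc = pthread_join(thr, NULL);
    assert(rc == 0);

    stopped = c.should_stop;   /* safe: the worker has been joined */
    pthread_mutex_destroy(&c.mut);
    return stopped;
}
```

An atomic read and write of the flag would serve the same purpose where the
compiler provides them. The mutex version is shown because it does not
depend on the gcc builtins.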


My production system is a bit more complicated than that. It has UDP and
TCP receivers and creates a few more threads than the test system.
I suppose I could test some more and try to find errors in other places,
but before I do, I'd like to know whether anyone else has used tools of this
kind on rsyslog and, if so, what the results were.

[1] http://download.oracle.com/docs/cd/E19205-01/821-2124/index.html

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        [email protected]
--- runtime/datetime.h.orig	2011-02-15 18:42:13.000000000 +0100
+++ runtime/datetime.h	2011-02-15 18:43:13.000000000 +0100
@@ -28,6 +28,7 @@
 
 /* the datetime object */
 typedef struct datetime_s {
+    	char dummy;
 } datetime_t;
 
 
--- runtime/errmsg.h.orig	2011-02-15 18:42:20.000000000 +0100
+++ runtime/errmsg.h	2011-02-15 18:43:26.000000000 +0100
@@ -30,6 +30,7 @@
 
 /* the errmsg object */
 typedef struct errmsg_s {
+    	char dummy;
 } errmsg_t;
 
 
--- runtime/expr.h.orig	2011-02-15 18:42:07.000000000 +0100
+++ runtime/expr.h	2011-02-15 18:43:01.000000000 +0100
@@ -30,6 +30,7 @@
 
 /* a node inside an expression tree */
 typedef struct exprNode_s {
+    	char dummy;
 } exprNode_t;
 
 
--- runtime/modules.h.orig	2011-02-15 18:42:01.000000000 +0100
+++ runtime/modules.h	2011-02-15 18:42:48.000000000 +0100
@@ -119,6 +119,7 @@
 			rsRetVal (*parseSelectorAct)(uchar**, void**,omodStringRequest_t**);
 		} om;
 		struct { /* data for library modules */
+		    	char dummy;
 		} lm;
 		struct { /* data for parser modules */
 			rsRetVal (*parse)(msg_t*);
--- tools/omdiscard.c.orig	2011-02-15 18:46:57.000000000 +0100
+++ tools/omdiscard.c	2011-02-15 18:47:11.000000000 +0100
@@ -44,6 +44,7 @@
 DEF_OMOD_STATIC_DATA
 
 typedef struct _instanceData {
+    	char dummy;
 } instanceData;
 
 /* we do not need a createInstance()!
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
