> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of Lorenzo M. Catucci
> Sent: Friday, January 16, 2009 12:29 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog still crashes
> 
> On Thu, 15 Jan 2009, Rainer Gerhards wrote:
> 
> RG> On Thu, 2009-01-15 at 18:58 +0100, Lorenzo M. Catucci wrote:
> RG> > I've just tried again rsyslog on my 8 core mail server, and got
> RG> > the very same crash from September/October.
> RG>
> RG> So, without valgrind, can you reproduce the issue each time you
> RG> start it? That would be very useful.
> RG>
> 
> Yes: any time I start a free-running instance, I get the very same
> segmentation fault and core-file to backtrace.
> 
> RG>
> RG> > I've restarted the server under
> RG> > valgrind control, and all seems to be running well...
> RG>
> RG> I guess the issue here is that valgrind slows down things and also
> RG> simulates (I think) 2 CPUs only.
> RG>
> 
> Right, I didn't know valgrind limited both the CPU bandwidth and the
> (v)CPU number, but either of them would hide the existing race
> condition.

Actually, valgrind executes the app in a virtual CPU/memory environment.
So this is *quite different* from the real machine, but nevertheless
extremely useful in most cases. So while in theory the actual hardware
should not affect the valgrind outcome, my past debugging has shown it
does. Thus my first try is always valgrind. But it seems not to help
here, as we have seen...
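For reference, a typical invocation looks something like this (binary
path and option set are examples, adjust for your install):

```shell
# run rsyslogd in the foreground (-n) under memcheck to catch
# invalid reads/writes and double-frees; path/flags are examples
valgrind --tool=memcheck --leak-check=full /usr/local/sbin/rsyslogd -n

# helgrind specifically hunts for inter-thread data races
valgrind --tool=helgrind /usr/local/sbin/rsyslogd -n
```

Of course, as noted above, the virtual-CPU environment changes the
timing enough that a race may simply not fire under it.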

> RG>
> RG> From what I have learned so far, we seem to have a race condition
> RG> that causes memory corruption. The backtrace you include also
> RG> points in that direction. Those few cases where I got a usable
> RG> backtrace all point to the very same location. However, that does
> RG> not mean this location has the bug. It seems to occur some time
> RG> earlier, and manifests when the message is destructed. It could be
> RG> a double-free or even some wild memory access that accidentally
> RG> overwrites some structures.
> RG>
> RG> If we are able to get a stable repro, and we are able to run with
> RG> at least some minimal diagnostics, we may be much better off
> RG> tackling that beast.
> RG>
> RG> First step is to see that we get a stable repro. If we do, I need
> RG> to think about minimal debug. The full debugging system makes the
> RG> bug disappear, I think because it changes the timing.
> RG>
> 
> I don't think we can hope for a stable reproducer for a heisenbug...

Of course not 100%. But what you have sounds good enough. I must now see
whether/how I can change the system so that we have some additional
instrumentation while the bug is still there. I'll first look at some
compile options. Is it OK with you if I just send some messages to
stdout?
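Concretely, the kind of minimal diagnostic I have in mind is a
magic/canary field in the message header, checked at destruction time.
A rough sketch (the struct here is a simplified stand-in, not rsyslog's
real msg_t):

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MSG_MAGIC 0xdeadbeefu  /* arbitrary canary pattern */

/* simplified stand-in for a message object */
typedef struct {
    unsigned magic;   /* valid only while the object is alive */
    char *payload;
} msg_t;

msg_t *msg_construct(const char *text)
{
    msg_t *m = malloc(sizeof(*m));
    if (m == NULL)
        return NULL;
    m->magic = MSG_MAGIC;
    m->payload = strdup(text);
    return m;
}

void msg_destruct(msg_t *m)
{
    /* fires on a double-free or a wild write over the header */
    assert(m->magic == MSG_MAGIC);
    m->magic = 0;          /* poison so a second destruct is caught */
    free(m->payload);
    free(m);
}
```

A double-free would then usually trip the assert at the destruct site
instead of silently corrupting the heap (touching freed memory is still
undefined behaviour, but in practice this tends to catch the problem
much closer to its cause).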

> All I can provide is a very high-throughput system generating a very
> high local message rate. As a matter of fact, this rsyslog instance is
> acting as a forwarder to a remote instance that didn't suffer any
> crash.
> 
> The only differences between the engines' configurations are:
>   1. the remote logs to a postgres instance instead of spool files,
>   2. the remote runs just the postgresql instance and the logger
> 
> My gut feeling is that the different behaviour doesn't come from
> either of these differences, but from the different memory path taken
> by the messages, which in the remote case are serialised from the
> underlying network transport.

This may be...

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com