On Mon, 31 Aug 2009, Rainer Gerhards wrote:

> quick question: do you have name resolution enabled on the system in
> question? I am asking because I just got a valgrind violation in my lab (but not
> an abort yet) that points into the name resolution area.

No, I run this with -x

David Lang

> Rainer
>
>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of Rainer Gerhards
>> Sent: Monday, August 31, 2009 12:51 PM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] abort in 4.2.1
>>
>> On Fri, 2009-08-28 at 14:55 -0700, [email protected] wrote:
>>> On Fri, 28 Aug 2009, Rainer Gerhards wrote:
>>>> Also, it would be good if you could build with --enable-rtinst --enable-debug
>>>> and try out that version on your machine. I am a bit concerned about the
>>>> speed of the resulting executable, it may be too slow. You do not need to
>>>> run it in debug mode itself. These options (especially --enable-debug) will
>>>> activate in-depth runtime checks (assert, will abort when something goes
>>>> wrong) and my hope is that they will catch the bug closer to the root cause.
>>>> If so, I would need the gdb abort info (actually enabling debug output would
>>>> be an option some time later).
>>>>
>>>> Please let me know what would be OK with you.
>>>
>>> I will give this a try.
>>>
>>> I was going to suggest that since we have the message getting corrupted it
>>> may make sense to make a temporary branch that has multiple message
>>> buffers and at various times through the message processing it makes a
>>> copy of the message to the buffer. When the system crashes I will be able
>>> to look at the core and see where the message is getting corrupted.
>>
>> David, I fear it is even more complicated than that. It looks like not
>> only the message got corrupted but the message object itself. There are
>> already two copies of some of the message elements, and they also look
>> inconsistent - except, if we really had a null message, that is one with
>> no content at all (and generating a message object from a null message,
>> I think, would be a bug in itself - but I am sure there are no such
>> messages in your actual traffic). If you think there could be a real
>> null message, I'd follow that path (will probably do so in any case...).
>>
>> I think that what really happens is that some part of the code runs
>> wild, thus invalidating some random part of the main memory. At some
>> times, it hits queue structures (or the message object that is held by
>> them) and if so, we will see the abort you experience. With that
>> scenario, duplicating the message buffer does not really help, because
>> looking at the corrupted message object would not provide any additional
>> information.
>>
>> However, if that's easy enough to reproduce, it would probably be good
>> if you could send me the core analysis (the backtrace and the print
>> statements) from a few (five maybe?) independent aborts. Maybe they show
>> a pattern. It would probably be best to send them via private mail, as I am
>> not sure if they disclose more than they should.
>>
>>>
>>> I will see about doing a tcpdump at the time that I do this and send it to
>>> you (I'll need to check with management, but since we have a contract in
>>> place for other reasons I think we can do this)
>>>
>>
>> That would probably be a good thing. I've made some progress with my
>> testing tool, and I have created a basic version right now. Probably not
>> good enough to mimic your traffic pattern, but closer. I have been doing a
>> test run for quite some time now, unfortunately so far without abort.
>>
>> Note that I ran into trouble with UDP - even though I've put some
>> one-ms sleeps into the code, I lose a lot of messages, as it looks even
>> before they hit the wire. It's always really troublesome to test with UDP...
>>
>> Rainer
>>> I can't do this late on a Friday, but I should be able to do this Monday
>>> afternoon.
>>>
>>> David Lang
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com
>>
