On Fri, 28 Aug 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of Rainer Gerhards
>>
>>>
>>> that would be hard to do for a couple reasons
>>>
>>> at 5-10 times slower the system may not be able to keep up (even with
>>> the 'slower' afternoon traffic)
>>>
>>> this is running on a very hardened production server, getting valgrind
>>> installed there would require permission from the SVP level.
>>>
>>
>> understood. So let me see what else I can come up with :)
>
> I tried a lab yesterday where I sent roughly 1.5 billion messages (based on
> what I saw in the debug logs). Unfortunately, no abort happened. However, my
> traffic pattern was continuous traffic of the same message.
>
> So I am now going to create some new tooling that permits me to mimic your
> traffic pattern much better. That will probably require until early next
> week. To make this really work, it would be really useful if you could send
> me some complete messages from your environment. I suggest forwarding them
> via private mail. I hope this is possible.
>
> Also, it would be good if you could --enable-rtinst --enable-debug and try
> out that version on your machine. I am a bit concerned about the speed of the
> resulting executable, it may be too slow. You do not need to run it in debug
> mode itself. These options (especially --enable-debug) will activate in-depth
> runtime checks (assert, will abort when something goes wrong) and my hope
> is that they will catch the bug closer to the root cause. If so, I would need
> the gdb abort info (actually enabling debug output would be an option some
> time later).
>
> Please let me know what would be OK with you.
I will give this a try.

I was going to suggest that, since we have the message getting corrupted, it
may make sense to make a temporary branch that has multiple message buffers
and, at various points through the message processing, makes a copy of the
message to one of the buffers. When the system crashes I will be able to look
at the core and see where the message is getting corrupted.

I will see about doing a tcpdump at the time that I do this and send it to
you (I'll need to check with management, but since we have a contract in
place for other reasons I think we can do this).

I can't do this late on a Friday, but I should be able to do this Monday
afternoon.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
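
The snapshot-buffer idea from the reply above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not rsyslog
code: the names snapshot_msg, first_divergence, g_snap, SNAP_STAGES and
SNAP_MAXLEN are all hypothetical, and a single global set of buffers is
assumed, which would have to become per-thread or per-message in a
multi-threaded daemon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none of this is rsyslog's actual API. */
#define SNAP_STAGES 4      /* number of checkpoints along the processing path */
#define SNAP_MAXLEN 2048   /* bytes kept per snapshot */

struct msg_snapshot {
    const char *stage;        /* label of the checkpoint, NULL if unused */
    size_t len;               /* number of bytes copied */
    char data[SNAP_MAXLEN];   /* copy of the message at that checkpoint */
};

/* Static storage, so the snapshots end up in the core file. */
static struct msg_snapshot g_snap[SNAP_STAGES];

/* Copy the message as it looks right now into the given slot. */
void snapshot_msg(int slot, const char *stage, const char *msg, size_t len)
{
    assert(msg != NULL);
    if (slot < 0 || slot >= SNAP_STAGES)
        return;                       /* ignore out-of-range checkpoints */
    if (len > SNAP_MAXLEN)
        len = SNAP_MAXLEN;            /* truncate oversized messages */
    g_snap[slot].stage = stage;
    g_snap[slot].len = len;
    memcpy(g_snap[slot].data, msg, len);
}

/* Return the first slot whose copy differs from the previous slot,
 * or -1 if all recorded snapshots are identical. */
int first_divergence(void)
{
    for (int i = 1; i < SNAP_STAGES; i++) {
        if (g_snap[i].stage == NULL)
            break;                    /* no further checkpoints recorded */
        if (g_snap[i].len != g_snap[i - 1].len ||
            memcmp(g_snap[i].data, g_snap[i - 1].data, g_snap[i].len) != 0)
            return i;
    }
    return -1;
}
```

With something like this compiled in, loading the core into gdb and running
"print g_snap" would show the message at each checkpoint, and the first slot
that differs from its predecessor narrows down where the corruption happens.
This assumes the checkpoints sit at points where the raw message is not
supposed to change.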

