Sorry, I missed this mail initially. To cut a long story short, the sanitizer works much like you describe ;)
Prerequisite: the message buffer is exactly as long as needed (no extra
space available).

The sanitizer works in two phases:

1. Check if something needs to be done. If not, terminate. Otherwise:
2. Allocate a new buffer, sanitize, and replace the old buffer.

I think the prerequisite does not hold if messages are so small that
they fit into the buffer provided by the msg object. In that case, the
sanitizer could probably be optimized :)

Rainer

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Friday, February 11, 2011 8:42 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] how can a parser insert data into a message
>
> On Fri, 11 Feb 2011, Rainer Gerhards wrote:
>
> >> -----Original Message-----
> >> From: [email protected] [mailto:rsyslog-
> >> [email protected]] On Behalf Of [email protected]
> >>
> >> On Fri, 11 Feb 2011, Rainer Gerhards wrote:
> >>
> >>> Have a look at ./runtime/parser.c, function SanitizeMsg. It builds a
> >>> new buffer and uses MsgSetRawMsg to set the new buffer. MsgSetRawMsg
> >>> handles the "dirty" internals of message object buffer manipulation.
> >>>
> >>> Note that it may be quicker to manipulate the buffer pointers
> >>> yourself. But then you must be very careful. MsgSetRawMsg should
> >>> provide the necessary hints. The thing to keep in mind is that up
> >>> to a certain message length, a buffer is used from the msg object
> >>> itself (thus saving one malloc/free call) whereas for larger
> >>> messages, memory is allocated. You need to keep that straight during
> >>> manipulation.
> >>
> >> I'll look at it and see how hard it is to separate these two cases.
> >> thanks for the pointer here.
> >
> > Just let me add that I did find it of questionable value to try to
> > avoid the malloc here. At least in the sanitization problem, this
> > would have resulted in very complex code.
> > And while saving memory writes and calls to the malloc subsystem is
> > useful, I thought that it would not have brought much benefit in
> > that case. Depending on what you intend to do (well-defined insert
> > at a late point) things may be different, though.
>
> My initial thought is something along the following lines:
>
> 1. find out how much space is available in whatever buffer the message
> is in (potentially 0 if the buffer is exactly the right size)
>
> document what needs to happen to adjust how much of the buffer is used
> (I've already figured out some of this with the existing parser
> modules)
>
> 2. if there is not enough space, document what the process is to
> allocate a new buffer and make the system use it.
>
> at this point it should be fairly straightforward to write a routine
> to do something along the lines of 'make sure I have enough space in
> the buffer to add X characters' and have it either return immediately
> if there's enough space or allocate the larger buffer if needed and
> return after doing that.
>
> there will be some things that will need to be documented as side
> effects (pointers into the existing message may be invalid at that
> point, including values in the msg structure)
>
> this could be misused (running this routine for every control
> character found could result in many malloc/free pairs, for example),
> and so examples will need to be given of doing a 2-pass routine: pass
> 1 to figure out what you want to do, and then make sure there's
> enough space and do pass 2 to modify the buffer as needed.
>
> Using this for sanitizing would still be slightly less efficient than
> the approach you probably use now (allocate a new buffer, copy things
> into it as you go to construct a new message, then set the message
> into the structure), but probably not by more than two copies of the
> text. As a result, it may be that the result will be enough cleaner
> to be worth the cost.
> I'm thinking that the new routine would be to copy the text from the
> old buffer to the new one, then copy everything after your first
> insert to the end of the buffer. after that you are copying data from
> late in the buffer to earlier in the buffer, which may even be faster
> than copying small amounts of data from one buffer to another as it
> may result in better cache behavior.
>
> in fact, this pattern is probably common enough to make it a routine
> itself
>
> something like
>
> int InsertIntoRawMsg(int offset, int count)
>
> inserts at least count spaces into the message at position offset
> from the beginning of the message, returns the number of spaces
> actually inserted (may be more than the number requested)
>
> or would it be better to return the number of extra characters
> available in the buffer after the end of the string?
>
> I figure error checking on the return is not needed because if it
> can't allocate the space we need to bail out (with whatever rsyslog
> does when it runs out of memory, probably aborting the message
> entirely)
>
> David Lang
>
> > Rainer
> >
> >>
> >>> As a side-note, it would probably be useful if you could take some
> >>> bullet points on how to modify things, so that others can find that
> >>> information in the case they want to do that themselves. Could go to
> >>> the wiki or I could include it in the doc set. Just a suggestion,
> >>> though...
> >>
> >> I'll see what I can do.
> >>
> >> David Lang
> >>
> >>> Rainer
> >>>
> >>>> -----Original Message-----
> >>>> From: [email protected] [mailto:rsyslog-
> >>>> [email protected]] On Behalf Of [email protected]
> >>>> Sent: Friday, February 11, 2011 5:38 AM
> >>>> To: rsyslog-users
> >>>> Subject: [rsyslog] how can a parser insert data into a message
> >>>>
> >>>> the various parser modules that I've submitted are all removing
> >>>> data from the log message or overwriting the data in place.
> >>>>
> >>>> But I've now run across a situation where I need to insert
> >>>> information into the message. I know that this can be done because
> >>>> the sanitizing call does exactly this. I am assuming that this is
> >>>> doing something like allocating a new string and copying the data
> >>>> into the new string.
> >>>>
> >>>> the concern is how to do this in a way that will survive the exit
> >>>> from the module, not confuse any of the many pointers or sizes that
> >>>> are involved, and make sure everything is properly freed
> >>>> afterwards.
> >>>>
> >>>> should I just search for the sanitizing routine and copy what it
> >>>> does (and can you point me at it?), or do you want me to wait until
> >>>> you have time to write something up on this?
> >>>>
> >>>> David Lang
> >>>> _______________________________________________
> >>>> rsyslog mailing list
> >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>>> http://www.rsyslog.com
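
For anyone finding this thread later, here is a minimal sketch in C of the "make sure there is enough space, then modify in two passes" idea discussed above. It is only an illustration under stated assumptions and is not rsyslog code: demo_msg is a hypothetical stand-in for the msg object, EMBEDDED_BUF_SIZE is an arbitrary value, the demo_* function names are made up, and it does not call MsgSetRawMsg. The signature also differs from the proposed InsertIntoRawMsg in that it returns 0 or -1 instead of a count. Escaping a control character as '#' plus three octal digits is just an example transformation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EMBEDDED_BUF_SIZE 32 /* hypothetical size, chosen for the demo */

/* Hypothetical stand-in for a message object (NOT rsyslog's real type). */
typedef struct {
    char   embedded[EMBEDDED_BUF_SIZE]; /* used for short messages */
    char  *raw; /* points at embedded[] or at a malloc'ed buffer */
    size_t len; /* bytes in use, excluding the terminating NUL */
    size_t cap; /* total bytes available at raw */
} demo_msg;

/* Copy text into the message; large messages get an exact-size heap buffer.
 * Note: a demo_msg must not be copied by value, raw may point into it. */
static int demo_msg_init(demo_msg *m, const char *text)
{
    size_t len = strlen(text);
    if (len + 1 <= EMBEDDED_BUF_SIZE) {
        m->raw = m->embedded;
        m->cap = EMBEDDED_BUF_SIZE;
    } else {
        m->raw = malloc(len + 1);
        if (m->raw == NULL)
            return -1;
        m->cap = len + 1;
    }
    memcpy(m->raw, text, len + 1);
    m->len = len;
    return 0;
}

static void demo_msg_free(demo_msg *m)
{
    if (m->raw != m->embedded)
        free(m->raw);
    m->raw = NULL;
    m->len = m->cap = 0;
}

/* Open a gap of count spaces at offset, allocating only if needed.
 * Returns 0 on success, -1 on a bad offset or allocation failure.
 * Side effect: pointers into the old buffer may be invalid afterwards. */
static int demo_insert_gap(demo_msg *m, size_t offset, size_t count)
{
    if (offset > m->len)
        return -1;
    size_t need = m->len + count + 1;
    if (need > m->cap) {
        char *nb = malloc(need);
        if (nb == NULL)
            return -1;
        memcpy(nb, m->raw, m->len + 1);
        if (m->raw != m->embedded)
            free(m->raw);
        m->raw = nb;
        m->cap = need;
    }
    /* Shift the tail (including the NUL) to the right, then blank the gap. */
    memmove(m->raw + offset + count, m->raw + offset, m->len - offset + 1);
    memset(m->raw + offset, ' ', count);
    m->len += count;
    return 0;
}

/* Two-pass escape: count first, grow at most once, then rewrite in place. */
static int demo_escape_ctrl(demo_msg *m)
{
    size_t n = 0;
    for (size_t i = 0; i < m->len; i++)
        if ((unsigned char)m->raw[i] < 32)
            n++;
    if (n == 0)
        return 0; /* pass 1 found nothing to do */

    size_t old_len = m->len;
    if (demo_insert_gap(m, old_len, 3 * n) != 0)
        return -1;

    /* Pass 2: walk backwards so writes never overtake unread input. */
    size_t dst = m->len;
    for (size_t src = old_len; src-- > 0; ) {
        unsigned char c = (unsigned char)m->raw[src];
        if (c < 32) {
            m->raw[--dst] = (char)('0' + (c & 7));
            m->raw[--dst] = (char)('0' + ((c >> 3) & 7));
            m->raw[--dst] = (char)('0' + ((c >> 6) & 7));
            m->raw[--dst] = '#';
        } else {
            m->raw[--dst] = (char)c;
        }
    }
    assert(dst == 0);
    return 0;
}
```

A caller would run demo_msg_init on the raw text, call demo_escape_ctrl once, and re-read m->raw afterwards instead of keeping older pointers into the buffer. Short messages that still fit into the embedded buffer after the change need no malloc at all, which is the optimization hinted at above.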

