Sorry, I missed this mail initially. To cut a long story short, the sanitizer
works much like you describe ;)

Prerequisite: the message buffer is exactly as long as needed (no extra space
available).

The sanitizer works in two phases:

1. check if something needs to be done

If not, terminate. Otherwise:

2. allocate a new buffer, sanitize, replace the old buffer
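
The two phases can be sketched roughly like this. This is a hypothetical illustration, not the actual SanitizeMsg code: the name sanitize_two_phase and the escape format (each byte below 0x20 becomes '#' followed by three octal digits) are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-phase sanitizer (NOT rsyslog's SanitizeMsg).
 * Assumed escape format: each control byte (< 0x20) becomes "#ooo" in
 * octal, so every such byte grows from 1 to 4 bytes.
 * Returns 0 if nothing needed sanitizing (*out is left untouched),
 * 1 if *out now holds a new malloc'ed, NUL-terminated buffer of *out_len
 * bytes owned by the caller, or -1 if the allocation failed. */
static int sanitize_two_phase(const char *in, size_t len,
                              char **out, size_t *out_len)
{
    /* Phase 1: check whether anything needs to be done at all. */
    size_t nctrl = 0;
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)in[i] < 0x20)
            nctrl++;
    if (nctrl == 0)
        return 0;                       /* clean message: keep old buffer */

    /* Phase 2: allocate exactly the needed size, then copy and escape. */
    size_t new_len = len + 3 * nctrl;
    char *dst = malloc(new_len + 1);
    if (dst == NULL)
        return -1;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x20) {
            dst[j++] = '#';
            dst[j++] = (char)('0' + ((c >> 6) & 7));
            dst[j++] = (char)('0' + ((c >> 3) & 7));
            dst[j++] = (char)('0' + (c & 7));
        } else {
            dst[j++] = (char)c;
        }
    }
    assert(j == new_len);               /* phase 1 sized the buffer exactly */
    dst[j] = '\0';
    *out = dst;
    *out_len = new_len;
    return 1;
}
```

Phase 1 costs an extra pass over the message, but clean messages then cause no allocation at all and dirty ones exactly one.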

I think the prerequisite is not met if messages are so small that they fit
into the buffer provided by the msg object. In that case, the sanitizer could
probably be optimized :)
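
One possible optimization for that small-message case is to expand the escapes in place, working from the end of the buffer towards the start, whenever the result still fits into the fixed-size buffer. The sketch below is hypothetical: sanitize_in_place, its cap parameter, and the '#ooo' escape format are assumptions, not rsyslog code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-place sanitizer. buf holds len bytes of text inside a
 * block of cap bytes. Each control byte (< 0x20) is expanded to "#ooo".
 * Returns 0 on success (*new_len set, buf NUL-terminated), or -1 if the
 * result would not fit; buf is unchanged in that case and the caller must
 * fall back to allocating a new buffer. */
static int sanitize_in_place(char *buf, size_t len, size_t cap,
                             size_t *new_len)
{
    size_t nctrl = 0;
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)buf[i] < 0x20)
            nctrl++;

    size_t nl = len + 3 * nctrl;
    if (nl + 1 > cap)
        return -1;                      /* does not fit: use the heap path */

    /* Fill right to left: the write index j never drops below the read
     * index, so no byte is overwritten before it has been read. */
    size_t j = nl;
    buf[j] = '\0';
    for (size_t i = len; i > 0; i--) {
        unsigned char c = (unsigned char)buf[i - 1];
        if (c < 0x20) {
            buf[--j] = (char)('0' + (c & 7));
            buf[--j] = (char)('0' + ((c >> 3) & 7));
            buf[--j] = (char)('0' + ((c >> 6) & 7));
            buf[--j] = '#';
        } else {
            buf[--j] = (char)c;
        }
    }
    assert(j == 0);
    *new_len = nl;
    return 0;
}
```

Walking backwards is what makes the in-place expansion safe; when the expanded text no longer fits, nothing has been touched yet and the normal allocate-and-replace path can take over.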

Rainer

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Friday, February 11, 2011 8:42 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] how can a parser insert data into a message
> 
> On Fri, 11 Feb 2011, Rainer Gerhards wrote:
> 
> >> -----Original Message-----
> >> From: [email protected] [mailto:rsyslog-
> >> [email protected]] On Behalf Of [email protected]
> >>
> >> On Fri, 11 Feb 2011, Rainer Gerhards wrote:
> >>
> >>> Have a look at ./runtime/parser.c, function SanitizeMsg. It builds a
> >>> new buffer and uses MsgSetRawMsg to set the new buffer. MsgSetRawMsg
> >>> handles the "dirty" internals of message object buffer manipulation.
> >>>
> >>> Note that it may be quicker to manipulate the buffer pointers
> >>> yourself. But then you must be very careful. MsgSetRawMsg should
> >>> provide the necessary hints. The thing to keep in mind is that up
> >>> to a certain message length, a buffer is used from the msg object
> >>> itself (thus saving one malloc/free call), whereas for larger
> >>> messages, memory is allocated. You need to keep that straight during
> >>> manipulation.
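
The distinction between the buffer embedded in the msg object and a separately allocated one can be illustrated with a hypothetical struct. This is not rsyslog's msg_t layout, INLINE_SZ is an arbitrary assumed size, and raw_msg_set is an illustrative name; it only shows the kind of bookkeeping a function like MsgSetRawMsg has to get right.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_SZ 32   /* hypothetical size of the embedded buffer */

/* Hypothetical message object (NOT rsyslog's msg_t). Because buf may
 * point into the struct itself, the struct must not be copied by value. */
struct raw_msg {
    char   inline_buf[INLINE_SZ];
    char  *buf;   /* == inline_buf, or a malloc'ed block */
    size_t len;   /* bytes in use, excluding the NUL */
};

static void raw_msg_init(struct raw_msg *m)
{
    m->inline_buf[0] = '\0';
    m->buf = m->inline_buf;
    m->len = 0;
}

/* Free heap memory, if any, and return to the empty inline state. */
static void raw_msg_destroy(struct raw_msg *m)
{
    if (m->buf != m->inline_buf)
        free(m->buf);
    raw_msg_init(m);
}

/* Replace the raw text with a copy of src[0..len). src may point into the
 * current buffer. Returns 0, or -1 on allocation failure (old text kept). */
static int raw_msg_set(struct raw_msg *m, const char *src, size_t len)
{
    char *dst;
    if (len < INLINE_SZ) {
        dst = m->inline_buf;            /* small: no malloc needed */
    } else {
        dst = malloc(len + 1);          /* large: use the heap */
        if (dst == NULL)
            return -1;
    }
    memmove(dst, src, len);             /* src and dst may overlap */
    dst[len] = '\0';
    if (m->buf != m->inline_buf)        /* old heap block is now unused */
        free(m->buf);
    m->buf = dst;
    m->len = len;
    assert(m->len < INLINE_SZ || m->buf != m->inline_buf);
    return 0;
}
```

The copy happens before the old heap block is freed, which is what makes it safe to pass a pointer into the current message as src.
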
> >>
> >> I'll look at it and see how hard it is to separate these two cases.
> >> Thanks for the pointer here.
> >
> > Just let me add that I found it of questionable value to try to avoid
> > the malloc here. At least for the sanitization problem, this would have
> > resulted in very complex code. And while saving memory writes and calls
> > to the malloc subsystem is useful, I thought that it would not have
> > brought much benefit in that case. Depending on what you intend to do
> > (a well-defined insert at a late point), things may be different, though.
> 
> My initial thought is something along the following lines:
>
> 1. Find out how much space is available in whatever buffer the message
> is in (potentially 0 if the buffer is exactly the right size).
>
> Document what needs to happen to adjust how much of the buffer is used
> (I've already figured out some of this with the existing parser
> modules).
>
> 2. If there is not enough space, document the process to allocate a new
> buffer and make the system use it.
> 
> At this point it should be fairly straightforward to write a routine to
> do something along the lines of 'make sure I have enough space in the
> buffer to add X characters', and have it either return immediately if
> there's enough space, or allocate a larger buffer if needed and return
> after doing that.
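
A minimal sketch of such a 'make sure I have room for X more characters' routine, assuming a hypothetical heap-only buffer with an explicit capacity field. rsyslog's msg object does not necessarily track spare capacity this way, and ensure_space is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer (NOT rsyslog's msg_t). */
struct raw_buf {
    char  *buf;   /* heap block, or NULL when nothing is allocated */
    size_t len;   /* bytes in use, excluding the NUL */
    size_t cap;   /* bytes allocated, including room for the NUL */
};

/* Ensure `extra` more bytes (plus the NUL) fit. Returns 0 right away if
 * they already do; otherwise grows the block by doubling and returns 0,
 * or -1 on size overflow / allocation failure (old block still valid).
 * SIDE EFFECT: after a successful grow, every pointer previously taken
 * into m->buf is invalid. */
static int ensure_space(struct raw_buf *m, size_t extra)
{
    assert(m->cap == 0 || m->len < m->cap);
    if (extra > SIZE_MAX - m->len - 1)
        return -1;                      /* size computation would overflow */
    size_t need = m->len + extra + 1;
    if (need <= m->cap)
        return 0;                       /* enough room: nothing to do */

    size_t newcap = m->cap ? m->cap : 16;
    while (newcap < need)
        newcap = (newcap > SIZE_MAX / 2) ? need : newcap * 2;

    char *p = realloc(m->buf, newcap);
    if (p == NULL)
        return -1;
    m->buf = p;
    m->cap = newcap;
    return 0;
}
```

This version returns -1 rather than aborting, leaving the out-of-memory decision to the caller. Doubling keeps the number of reallocations logarithmic, but calling it once per control character is still the misuse described in the next paragraph.
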
> 
> There will be some things that need to be documented as side effects
> (pointers into the existing message may be invalid at that point,
> including values in the msg structure).
> 
> This could be misused (running this routine for every control character
> found could result in many malloc/free pairs, for example), so examples
> will need to be given of a two-pass routine: pass 1 to figure out what
> you want to do, then make sure there's enough space, and pass 2 to
> modify the buffer as needed.
> 
> Using this for sanitizing would still be slightly less efficient than
> the approach you probably use now (allocate a new buffer, copy things
> into it as you go to construct a new message, then set the message into
> the structure), but probably by no more than two copies of the text. The
> cleaner code that results may well be worth that cost. I'm thinking that
> the new routine would copy the text from the old buffer to the new one,
> then copy everything after your first insert to the end of the buffer.
> After that you are copying data from late in the buffer to earlier in
> the buffer, which may even be faster than copying small amounts of data
> from one buffer to another, as it may result in better cache behavior.
> 
> In fact, this pattern is probably common enough to make it a routine of
> its own. Something like:
> 
> int InsertIntoRawMsg(int offset, int count)
> 
> which inserts at least count spaces into the message at position offset
> from the beginning of the message, and returns the number of spaces
> actually inserted (which may be more than the number requested).
> 
> Or would it be better to return the number of extra characters
> available in the buffer after the end of the string?
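
A sketch of the proposed insert routine under similar assumptions: a hypothetical heap buffer with a capacity field, and insert_gap as an illustrative stand-in for InsertIntoRawMsg (which is only proposed in this mail, not an existing rsyslog API; size_t is used here instead of int). It opens exactly count spaces at offset by shifting the tail right with memmove and returns the number of spare bytes after the end of the string, i.e. the second variant asked about above, or -1 on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer (NOT rsyslog's msg_t). */
struct raw_buf {
    char  *buf;   /* heap block holding a NUL-terminated string */
    size_t len;   /* bytes in use, excluding the NUL */
    size_t cap;   /* bytes allocated, including room for the NUL */
};

/* Hypothetical stand-in for the proposed InsertIntoRawMsg(offset, count).
 * Opens a gap of `count` spaces at `offset` and returns the spare bytes
 * left after the end of the string, or -1 on bad arguments / OOM.
 * If the block has to grow, all old pointers into it become invalid. */
static ptrdiff_t insert_gap(struct raw_buf *m, size_t offset, size_t count)
{
    assert(m->buf != NULL && m->len < m->cap);
    if (offset > m->len || m->len > SIZE_MAX / 2 ||
        count > SIZE_MAX / 2 - m->len)
        return -1;                      /* bad offset or absurd size */

    size_t need = m->len + count + 1;   /* new length plus the NUL */
    if (need > m->cap) {
        size_t newcap = m->cap;
        while (newcap < need)
            newcap *= 2;                /* cannot overflow given the check */
        char *p = realloc(m->buf, newcap);
        if (p == NULL)
            return -1;                  /* old block is still valid */
        m->buf = p;
        m->cap = newcap;
    }
    /* Shift the tail, including the NUL, right; memmove handles overlap. */
    memmove(m->buf + offset + count, m->buf + offset, m->len - offset + 1);
    memset(m->buf + offset, ' ', count);
    m->len += count;
    return (ptrdiff_t)(m->cap - m->len - 1);
}
```

Each call shifts the tail once, so a caller inserting at many positions should still use the two-pass approach: total up the growth first, grow once, then fill from the end.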
> 
> I figure error checking on the return is not needed, because if it
> can't allocate the space we need to bail out (with whatever rsyslog does
> when it runs out of memory, probably aborting the message entirely).
> 
> David Lang
> 
> > Rainer
> >
> >>
> >>> As a side note, it would probably be useful if you could take down
> >>> some bullet points on how to modify things, so that others can find
> >>> that information in case they want to do it themselves. It could go
> >>> to the wiki, or I could include it in the doc set. Just a suggestion,
> >>> though...
> >>
> >> I'll see what I can do.
> >>
> >> David Lang
> >>
> >>> Rainer
> >>>
> >>>> -----Original Message-----
> >>>> From: [email protected] [mailto:rsyslog-
> >>>> [email protected]] On Behalf Of [email protected]
> >>>> Sent: Friday, February 11, 2011 5:38 AM
> >>>> To: rsyslog-users
> >>>> Subject: [rsyslog] how can a parser insert data into a message
> >>>>
> >>>> The various parser modules that I've submitted all remove data from
> >>>> the log message or overwrite the data in place.
> >>>>
> >>>> But I've now run across a situation where I need to insert
> >>>> information into the message. I know that this can be done because
> >>>> the sanitizing call does exactly this. I am assuming that it does
> >>>> something like allocating a new string and copying the data into the
> >>>> new string.
> >>>>
> >>>> The concern is how to do this in a way that will survive the exit
> >>>> from the module, not confuse any of the many pointers or sizes that
> >>>> are involved, and make sure everything is properly freed afterwards.
> >>>>
> >>>> Should I just search for the sanitizing routine and copy what it
> >>>> does (and can you point me at it?), or do you want me to wait until
> >>>> you have time to write something up on this?
> >>>>
> >>>> David Lang
> >>>> _______________________________________________
> >>>> rsyslog mailing list
> >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>>> http://www.rsyslog.com