Re: [rsyslog] handling iso8859 characters with json property replacer

Rainer Gerhards Tue, 15 Apr 2014 05:38:32 -0700

Thanks, I have now merged it into v7-stable, v8-stable and v8-devel
(master). It will be released with the next releases.


Rainer


On Sat, Apr 12, 2014 at 11:32 PM, Axel Rau <[email protected]> wrote:

>
> On 10 Oct 2013 at 14:46, Rainer Gerhards <[email protected]> wrote:
>
> > On Thu, Oct 3, 2013 at 9:09 AM, Risto Vaarandi <[email protected]> wrote:
> >
> >> On 09/27/2013 02:33 PM, Risto Vaarandi wrote:
> >>
> >>> On 09/20/2013 06:29 PM, Rainer Gerhards wrote:
> >>>
> >>>> On Fri, Sep 20, 2013 at 12:01 PM, Axel Rau <[email protected]> wrote:
> >>>>
> >>>>>> Obviously, there are many ways to improve that module. But I thought I'd at
> >>>>>> least get it started and gather some feedback. If time permits, I'll add
> >>>>>> some more functionality later today. But the basic need should be solved
> >>>>>> (if I understood correctly ;)).
> >>>>>>
> >>>>> I think the "basic need" would be to replace only invalid UTF-8
> >>>>> sequences, not every character with code < 32 or > 126.
> >>>>>
> >>>>>
> >>>> Thanks. I have just committed to the master branch a version that does
> >>>> this (by default).
> >>>>
> >>>>
> >>>> Does the restriction to IPv4 apply only to expressions like
> >>>>>         if $fromhost-ip == "10.0.0.1" then
> >>>>> or in general?
> >>>>>
> >>>>>
> >>>> That was a leftover from the doc I used as a basis; mmutf8fix doesn't care
> >>>> about IPvx. It's removed now.
> >>>>
> >>>> @risto: you now need to specify mode="controlcharacters" to replace all
> >>>> non-printable US-ASCII characters, as proper UTF-8 checks are the default
> >>>> now. I thought that's more appropriate for this module ;)
> >>>>
> >>>> See last sample in doc:
> >>>> http://www.rsyslog.com/doc/mmutf8fix.html
> >>>>
> >>>> Feedback on this module is appreciated.
> >>>>
> >>>
> >>> I have had it running for about a week and so far it has been able to
> >>> write all log messages into Elasticsearch without issues (many millions
> >>> of messages per day). Looks like it is working just as expected.
> >>> kind regards,
> >>> risto
> >>>
> >>>
> >> Actually, yesterday there was one write failure into elasticsearch.
> >> I had rsyslogd running with
> >>
> >> action(type="mmutf8fix" replacementChar="_")
> >>
> >> statement which accepts not only us-ascii, but also all utf8 characters.
> >>
> >> The log message was badly malformed and a number of replacements were done
> >> (my replacement character is "_", as you can see from the statement above).
> >> However, at the very end of the log message the final byte was a non-UTF-8
> >> character, and that was left unreplaced. The last 4 bytes of the message
> >> look as follows:
> >>
> >> <us-ascii char><us-ascii char><replaced char><unreplaced non-utf8 char>
> >>
> >> The error message produced by Elasticsearch is as follows:
> >> Invalid UTF-8 middle byte 0x22
> >>
> >> The issue is not very urgent, because I mostly care about US-ASCII
> >> characters and can thus enable mode="controlcharacters" as a workaround.
> >>
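The failure described above (a final byte left unreplaced, then "Invalid UTF-8 middle byte 0x22" from Elasticsearch) is consistent with a multi-byte lead byte sitting at the very end of the message, so that the next byte the JSON parser sees is the closing quote (0x22). The following is only a minimal sketch of how such a truncated tail can be caught; `sanitize_utf8` is a hypothetical helper written for illustration, it is not the mmutf8fix source, and it checks sequence structure only (no overlong, surrogate or range checks).

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only (not the mmutf8fix code).
 * Replaces every byte that is not part of a structurally complete UTF-8
 * sequence with `repl` and returns the number of bytes replaced.
 * Structure only: overlong forms, surrogates and code points above
 * U+10FFFF are NOT rejected here. */
size_t sanitize_utf8(unsigned char *msg, size_t len, unsigned char repl)
{
    size_t replaced = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char c = msg[i];
        size_t need;  /* continuation bytes expected after the lead byte */
        size_t have;  /* continuation bytes actually present */

        if (c < 0x80) {                 /* plain US-ASCII */
            i++;
            continue;
        } else if ((c & 0xe0) == 0xc0) {
            need = 1;                   /* 2-byte sequence */
        } else if ((c & 0xf0) == 0xe0) {
            need = 2;                   /* 3-byte sequence */
        } else if ((c & 0xf8) == 0xf0) {
            need = 3;                   /* 4-byte sequence */
        } else {                        /* stray continuation or invalid lead */
            msg[i++] = repl;
            replaced++;
            continue;
        }

        /* Count continuation bytes without reading past the buffer end. */
        have = 0;
        while (have < need && i + 1 + have < len
               && (msg[i + 1 + have] & 0xc0) == 0x80)
            have++;

        if (have == need) {
            i += need + 1;              /* complete sequence, keep it */
        } else {
            /* Incomplete sequence. This branch also covers a lead byte
             * that is the last byte of the message. Any continuation
             * bytes that did follow are replaced on later iterations. */
            msg[i++] = repl;
            replaced++;
        }
    }
    return replaced;
}
```

With this structure there is no pending sequence left over when the loop ends, because the lookahead is bounded by `len` and a short tail is handled in the same branch as any other broken sequence.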
> >>
> > I finally fixed this border case, patch here:
> >
> >
> http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=97bda43e372a506671cb7007b6041e4160a02b04
> >
> > I would appreciate it if you could apply and test the patch, as I only had
> > time to do a very quick test (plus obviously nothing beats high-volume real
> > traffic ;)).
> >
> I found another border case with 2-byte sequences, where the lead bytes 0xc0
> and 0xc1 are not allowed:
> - - -
> diff --git a/plugins/mmutf8fix/mmutf8fix.c b/plugins/mmutf8fix/mmutf8fix.c
> index 351bb12..65739ed 100644
> --- a/plugins/mmutf8fix/mmutf8fix.c
> +++ b/plugins/mmutf8fix/mmutf8fix.c
> @@ -254,9 +254,14 @@ doUTF8(instanceData *pData, uchar *msg, int lenMsg)
>                                 ; /* nothing to do, all well */
>                         } else if((c & 0xe0) == 0xc0) {
>                                 /* 2-byte sequence */
> -                               strtIdx = i;
> -                               seqLen = bytesLeft = 1;
> -                               codepoint = c & 0x1f;
> +                               /* 0xc0 and 0xc1 are illegal */
> +                               if (c == 0xc0 || c == 0xc1) {
> +                                 msg[i] = pData->replChar;
> +                               } else {
> +                                 strtIdx = i;
> +                                 seqLen = bytesLeft = 1;
> +                                 codepoint = c & 0x1f;
> +                               }
>                         } else if((c & 0xf0) == 0xe0) {
>                                 /* 3-byte sequence */
>                                 strtIdx = i;
> - - -
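As background on why the patch rejects 0xc0 and 0xc1: a 2-byte sequence carries 5 payload bits in the lead byte and 6 in the continuation byte, so those two lead bytes can only produce code points below 0x80, which must be encoded as a single byte (an "overlong" form). The sketch below just works through that arithmetic; the function names are hypothetical and are not taken from mmutf8fix.c.

```c
#include <stdbool.h>

/* Hypothetical helpers, for illustration only (not from mmutf8fix.c). */

/* Smallest code point a 2-byte sequence starting with `lead` can encode
 * (continuation payload bits all zero). */
unsigned two_byte_min(unsigned char lead)
{
    return (unsigned)(lead & 0x1f) << 6;
}

/* Largest code point a 2-byte sequence starting with `lead` can encode
 * (continuation payload bits all one). */
unsigned two_byte_max(unsigned char lead)
{
    return ((unsigned)(lead & 0x1f) << 6) | 0x3f;
}

/* A 2-byte lead is acceptable only if it has the 110xxxxx bit pattern
 * and cannot encode anything that fits in one byte (code points < 0x80). */
bool is_valid_two_byte_lead(unsigned char lead)
{
    return (lead & 0xe0) == 0xc0 && two_byte_min(lead) >= 0x80;
}
```

For 0xc0 the reachable range is 0x00 to 0x3f and for 0xc1 it is 0x40 to 0x7f, so both fall entirely inside US-ASCII; 0xc2 is the first lead byte whose range starts at 0x80.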
> Regards, Axel
> ---
> PGP-Key:29E99DD6  ☀ +49 151 2300 9283  ☀ computing @ chaos claudius
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
>