Thanks, I have now merged it into v7-stable, v8-stable and v8-devel (master). It will be included in the next releases.
Rainer

On Sat, Apr 12, 2014 at 11:32 PM, Axel Rau <[email protected]> wrote:
>
> On 10.10.2013 at 14:46, Rainer Gerhards <[email protected]> wrote:
>
> > On Thu, Oct 3, 2013 at 9:09 AM, Risto Vaarandi <[email protected]> wrote:
> >
> >> On 09/27/2013 02:33 PM, Risto Vaarandi wrote:
> >>
> >>> On 09/20/2013 06:29 PM, Rainer Gerhards wrote:
> >>>
> >>>> On Fri, Sep 20, 2013 at 12:01 PM, Axel Rau <[email protected]> wrote:
> >>>>
> >>>>>> Obviously, there are many ways to improve that module. But I thought
> >>>>>> I at least get it started and gather some feedback. If time permits,
> >>>>>> I'll add some more functionality later today. But the basic need
> >>>>>> should be solved (if I understood correctly ;)).
> >>>>>>
> >>>>> I think, the "basic need" would be to replace only invalid UTF-8
> >>>>> sequences, not every character with code < 32 or > 126.
> >>>>>
> >>>>>
> >>>> thanks. I have just committed to master branch a version that does
> >>>> this (by default).
> >>>>
> >>>>
> >>>>> Does the restriction to IPv4 apply only to expressions like
> >>>>>     if $fromhost-ip == "10.0.0.1" then
> >>>>> or in general?
> >>>>>
> >>>>>
> >>>> That was a leftover from the doc I used as basis, mmutf8fix doesn't
> >>>> care about IPvx. It's removed now.
> >>>>
> >>>> @risto: you now need to specify mode="controlcharacters" to replace
> >>>> all non-printable US-ASCII characters, as proper UTF-8 checks are
> >>>> default now.
> >>>> I thought that's more appropriate for this module ;)
> >>>>
> >>>> See last sample in doc:
> >>>> http://www.rsyslog.com/doc/mmutf8fix.html
> >>>>
> >>>> Feedback on this module is appreciated.
> >>>>
> >>>
> >>> I have had it running for about a week and so far it has been able to
> >>> write all log messages into Elasticsearch without issues (many millions
> >>> of messages per day). Looks like it is working just as expected.
> >>> kind regards,
> >>> risto
> >>>
> >>>
> >> Actually, yesterday there was one write failure into elasticsearch.
> >> I had rsyslogd running with
> >>
> >> action(type="mmutf8fix" replacementChar="_")
> >>
> >> statement which accepts not only us-ascii, but also all utf8 characters.
> >>
> >> The log message was badly malformed and a number of replacements were
> >> done (my replacement character is "_" as you can see from above
> >> statement). However, in the very end of the log message the final byte
> >> was a non-utf8 character, and that was left unreplaced. The last 4 bytes
> >> of the message look like follows:
> >>
> >> <us-ascii char><us-ascii char><replaced char><unreplaced non-utf8 char>
> >>
> >> The error message produced by elasticsearch looks like follows:
> >> Invalid UTF-8 middle byte 0x22
> >>
> >> The issue is not very urgent, because I mostly care about us-ascii
> >> characters and can thus enable mode="controlcharacters" for a workaround.
> >>
> >>
> > I finally fixed this border case, patch here:
> >
> > http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=97bda43e372a506671cb7007b6041e4160a02b04
> >
> > I would appreciate if you could apply and test the patch, as I had only
> > time to do a very quick test (plus obviously nothing beats high-volume
> > real traffic ;)).
> >
>
> I found another border case with 2-byte-sequence, where 0xc0 and 0xc1 are
> not allowed:
> - - -
> diff --git a/plugins/mmutf8fix/mmutf8fix.c b/plugins/mmutf8fix/mmutf8fix.c
> index 351bb12..65739ed 100644
> --- a/plugins/mmutf8fix/mmutf8fix.c
> +++ b/plugins/mmutf8fix/mmutf8fix.c
> @@ -254,9 +254,14 @@ doUTF8(instanceData *pData, uchar *msg, int lenMsg)
>                          ; /* nothing to do, all well */
>                  } else if((c & 0xe0) == 0xc0) {
>                          /* 2-byte sequence */
> -                        strtIdx = i;
> -                        seqLen = bytesLeft = 1;
> -                        codepoint = c & 0x1f;
> +                        /* 0xc0 and 0xc1 are illegal */
> +                        if (c == 0xc0 || c == 0xc1) {
> +                                msg[i] = pData->replChar;
> +                        } else {
> +                                strtIdx = i;
> +                                seqLen = bytesLeft = 1;
> +                                codepoint = c & 0x1f;
> +                        }
>                  } else if((c & 0xf0) == 0xe0) {
>                          /* 3-byte sequence */
>                          strtIdx = i;
> - - -
> Regards, Axel
> ---
> PGP-Key:29E99DD6 ☀ +49 151 2300 9283 ☀ computing @ chaos claudius
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
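
For readers following the thread, here is a minimal standalone sketch of why 0xc0 and 0xc1 can never start a valid 2-byte sequence. A 2-byte sequence carries 11 payload bits, and with a lead byte of 0xc0 or 0xc1 the decoded code point is at most 0x7f, which must be encoded as a single byte (an "overlong" form). The function name utf8_fix and its signature are hypothetical; this is not the mmutf8fix code, and it only imitates the replace-the-offending-byte behaviour discussed above. It also covers the two earlier border cases from the thread (a lead byte at the very end of the message, and a lead byte followed by a non-continuation byte such as 0x22).

```c
#include <stddef.h>

/* Sketch only (hypothetical helper, not taken from mmutf8fix.c):
 * replace every byte that is not part of a well-formed UTF-8 sequence
 * with repl. Returns the number of bytes replaced. */
static size_t utf8_fix(unsigned char *msg, size_t len, unsigned char repl)
{
    size_t replaced = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char c = msg[i];
        size_t need = 0;              /* continuation bytes expected */
        unsigned long cp = 0;         /* decoded code point */
        unsigned long min = 0;        /* smallest code point for this length */
        size_t j;
        int ok;

        if (c < 0x80) {               /* plain US-ASCII, always fine */
            i++;
            continue;
        }

        if ((c & 0xe0) == 0xc0) {         /* 2-byte lead */
            need = 1; cp = c & 0x1f; min = 0x80;
        } else if ((c & 0xf0) == 0xe0) {  /* 3-byte lead */
            need = 2; cp = c & 0x0f; min = 0x800;
        } else if ((c & 0xf8) == 0xf0) {  /* 4-byte lead */
            need = 3; cp = c & 0x07; min = 0x10000;
        }
        /* need == 0 here means a stray continuation byte or 0xf8..0xff */

        /* the whole sequence must fit into the remaining message */
        ok = (need > 0) && (len - i > need);
        for (j = 1; ok && j <= need; j++) {
            if ((msg[i + j] & 0xc0) != 0x80)
                ok = 0;               /* e.g. "middle byte 0x22" */
            else
                cp = (cp << 6) | (msg[i + j] & 0x3f);
        }
        /* overlong forms (this rejects 0xc0/0xc1), surrogates, > U+10FFFF */
        if (ok && (cp < min || cp > 0x10ffffUL || (cp >= 0xd800 && cp <= 0xdfff)))
            ok = 0;

        if (ok) {
            i += need + 1;            /* skip the whole valid sequence */
        } else {
            msg[i] = repl;            /* replace only the offending byte */
            replaced++;
            i++;
        }
    }
    return replaced;
}
```

In this sketch the check `cp < min` makes an explicit test for 0xc0 and 0xc1 unnecessary, whereas the patch above rejects the two lead bytes directly; both approaches reject the same 2-byte sequences.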

