Is it possible that rsyslog is not receiving the four ASCII characters <, 8, 0, >,
but rather a single byte with the value 0x80, which the JSON parser on the
Elasticsearch side then tries to interpret as the start of a multi-byte
character sequence, and which something else is displaying as <80> in the log
line?
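One quick way to tell the two cases apart is to look at the raw bytes of the
field. The Python sketch below is only an illustration, and the sample strings
are adapted from the KEDANOVA log line in the quoted message: the literal text
<80> is four plain ASCII bytes that JSON serializes without any escaping, while
a single 0x80 byte fails UTF-8 decoding before JSON is even involved.

```python
import json

literal = b"FA<80>ANES"  # four ASCII characters: '<', '8', '0', '>'
raw = b"FA\x80ANES"      # one single byte with the value 0x80

print(len(literal), len(raw))  # 10 7

# Plain ASCII is valid UTF-8, and json.dumps leaves '<' and '>' unescaped.
print(json.dumps(literal.decode("utf-8")))  # "FA<80>ANES"

# The raw byte is rejected by a strict UTF-8 decoder.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print(hex(err.object[err.start]), err.reason)  # 0x80 invalid start byte
```

If your field turns out to hold the literal four characters, escaping is not
the problem; if it holds the single byte, the fix has to happen before the
JSON reaches Elasticsearch.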

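https://en.wikipedia.org/wiki/UTF-8 shows that bytes 0x80 through 0xBF are
continuation bytes: they are only valid in the middle of a multi-byte sequence
and can never start a character (U+0080 itself is encoded as the two bytes
0xC2 0x80). JSON expects all strings to be UTF-8, so a bare 0x80 byte produces
exactly the "Invalid UTF-8 start byte 0x80" error you are seeing.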
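To make the byte ranges concrete, here is a small Python sketch that classifies
a single byte by its role in UTF-8, following the ranges in RFC 3629. The
function name utf8_byte_role is made up for illustration.

```python
def utf8_byte_role(b: int) -> str:
    """Classify one byte (0..255) by its role in UTF-8, per RFC 3629."""
    if b < 0x80:
        return "ASCII"
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: never valid as the first byte
    if b in (0xC0, 0xC1) or b > 0xF4:
        return "invalid"  # overlong or out-of-range lead bytes
    if b < 0xE0:
        return "lead of 2-byte sequence"
    if b < 0xF0:
        return "lead of 3-byte sequence"
    return "lead of 4-byte sequence"

print(utf8_byte_role(0x80))             # continuation
print("\u0080".encode("utf-8").hex())   # c280: U+0080 needs two bytes
```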

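To work around it, I'd try loading the module with module(load="mmutf8fix")
and adding action(type="mmutf8fix") to your rsyslog ruleset ahead of the
omelasticsearch action, so invalid sequences are cleaned up before the JSON is
built.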
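As I understand mmutf8fix's default mode, it replaces bytes that are not valid
UTF-8 with a space (configurable through its replacementChar parameter). The
Python sketch below only approximates that idea so you can preview what a
cleaned field would look like; it is not rsyslog's implementation, and the
error-handler name space_per_bad_byte and the function fix_utf8 are
hypothetical.

```python
import codecs


def _space_per_bad_byte(err):
    # Replace each byte in the rejected span with one space, then resume
    # decoding right after that span. (mmutf8fix's exact handling of
    # truncated sequences may differ; this is an approximation.)
    if not isinstance(err, UnicodeDecodeError):
        raise err
    return (" " * (err.end - err.start), err.end)


# Hypothetical handler name; any unique string would do.
codecs.register_error("space_per_bad_byte", _space_per_bad_byte)


def fix_utf8(raw: bytes) -> str:
    """Rough stand-in for mmutf8fix's default mode: invalid bytes become spaces."""
    return raw.decode("utf-8", errors="space_per_bad_byte")


print(fix_utf8(b"KEDANOVA%20FA\x80ANES&sec=08&"))  # KEDANOVA%20FA ANES&sec=08&
print(fix_utf8("caf\u00e9".encode("utf-8")) == "caf\u00e9")  # True: valid UTF-8 is untouched
```

In rsyslog itself you would still use the real module; the sketch is only a
way to see that the offending byte is dropped rather than escaped.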

As to how the 'invalid' character got into the log stream in the first place:
I've seen similar situations where Windows hosts were sending text in the
CP-1252 character set, about which Wikipedia says:

>    ... is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by
>    using displayable characters rather than control characters in the 80 to
>    9F (hex) range.

This frequently occurs in Windows event descriptions that contain apostrophes:
in ASCII the apostrophe is the ['] character (decimal 39), but Windows text
often uses the CP-1252 right single quotation mark [’] (decimal 146, hex 0x92)
instead.
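Python's built-in cp1252 and latin-1 codecs make the difference between the two
character sets easy to see. This is a sketch for illustration only, and the
sample text "don't" is made up. Incidentally, in CP-1252 the byte 0x80 itself
is the euro sign.

```python
import unicodedata

apostrophe = b"\x92"  # decimal 146

# In CP-1252, 0x92 is a typographic apostrophe...
print(unicodedata.name(apostrophe.decode("cp1252")))  # RIGHT SINGLE QUOTATION MARK
# ...while ISO 8859-1 maps the same byte to an unnamed C1 control character.
print(repr(apostrophe.decode("latin-1")))  # '\x92'
# 0x80 is the euro sign in CP-1252.
print(unicodedata.name(b"\x80".decode("cp1252")))  # EURO SIGN

# If the sender really is using CP-1252, transcoding to UTF-8 keeps the
# character instead of replacing it:
print(b"don\x92t".decode("cp1252").encode("utf-8"))  # b'don\xe2\x80\x99t'
```

The trade-off: replacing bad bytes works for any input but throws the character
away, while transcoding preserves it but garbles text that was not actually
CP-1252 to begin with.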

- Dave

> On Feb 23, 2016, at 9:09 AM, Joe Blow <[email protected]> wrote:
> 
> Correct.  I get things like this in my omelasticsearch error log:
> 
> "error":        "MapperParsingException[failed to parse [csuriquery]];
> nested: JsonParseException[Invalid UTF-8 start byte 0x80\n at [Source:
> [B@2210517d; line: 1, column: 450]]
> 
> Then, within the normalized JSON, I see my <80> tags at that line.
> 
> Any ideas?
> 
> Cheers,
> 
> JB
> 
> On Tue, Feb 23, 2016 at 9:33 AM, Rainer Gerhards <[email protected]>
> wrote:
> 
>> 2016-02-23 15:29 GMT+01:00 Joe Blow <[email protected]>:
>>> Hey all,
>>> 
>>> I've got some logs which might have different languages in them, and it
>>> appears that things like this are tripping up when I try to send them to
>>> Elasticsearch:
>>> 
>>> KEDANOVA%20FA<80>ANES&sec=08&
>>> 
>>> Specifically the <80>.  What is the best way to escape both the < and
>>> the >
>>> in the normalized field?  I'm already specifying the format as JSON, so
>>> backslashes are being escaped properly.  Any ideas?
>> 
>> I am not aware that <> need to be escaped. Maybe another ES JSON
>> incompatibility?
>> 
>> Rainer
>>> 
>>> Thanks in advance.
>>> 
>>> Cheers,
>>> 
>>> JB
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>> DON'T LIKE THAT.
>> 

