On 18.09.2013 at 12:32, Risto Vaarandi <[email protected]> wrote:

> hi folks,
> 
> I've been using the omelasticsearch output module for quite some time, and I 
> am happy with it. However, there is one issue I haven't been able to tackle. 
> Since I am writing data to Elasticsearch from a wide variety of sources, I am 
> occasionally running into syslog messages which contain some ISO 8859 
> characters. Unfortunately, when trying to write them into Elasticsearch 
> as-is, you would get back the following error:
> 
> org.elasticsearch.index.mapper.MapperParsingException: failed to parse 
> [@message]
> ...
> ...
> ...
> Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Invalid 
> UTF-8 start byte 0x99

I have the same problem when writing UTF-8 encoded message text to 
PostgreSQL, which rejects invalid UTF-8 sequences.
I have several event sources that produce UTF-8 text. Occasionally, things 
like encoding errors in e-mail headers produce syslog events with invalid 
UTF-8 sequences, which causes transactions to be rolled back (annoying, 
especially with a reliable queuing setup).
Instead of fixing the various programs and the individual input or output 
modules of rsyslog, we should have one central place to (optionally) filter 
or correct illegal UTF-8 sequences.
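
To illustrate what such a central filter could do, here is a minimal sketch 
in Python. It is not rsyslog code and not the experimental code mentioned 
below; the names sanitize_utf8 and latin1_fallback are made up for this 
example. It assumes two possible policies: replace every invalid byte with 
U+FFFD, or guess that stray bytes are ISO 8859-1 and transcode them.

```python
import codecs


def _latin1_fallback(err):
    """Error handler: reinterpret the offending bytes as ISO 8859-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Every byte value is valid in ISO 8859-1, so this decode cannot fail.
    return bad.decode("latin-1"), err.end


# Hypothetical handler name, registered once at import time.
codecs.register_error("latin1_fallback", _latin1_fallback)


def sanitize_utf8(raw: bytes, mode: str = "replace") -> bytes:
    """Return raw as valid UTF-8.

    mode="replace": invalid bytes become U+FFFD (the replacement character).
    mode="latin1":  invalid bytes are treated as ISO 8859-1 (an assumption
                    about the sender, which may be wrong).
    """
    handler = "latin1_fallback" if mode == "latin1" else "replace"
    return raw.decode("utf-8", errors=handler).encode("utf-8")


if __name__ == "__main__":
    # 0x99 is the invalid start byte from the Elasticsearch error above.
    print(sanitize_utf8(b"msg \x99 end"))
    print(sanitize_utf8(b"caf\xe9", mode="latin1"))
```

Valid UTF-8 passes through unchanged in both modes, so a filter like this 
could run on every message before it reaches an output module. Whether to 
replace or to transcode is a policy decision; the Latin-1 guess is lossless 
but only correct if the sender really used ISO 8859-1.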

Axel
PS: I have some experimental code handy, which should do the job.
---
PGP-Key:29E99DD6  ☀ +49 151 2300 9283  ☀ computing @ chaos claudius

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
