As far as I know, this is not available in rsyslog. The need comes up once in a while, but it has usually been fairly easy to work around by interposing an external program between rsyslog and the output.
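To illustrate the kind of external program meant here, this is a minimal sketch in Python, not something rsyslog ships. It assumes the program receives one message per line on a binary stream (for example, when launched through something like omprog) and that any line that is not valid UTF-8 is really ISO 8859-1. The names to_utf8 and filter_stream are made up for this example, and forwarding the cleaned lines on to Elasticsearch is left out.

```python
def to_utf8(raw: bytes, fallback: str = "iso8859-1") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else transcode it.

    Assumption: anything that fails UTF-8 decoding is in the fallback
    charset. Bytes with no mapping there become U+FFFD.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace").encode("utf-8")


def filter_stream(src, dst) -> None:
    """Copy lines from binary stream src to dst, cleaning each one."""
    for line in src:
        dst.write(to_utf8(line))
        dst.flush()  # do not let messages sit in a buffer
```

It could be wired up with filter_stream(sys.stdin.buffer, sys.stdout.buffer) after an import sys. Working on bytes rather than text keeps Python from guessing the input encoding.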

Rsyslog doesn't know or care about character sets internally; it deals with straight C strings that can contain arbitrary non-null bytes.

Currently rsyslog has the control character escaping code because most of what it has traditionally dealt with has been ASCII text data.

What I think you need is the ability to attach a sed-like filter to an output, so that you can define a mapping of iso8859 characters to UTF-8 characters.
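As a rough picture of what such a mapping could look like, here is a Python sketch that builds a byte-to-UTF-8 table for one ISO 8859 variant and applies it byte by byte. This is only an illustration of the idea, not an rsyslog feature. The function names and the "?" placeholder for unmapped bytes are assumptions made for the example.

```python
def build_table(charset: str = "iso8859-1") -> dict:
    """Map every byte 0x80..0xFF to its UTF-8 encoding under charset."""
    table = {}
    for b in range(0x80, 0x100):
        try:
            table[b] = bytes([b]).decode(charset).encode("utf-8")
        except UnicodeDecodeError:
            table[b] = b"?"  # byte is undefined in this charset
    return table


def apply_table(raw: bytes, table: dict) -> bytes:
    """Replace each high byte via the table; ASCII passes through."""
    return b"".join(table.get(b, bytes([b])) for b in raw)
```

Note that a blind mapping like this treats every high byte as iso8859, so input that is already valid UTF-8 would get double-encoded. It only fits sources known to send a single legacy charset.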

Another option would be to create a message modification module to make the changes to the message that rsyslog is processing instead of to the output.

A third option would be to create a new function that would take the value of a variable/property, transform it (per additional parameters), and assign the result to another variable.

Since you are dealing with elasticsearch output, a fourth option (for your use case) would be to create a string generator module that creates the string format you need for elasticsearch and cleans up character sets along the way.
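A real string generator would be a C module inside rsyslog, so the following Python is only a sketch of the logic such a module might carry out: clean up the bytes, then emit the JSON document. The "@message" field name is taken from the template quoted below; the function name es_document and the ISO 8859-1 fallback are assumptions.

```python
import json


def es_document(raw_msg: bytes, fallback: str = "iso8859-1") -> str:
    """Build a JSON document for the message, fixing the charset first."""
    try:
        text = raw_msg.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is in the fallback charset.
        text = raw_msg.decode(fallback, errors="replace")
    # json.dumps escapes non-ASCII as \uXXXX by default, so the result
    # is pure ASCII and cannot contain an invalid UTF-8 byte.
    return json.dumps({"@message": text})
```

For example, es_document(b"caf\xe9") yields a document whose message is escaped as caf\u00e9, which a JSON parser reads back as the intended character.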


I think the fourth would be the quickest to get implemented, but the most limited.

The third would be the most flexible, but the most complicated to use.

Thinking about the second, I wonder how hard it would be to make an mm (message modification) module that is little more than a wrapper around an external program, so the work can be offloaded to something already optimized for it.

Rainer, any thoughts as to which option would be easiest to implement (both for this specific character set conversion and the more general conversion problem)?


I have not had to deal with character set conversion very much, so I'm not familiar with what tools are available to do this sort of conversion already.

David Lang




On Wed, 18 Sep 2013, Risto Vaarandi wrote:

hi folks,

I've been using the omelasticsearch output module for quite some time, and I am happy with it. However, there is one issue I haven't been able to tackle. Since I am writing data to Elasticsearch from a wide variety of sources, I occasionally run into syslog messages which contain some iso8859 characters. Unfortunately, when trying to write them into Elasticsearch as-is, I get back the following error:

org.elasticsearch.index.mapper.MapperParsingException: failed to parse [@message]
...
...
...
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Invalid UTF-8 start byte 0x99

Apparently, the 'json' property replacer is not able to detect and remove (or replace) such characters.

As a solution, I have tried adding the space-cc or drop-cc property replacer option alongside json, for example:

\"@message\":\"%rawmsg:::space-cc,json%\"

but they have no effect (in addition, I have specified
$EscapeControlCharactersOnReceive off
as recommended by rsyslog documentation).

Is there any way to handle this problem? So far, I've been happy with the rsyslog+Elasticsearch setup, and I wouldn't like to add any Java-based tool into the processing pipeline.

kind regards,
risto
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
