On 09/18/2013 03:09 PM, Rainer Gerhards wrote:
On Wed, Sep 18, 2013 at 2:02 PM, David Lang <[email protected]> wrote:

As far as I know, this is not available in rsyslog. The need comes up
once in a while, but it has usually been fairly easy to work around by
interposing an external program between rsyslog and the output.
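
As a rough illustration of the external-program workaround, here is a minimal sketch of a line-oriented filter that sits between rsyslog and the real output. It assumes one message per line on stdin and that the input is ISO 8859-1; the names filter_stream, latin1_to_utf8 and main are hypothetical, and how rsyslog feeds the program (a pipe, or something like the omprog output module) is left open.

```python
import sys
from typing import BinaryIO, Callable


def filter_stream(src: BinaryIO, dst: BinaryIO,
                  transform: Callable[[bytes], bytes]) -> int:
    """Copy src to dst one line at a time, applying transform to each message.

    Returns the number of messages processed.
    """
    count = 0
    for raw in src:
        # Keep the newline out of the transform so message framing is untouched.
        body = raw.rstrip(b"\n")
        dst.write(transform(body) + b"\n")
        count += 1
    dst.flush()
    return count


def latin1_to_utf8(message: bytes) -> bytes:
    """Assumed transform: interpret the bytes as ISO 8859-1 and emit UTF-8."""
    return message.decode("iso-8859-1").encode("utf-8")


def main() -> None:
    """Entry point when the sketch is used as a stdin-to-stdout filter."""
    filter_stream(sys.stdin.buffer, sys.stdout.buffer, latin1_to_utf8)
```

Calling main() at the bottom of the script turns it into a filter; the transform is passed in as a parameter so the same skeleton could carry any other clean-up step.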

Rsyslog doesn't know or care about character sets internally; it deals
with plain C strings that can contain arbitrary non-NUL bytes.

Currently rsyslog has the control character escaping code because most of
what it has traditionally dealt with has been ASCII text data.

What I think you need is the ability to attach a sed-like filter to an
output so that you can define a mapping of ISO 8859 characters to UTF-8
characters.
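
To make the idea of a character mapping concrete, here is a small sketch that builds an explicit byte-to-UTF-8 table and applies it to a message. It assumes the source charset is ISO 8859-1 (other ISO 8859 parts would need a different codec name); MAPPING and map_to_utf8 are hypothetical names, not anything rsyslog provides.

```python
# Assumption: the incoming bytes are ISO 8859-1. Swap the codec name for
# another ISO 8859 part if the data uses a different one.
SOURCE_CHARSET = "iso-8859-1"

# One entry per high byte: the UTF-8 byte sequence for the same character.
MAPPING = {
    byte: bytes([byte]).decode(SOURCE_CHARSET).encode("utf-8")
    for byte in range(0x80, 0x100)
}


def map_to_utf8(message: bytes) -> bytes:
    """Replace each high byte with its UTF-8 sequence; ASCII passes through."""
    out = bytearray()
    for byte in message:
        out += MAPPING.get(byte, bytes([byte]))
    return bytes(out)
```

Note that a blind mapping like this would double-encode input that is already UTF-8, so it only fits sources whose charset is known in advance.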

Another option would be to create a message modification module to make
the changes to the message that rsyslog is processing instead of to the
output.

A third option would be to create a new function that would take the value
of a variable/property, transform it (per additional parameters), and
assign the result to another variable.

Since you are dealing with Elasticsearch output, a fourth option (for your
use case) would be to create a string generator module that creates the
string format that you need for Elasticsearch and cleans up character
sets along the way.


I think the fourth would be the quickest to get implemented, but the most
limited.

The third would be the most flexible, but the most complicated to use.

Thinking about the second, I wonder how hard it would be to make an mm
module that is little more than a wrapper around an external program, so
that the work can be offloaded to something already optimized for it.

Rainer, any thoughts as to which option would be easiest to implement
(both for this specific character set conversion and the more general
conversion problem)?


I need to think a bit more about this problem, especially as it comes up
every now and then. I remember that one proposed solution was the ability
to assign a character set to inputs and make the database outputs aware of
the charset of the message in question (which could change very
frequently). I haven't explored this in enough depth yet, but if it
works, it sounds like a good solution (with a medium time-to-implement
footprint). So let's add this as #5 ;)
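
A rough sketch of what option #5 could look like in the abstract: each message carries the charset declared for its input, and the output re-encodes on the way out. This is only an illustration of the idea, not rsyslog's data model; TaggedMessage and to_output_encoding are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedMessage:
    """A raw message plus the charset declared for the input it came from."""
    raw: bytes
    charset: str  # assumed to be set per input, e.g. "utf-8" or "iso-8859-15"


def to_output_encoding(msg: TaggedMessage, target: str = "utf-8") -> bytes:
    """Re-encode a message for an output that expects the target charset."""
    # errors="replace" substitutes a placeholder for bytes that cannot be
    # decoded, rather than raising and losing the whole message.
    text = msg.raw.decode(msg.charset, errors="replace")
    return text.encode(target, errors="replace")
```

With this shape the charset can differ from one message to the next, and the output never has to guess what the bytes mean.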

Of the options you listed, I think #1 or #2 is probably the quickest to
implement.

@Risto: what would be the minimal solution to solve your problem? What
would be the best one?

That is a somewhat hard question, since the answer has to consider what is
realistically possible without breaking any other functionality. Initially
I was thinking of converting any symbol outside US-ASCII 32-127 to its
#<code> representation (as can be done with messages written to flat
files), omitting such symbols altogether, or replacing them with a space.
To achieve this, I was thinking about allowing 'escape', 'drop' or 'space'
property replacer options to be used together with 'json'. This would only
make sense for plain US-ASCII JSON fields, since for UTF-8 it is likely to
break something. This is just a quick thought, and since I don't know the
internals of rsyslog, it could conflict with existing algorithms and data
structures. If it doesn't make sense or would harm performance, it is
better to forget about it :)
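
To illustrate the three behaviours, here is a minimal sketch outside of rsyslog. It assumes the #<code> form is a three-digit octal escape (modelled on the control character escaping used for flat files; the exact format is an assumption), and it keeps bytes 32-126 only, since 127 is the DEL control character. The function name sanitize and the mode strings are hypothetical.

```python
VALID_MODES = ("escape", "drop", "space")


def sanitize(message: bytes, mode: str) -> bytes:
    """Handle every byte outside printable US-ASCII according to mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    out = bytearray()
    for byte in message:
        if 32 <= byte <= 126:
            out.append(byte)  # printable ASCII is copied unchanged
        elif mode == "escape":
            # Assumed format: '#' followed by the byte value as 3 octal digits.
            out += f"#{byte:03o}".encode("ascii")
        elif mode == "space":
            out.append(0x20)
        # mode == "drop": the byte is simply omitted
    return bytes(out)
```

For the ISO 8859-1 bytes b"caf\xe9" this gives b"caf#351", b"caf" and b"caf " for the three modes respectively.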
kind regards,
risto



Rainer



