On Thu, 14 Mar 2013, Rainer Gerhards wrote:

On Thu, 2013-03-14 at 11:26 +0100, Nicolas HAHN wrote:
Hello David,

Well, my opinion is that we should avoid reinventing the wheel when possible.

What I mean is that the PostgreSQL engine already supports encoding translation on the
fly, transparently, between dozens of encoding schemes. So the simplest and fastest
thing would be to have the OMPGSQL module accept an additional parameter
(client_encoding), and each time OMPGSQL opens a connection to the database, the first SQL
command it sends to the SQL backend should be "SET client_encoding='value'".
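
A minimal Python sketch of that idea, for illustration only: it is not ompgsql code, and the helper names `client_encoding_statement` and `init_connection` are made up here. It assumes a DB-API style connection object (anything with `cursor()` / `execute()`), and it checks the encoding name before putting it into the statement, because the value would come from a config file.

```python
import re

# Conservative pattern for encoding names such as LATIN1, UTF8, WIN1252, EUC_JP.
# (Assumption of this sketch: names only contain letters, digits, '_' and '-'.)
_ENCODING_NAME = re.compile(r"[A-Za-z0-9_-]+")


def client_encoding_statement(encoding: str) -> str:
    """Build the SET statement to send right after connecting."""
    if not _ENCODING_NAME.fullmatch(encoding):
        raise ValueError(f"suspicious encoding name: {encoding!r}")
    return f"SET client_encoding = '{encoding}'"


def init_connection(conn, encoding: str) -> None:
    """Send the SET statement as the first command on a fresh connection.

    `conn` is any DB-API style connection (hypothetical here).
    """
    cur = conn.cursor()
    try:
        cur.execute(client_encoding_statement(encoding))
    finally:
        cur.close()
```

With a setup like this, the backend converts from the declared client encoding to the database encoding, so the output module never has to touch the message bytes itself.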


I think this is probably a good short-term solution. Would a parameter for ompgsql be sufficient (for your cases)? If I need to add it to the input, that's a much larger change, as I would need to carry that encoding all through the engine.

While I can see it being considered 'proper' to add encoding type to every string in rsyslog, I think it's overkill and not worth the effort.

For the vast majority of people, the logs are plain ASCII (and for most of the remainder they are UTF8), and the encoding is consistent throughout the log processing. Adding any overhead to figure out or even track the encoding type for this common case is a waste of time.

I think that all you need is an input-time parser module that can 'fix' invalid encodings. Let the admin enable the module if, and only if, they need it in their environment.

This also allows the module to be much simpler. In this case Nicolas just needs a module that says "if this is not valid UTF8 but is valid LATIN1, convert it to UTF8 and pass it on down the chain for the rest of the normal parsers"
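
As a rough illustration of that rule (a Python sketch, not an actual rsyslog parser module; `fix_encoding` and `looks_like_latin1` are hypothetical names): note that a plain Latin-1 decoder accepts every byte value, so "is valid LATIN1" needs some extra heuristic. The one used here, rejecting the C1 control range 0x80-0x9F, is an assumption of this sketch.

```python
def looks_like_latin1(raw: bytes) -> bool:
    """Heuristic: ISO-8859-1 maps 0x80-0x9F to C1 control characters,
    which are unlikely in log text, so treat them as 'not Latin-1'."""
    return not any(0x80 <= b <= 0x9F for b in raw)


def fix_encoding(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8; otherwise convert it
    from Latin-1 to UTF-8 when it plausibly is Latin-1."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (plain ASCII included)
    except UnicodeDecodeError:
        pass
    if looks_like_latin1(raw):
        return raw.decode("latin-1").encode("utf-8")
    return raw  # neither: leave the bytes alone for the later parsers
```

Anything that is neither valid UTF-8 nor plausible Latin-1 (binary junk, for example) is passed through untouched rather than guessed at.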

Someone else may need a module for some other combination of encodings.

But almost nobody needs the general case of 'handle strings in input, decisioning, and output as any possible encoding'


We had a similar discussion when initially talking about UTF support and at that time we realized that just about everything in rsyslog can just treat the strings as a sequence of bytes without worrying about what the encoding is.

Remember that the system generating the log doesn't tag the log as to what encoding it is, and trying to guess is guaranteed to get it wrong sometime (especially since sometimes it's really a malformed log with binary data, not an encoding issue)
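
A small Python sketch of why guessing cannot be reliable (the helper name `plausible_decodings` is made up for this example): the same bytes can be perfectly valid in more than one encoding, with different meanings.

```python
def plausible_decodings(raw: bytes, candidates=("utf-8", "latin-1")) -> dict:
    """Return {encoding name: decoded text} for every candidate
    encoding under which raw decodes without error."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            pass  # not valid in this encoding
    return results


# The two bytes 0xC3 0xA9 are 'é' in UTF-8 but 'Ã©' in Latin-1.
ambiguous = plausible_decodings(b"caf\xc3\xa9")
```

Here `ambiguous` contains an entry for both encodings, and nothing in the message itself says which one the sender meant.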


I would suggest creating a simple parser module (the type of thing you do for 500 euro or so as sponsored development) that has the simple logic I outline above. Other people can then expand the options, or sponsor additional conversion options. These could be either separate modules, or config options within a module. I would tend to do them as separate modules because some installations may need to use multiple conversions and I think that's going to be easier to do than to have multiple instances of a single module with different config parameters.

Personally, I expect that everything will end up being 'convert X to UTF8'.


Why re-implement a translation module that does the same thing directly in Rsyslog,
when it is already handled by PostgreSQL?
Well, obviously this is just a pg solution, so in long term, doing it
generally would make sense IMHO.
I don't see the interest except...

...if the client sending logs to the rsyslog server is so crappy that it
mixes several encoding types

Well... I would consider this to be the regular use case in relay chains (for the upper-level relays). To prevent it, the leaf-level relays would need to do the code translation.

Exactly, once the logs from multiple boxes get combined into one logstream, it becomes very hard to know what the 'right' thing to do is. You may have this with a single system if different applications on the system have different encodings.

David Lang