On Thu, 14 Mar 2013, Rainer Gerhards wrote:
On Thu, 2013-03-14 at 11:26 +0100, Nicolas HAHN wrote:
Hello David,
Well, my opinion is that we should avoid reinventing the wheel when possible.
What I mean is that encoding translation is supported by the PostgreSQL engine on the
fly, transparently, between dozens of encoding schemes. So the simplest and fastest
thing would be to have the OMPGSQL module accept an additional parameter
(client_encoding); each time OMPGSQL opens a connection to the database, the first SQL
command it sends to the SQL backend should be "SET
client_encoding='value'".
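To illustrate the suggestion above, here is a minimal Python sketch (not ompgsql's actual C code) that sends the SET command first on each new connection. The names `client_encoding_statement` and `apply_client_encoding` are hypothetical, `conn` is assumed to be any DB-API style connection object with a `cursor()` method, and the name whitelist is an assumption of this sketch rather than PostgreSQL's own validation.

```python
import re

# Conservative whitelist for encoding names such as LATIN1, UTF8, WIN1252.
# Assumption for this sketch only; the server does its own validation.
_ENCODING_NAME = re.compile(r"[A-Za-z0-9_-]+")


def client_encoding_statement(encoding: str) -> str:
    """Build the statement to send first on a fresh connection."""
    if not _ENCODING_NAME.fullmatch(encoding):
        raise ValueError("suspicious encoding name: %r" % encoding)
    return "SET client_encoding = '%s'" % encoding


def apply_client_encoding(conn, encoding: str) -> None:
    """Run the SET statement on a DB-API style connection (hypothetical helper)."""
    cur = conn.cursor()
    try:
        cur.execute(client_encoding_statement(encoding))
    finally:
        cur.close()
```

With this in place the server would convert from the declared client encoding to the database encoding on every insert, so the output module itself would not have to transcode anything.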
I think this is probably a good short-term solution. Would a parameter for
ompgsql be sufficient (for your cases)? If I need to add it to the input, that's
a much larger change, as I would need to carry that encoding all through the
engine.
While I can see it being considered 'proper' to add encoding type to every
string in rsyslog, I think it's overkill and not worth the effort.
For the vast majority of people, the logs are plain ASCII (and for most of the
remainder they are UTF8), and the encoding is consistent throughout the log
processing. Adding any overhead to figure out or even track the encoding type
for this common case is a waste of time.
I think that all you need is an input-time parser module that can 'fix' invalid
encodings. Let the admin enable the module if, and only if, they need it in
their environment.
This also allows the module to be much simpler. In this case Nicolas just needs
a module that says "if this is not valid UTF8 but is valid LATIN1, convert it to
UTF8 and pass it on down the chain for the rest of the normal parsers".
Someone else may need a module for some other combination of encodings.
But almost nobody needs the general case of 'handle strings in input,
decisioning, and output in any possible encoding'.
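The "not valid UTF8 but valid LATIN1" rule described above could look roughly like the following Python sketch. It is an illustration only, not an rsyslog parser module, and `latin1_to_utf8_if_needed` is a hypothetical name. One caveat: Python's latin-1 codec accepts every byte value, so "valid LATIN1" is approximated here by rejecting bytes 0x80-0x9F (the C1 control range), on the assumption that these indicate binary data rather than text.

```python
def latin1_to_utf8_if_needed(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8, else try LATIN1 -> UTF-8."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        pass

    # Heuristic (assumption): C1 control bytes suggest binary junk,
    # not LATIN1 text, so leave the message untouched.
    if any(0x80 <= b <= 0x9F for b in raw):
        return raw

    # Every remaining byte maps to exactly one Unicode code point.
    return raw.decode("latin-1").encode("utf-8")
```

For example, the LATIN1 bytes `b"caf\xe9"` would be rewritten as UTF-8, while messages that are already valid UTF-8 or that look like binary data pass through untouched for the rest of the parser chain.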
We had a similar discussion when initially talking about UTF support and at that
time we realized that just about everything in rsyslog can just treat the
strings as a sequence of bytes without worrying about what the encoding is.
Remember that the system generating the log doesn't tag the log as to what
encoding it is, and trying to guess is guaranteed to get it wrong sometimes
(especially since sometimes it's really a malformed log with binary data, not an
encoding issue).
I would suggest creating a simple parser module (the type of thing you do for
500 euro or so as sponsored development) that has the simple logic I outlined
above. Other people can then expand the options, or sponsor additional
conversion options. These could be either separate modules, or config options
within a module. I would tend to do them as separate modules because some
installations may need to use multiple conversions and I think that's going to be
easier to do than to have multiple instances of a single module with different
config parameters.
Personally, I expect that everything will end up being 'convert X to UTF8'.
Why re-implement a translation module that does the same thing directly in rsyslog,
when it is already handled by PostgreSQL?
Well, obviously this is just a pg solution, so in the long term, doing it
generally would make sense IMHO.
I don't see the point, except...
...if the client sending logs to the rsyslog server is so crappy that it
mixes several encoding types
Well... I would consider this to be the regular use case in relay chains (for
the upper-level relays). To prevent it, the leaf-level relays would need to
do the code translation.
Exactly, once the logs from multiple boxes get combined into one logstream, it
becomes very hard to know what the 'right' thing to do is. You may have this
with a single system if different applications on the system have different
encodings.
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog