On Wed, 27 Jan 2010, Jonathan Bond-Caron wrote:
> On Mon Jan 25 03:12 AM, Rainer Gerhards wrote:
>> So I don't think it would serve the non-US-ASCII world well to process
>> the transformation formats. I guess that's a good option if you have a
>> US-ASCII based system that only very occasionally needs to process a
>> foreign language string (and even then, you need to parse the message
>> *each* time you access it, specifically when obtaining substrings...).
>>
>> My conclusion is that rsyslog needs to do a UTF to UCS conversion on
>> entry to the system and then uses UCS internally (and converts back
>> when messages are output). Many software systems do so, and, as I
>> said, IMHO do so for good reasons.
>>
>
> What about adding a property option ~ 'normalize-utf8' where invalid utf8
> bytes would be escaped?
>
> $template dbFormat,"insert into text_logs (utf8_message) values
> ('%msg:::normalize-utf8%')",stdsql
>
> I can probably dig through postgresql to find the code to detect invalid
> utf8 bytes.
Rainer just added a property option to escape characters > 127. You could
probably take that patch and basically clone it to make a version that
escapes only the bytes that aren't valid UTF-8 instead.
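As a rough illustration of the behaviour such an option could have (this is
a Python sketch, not rsyslog code; the function name normalize_utf8 and the
\xHH escape format are my assumptions, not anything in the patch):

```python
def normalize_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, escaping bytes that are not valid UTF-8.

    Hypothetical helper: valid sequences pass through unchanged, and each
    invalid byte is rendered as a literal \\xHH escape by Python's built-in
    "backslashreplace" error handler.
    """
    return raw.decode("utf-8", errors="backslashreplace")


if __name__ == "__main__":
    # Valid UTF-8 (e-acute encoded as two bytes) is kept as-is.
    print(normalize_utf8(b"caf\xc3\xa9"))
    # A lone Latin-1 byte is not valid UTF-8, so it gets escaped.
    print(normalize_utf8(b"caf\xe9 ok"))
```

Note that a message that already contains a literal backslash sequence
can't be told apart from an escaped byte afterwards, so a real
implementation would probably want to pick its escape format with that in
mind.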
> I'm not sure if I understood but are you suggesting that all input to
> rsyslog is converted to UCS internally?
> That seems like a huge performance penalty to pay when most people (?) log
> US-ascii or UTF-8 data.
Right now rsyslog doesn't do any Unicode handling; it treats everything as
a string of bytes (with some code to escape specific characters). Rainer is
saying that the path he has been planning to take would convert everything
to UCS internally. You saw my argument against that.
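To put a rough number on the storage side of that trade-off, here is a
small Python sketch. It assumes a 4-byte internal form (UTF-32, which is
equivalent to UCS-4); a 2-byte UCS-2 form would cost less. The function name
storage_cost is made up for the example.

```python
def storage_cost(msg: str) -> tuple[int, int]:
    """Return (bytes as UTF-8, bytes as UTF-32) for the same text.

    Hypothetical helper: "utf-32-le" is used so that no byte-order mark
    is added and the count reflects only the characters themselves.
    """
    return len(msg.encode("utf-8")), len(msg.encode("utf-32-le"))


if __name__ == "__main__":
    line = "sshd[1234]: Accepted password for root"
    utf8_len, ucs4_len = storage_cost(line)
    # For pure US-ASCII text the fixed-width form is four times larger.
    print(utf8_len, ucs4_len)
```

This only measures size, not the CPU time for converting on input and
again on output, which would come on top of it.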
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com