Re: [rsyslog] Unicode & rsyslog - was: RE: PostgreSQL: Problems with character encoding

david Fri, 22 Jan 2010 10:20:06 -0800

On Fri, 22 Jan 2010, Rainer Gerhards wrote:

> However, even then I need to have a build time switch to turn this on/off,
> because rsyslog in Unicode mode will take not only considerably more space
> (especially with larger in-memory queues), it will also considerably affect
> its performance (in terms of bytes, the memory transfer rate is effectively
> cut in half, as most data in syslog is character-based - also think about the
> effects on cache performance).


If the code uses UTF-8 throughout, this doesn't make sense. Assuming the 
input is plain ASCII, UTF-8 strings and ASCII strings are byte-for-byte 
the same size (there are some additional cpu cycles involved to figure out 
the length in characters for any output routines that grab substrings, but 
that should be all).
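
To make the point above concrete, here is a rough sketch (in Python rather 
than rsyslog's C, and with function names I made up for illustration, not 
anything from the rsyslog source). It assumes the buffer already holds 
valid UTF-8. Counting characters or taking a substring only needs a scan 
that skips continuation bytes; the stored bytes themselves never grow.

```python
def utf8_char_count(data: bytes) -> int:
    """Count characters (code points) in a valid UTF-8 byte string."""
    # Continuation bytes look like 0b10xxxxxx; every other byte starts a character.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_substring(data: bytes, start: int, length: int) -> bytes:
    """Return `length` characters starting at character index `start`, as UTF-8 bytes."""
    # Byte offsets at which each character begins, plus an end-of-buffer sentinel.
    offsets = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    offsets.append(len(data))
    # Clamp both ends so out-of-range requests yield a shorter (or empty) slice.
    start = min(start, len(offsets) - 1)
    end = min(start + length, len(offsets) - 1)
    return data[offsets[start]:offsets[end]]


# Hypothetical sample messages, for illustration only.
ascii_msg = "su root failed"
mixed_msg = "Grüße"

print(len(ascii_msg.encode("utf-8")), utf8_char_count(ascii_msg.encode("utf-8")))
print(len(mixed_msg.encode("utf-8")), utf8_char_count(mixed_msg.encode("utf-8")))
print(utf8_substring(mixed_msg.encode("utf-8"), 2, 2).decode("utf-8"))
```

For pure ASCII input the byte count and the character count are the same 
number, so the extra scan is the only cost.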

The only way things would take double the space (and therefore halve the 
memory transfer rate) is if it converts everything to UTF-16 strings 
internally. This is a bad idea to start with, as UTF-16 cannot represent 
all characters in a single 16-bit unit (anything outside the Basic 
Multilingual Plane needs a surrogate pair, which is why fixed-width UTF-32 
exists as well), but also because UTF-16 is significantly more expensive 
to store/copy/etc. than UTF-8 for the common case where most of the 
characters are ASCII.
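
A quick way to check the size argument, as a sketch only: the helper name 
and the sample strings below are mine, and Python's built-in codecs stand 
in for whatever string library rsyslog would actually use. It reports how 
many bytes the same text occupies in each encoding.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded size in bytes of `text` for three Unicode encodings."""
    # The "-le" variants are used so that no byte-order mark is added to the count.
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# Hypothetical ASCII-only log fragment: 14 characters.
print(encoded_sizes("sshd: login ok"))

# A single character above the 16-bit range (U+1F600).
print(encoded_sizes("\U0001F600"))
```

The ASCII fragment doubles in UTF-16 and quadruples in UTF-32, while the 
character above the 16-bit range takes four bytes in all three encodings.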

It may be that you have picked the wrong string library to use. Before 
UTF-8 was widely adopted, 'unicode' and 16-bit encodings (UCS-2, later 
UTF-16) were basically synonymous, and a _lot_ of string libraries have 
been written with this assumption (converting everything to UTF-16 on 
input and to whatever on output). If you can find one that can handle the 
strings as UTF-8 internally, it should be able to just about eliminate the 
overhead.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
