On Fri, 22 Jan 2010, Rainer Gerhards wrote:

> However, even then I need to have a build time switch to turn this on/off,
> because rsyslog in Unicode mode will take not only considerably more space
> (especially with larger in-memory queues), it will also considerably affect
> its performance (in terms of bytes, the memory transfer rate is effectively
> cut in half, as most data in syslog is character-based - also think about the
> effects on cache performance).

If the code uses UTF-8 throughout, this doesn't make sense. Assuming the input is plain ASCII, UTF-8 strings and ASCII strings are the same size. (There are some additional CPU cycles involved in figuring out the length in characters for any output routines that grab substrings, but that should be all.)

The only way things would take double the space (and therefore halve the memory transfer rate) is if it converts everything to UTF-16 strings internally. This is a bad idea to start with, as a single 16-bit unit cannot hold every character (UTF-16 needs surrogate pairs for anything outside the Basic Multilingual Plane, which is why there is UTF-32 as well), but also because UTF-16 is significantly more expensive to store, copy, etc. than UTF-8 in the common case where most of the characters are ASCII.

It may be that you have picked the wrong string library to use. Before UTF-8 was defined, 'Unicode' and 16-bit characters (UCS-2, the predecessor of UTF-16) were basically synonymous, and a _lot_ of string libraries have been written with this assumption (converting everything to 16-bit characters on input and to whatever on output). If you can find one that can handle the strings as UTF-8 internally, it should be able to just about eliminate the overhead.

David Lang
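
A minimal sketch of the size argument above, in Python purely for illustration (rsyslog itself is C). The sample log line and the helper names are made up for this example and are not taken from rsyslog. It compares the storage needed for an ASCII-only message in UTF-8 and UTF-16, and shows the extra work of counting characters in a UTF-8 buffer.

```python
# Illustrative sketch only: `sample`, `encoded_sizes` and `utf8_char_count`
# are hypothetical names, not part of rsyslog or any string library.

def encoded_sizes(text):
    """Return the number of bytes needed to store `text` in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no byte order mark is added to the count.
        "utf-16": len(text.encode("utf-16-le")),
    }

def utf8_char_count(data):
    """Count characters in UTF-8 bytes by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)

# A made-up, ASCII-only syslog-style line.
sample = "<34>Jan 22 10:15:01 host su: session opened for user root"
sizes = encoded_sizes(sample)

# For pure ASCII input, UTF-8 costs one byte per character and UTF-16 costs two.
print(len(sample), sizes["utf-8"], sizes["utf-16"])

# With non-ASCII text, the byte length and the character length differ in
# UTF-8, so substring routines have to walk the bytes to find positions.
mixed = "naïve café".encode("utf-8")
print(len(mixed), utf8_char_count(mixed))
```

The character count is the only place where UTF-8 does more work than plain ASCII here; storage and copying for ASCII-only input are byte-for-byte identical.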

