Hi all,

I am taking up Chris' call for syslog internationalization.

If we look at languages with a huge character set (e.g. Japanese, Chinese and Korean), we obviously need to encode these characters with more than a single byte (octet, to be precise). Depending on the encoding, between 2 and 5 bytes are needed for a single character. Obviously, we have some issues with the encoding, as it is not printable in a US-ASCII sense. But let's postpone that discussion.

For now I would just like to look at the message *size*. It is fair to say that we need at least twice as many bytes as with US-ASCII. Thus, the usable message size drops dramatically, to around 500 characters. If we look at the actual encoding, it can get even worse: one approach (though probably not a clever one, I have not really thought this out yet) could be to use base64 encoding on 8-bit character streams. That would fit nicely with "traditional" and current RFC interop. BUT with base64 we get even lengthier messages, and so the usable "character message size" (payload) would probably be reduced to the point where it is simply unusable (read: too short to do anything useful).

Thus it would make sense to allow for larger syslog messages, BUT

- we run into interop issues with existing syslog implementations / RFCs
- this raises UDP fragmentation concerns, which can mess up the whole syslog message

I have no good answer on how this could be solved. The only thing I have in mind is that we introduce a new, larger max size that the message MUST NOT exceed, and recommend that the size SHOULD be less than the current limit. Then we could suggest that an implementor SHOULD add a user-configurable option to allow large syslog packets, and probably also an option NOT to disregard "oversize" packets (according to the existing RFCs/IDs). In this scenario, a syslog client SHOULD also be configurable to emit messages with the current max length. This would most probably mean message truncation.
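
To make the size argument above concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the 1024-byte budget is taken as the current packet limit, the whole budget is treated as payload (the PRI/header part is ignored, so real numbers would be somewhat lower), and the names `MAX_PACKET_BYTES`, `usable_chars` and `truncate_utf8` are made up for this example. UTF-8 is used for the truncation helper purely as an example encoding.

```python
import base64

# Assumption: 1024 bytes is the current per-packet limit, all of it payload.
MAX_PACKET_BYTES = 1024


def usable_chars(bytes_per_char, budget=MAX_PACKET_BYTES, use_base64=False):
    """Number of characters of a fixed encoded width that fit in the budget."""
    if use_base64:
        # base64 emits 4 output bytes for every 3 input bytes,
        # so only 3/4 of the budget is left for the raw encoded text.
        budget = budget // 4 * 3
    return budget // bytes_per_char


def truncate_utf8(text, budget):
    """Cut text so its UTF-8 form fits in `budget` bytes.

    A multi-byte character that would be split by the cut is dropped
    entirely instead of being emitted as a broken byte sequence.
    """
    cut = text.encode("utf-8")[:budget]
    return cut.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    for width in (1, 2, 3):
        print(width, "byte(s)/char:",
              usable_chars(width), "raw,",
              usable_chars(width, use_base64=True), "with base64")

    sample = "日本語" * 10                      # 30 characters
    raw = sample.encode("utf-8")               # 3 bytes each in UTF-8
    print(len(sample), "chars ->", len(raw), "bytes ->",
          len(base64.b64encode(raw)), "bytes after base64")
```

With a 2-byte encoding this gives 512 characters raw and 384 once base64 is applied; with 3 bytes per character it is 341 and 256. The truncation helper shows what "emit messages with the current max length" could look like on the client side without cutting a character in half.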

An alternative to truncation would be to introduce some fragmentation at the syslog level, but I do not "feel" really good about this.

I think the length restriction on syslog messages, and how to solve it, severely impacts the possible message encoding and other parts of "international syslog". As such, I think it makes sense to sort out these things before proceeding to any payload issues.

Comments are highly appreciated. Hopefully I am overlooking something obvious.

Rainer Gerhards
