Can anyone come up with a way that the initial message header can be
read in any character set and still make sense?

If the header stated the message encoding method, we could encode the
message any way we wanted: UUE, Base64, UTF-8, etc. The final syslog
server, the one that writes the data to file or displays the messages,
would simply decode the data into the user's desired character set.
Relays could just pass the data through unaltered.

Bear in mind that syslog messages were originally intended to be read
by an operator at some stage, so it is no use if they remain encoded
at the end point. :)

Maybe after the <PRI> code we use a couple of bytes to indicate the
message encoding type. Better still, make it the first two bytes, so
that they determine the encoding of the whole message: FF 01 = UUE,
FF 02 = UTF-8, etc. Avoid using any 00 bytes so as not to confuse C
strings.
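
A rough sketch of the two-byte tag idea, in Python, just to make it
concrete. The value FF 02 = UTF-8 is from the suggestion above; FF 03
for Base64, the function names frame() and unframe(), and the choice of
UTF-8 as the text underneath the Base64 layer are all assumptions made
for illustration only.

```python
import base64

# FF 02 = UTF-8 is taken from the proposal in this mail.
# FF 03 = Base64 is a hypothetical value, invented for this sketch.
TAG_UTF8 = b"\xff\x02"
TAG_BASE64 = b"\xff\x03"


def frame(text: str, tag: bytes) -> bytes:
    """Prefix the encoded message with its two-byte encoding tag."""
    raw = text.encode("utf-8")
    if tag == TAG_UTF8:
        body = raw
    elif tag == TAG_BASE64:
        # Assumption: the Base64 layer wraps UTF-8 text.
        body = base64.b64encode(raw)
    else:
        raise ValueError("unknown encoding tag")
    if b"\x00" in body:
        # Keep 00 bytes out of the message so C strings are not confused.
        raise ValueError("message would contain a 00 byte")
    return tag + body


def unframe(message: bytes) -> str:
    """What the final syslog server would do before display or storage."""
    tag, body = message[:2], message[2:]
    if tag == TAG_UTF8:
        return body.decode("utf-8")
    if tag == TAG_BASE64:
        return base64.b64decode(body).decode("utf-8")
    raise ValueError("unknown encoding tag")
```

A relay never calls unframe(); it forwards the bytes exactly as
received, and only the end point decodes them.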

My 2c worth on TCP vs UDP.

UDP is great for a last-minute "I am dying" type message from a device.
TCP would require an ACK back, plus a connection setup if not already
connected. If we go for something like BEEP, with its extra
handshaking, we are in for a longer wait before sending the message.

The question is, how many "I am dying" type messages do we get vs
normal "user logged in" type messages? Probably not enough to warrant
staying with UDP.
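
To illustrate the difference, here is a minimal Python sketch of the
two send paths. The function names send_udp() and send_tcp(), the
newline used to end the TCP message, and the timeout value are
assumptions for illustration, not anything agreed on this list.

```python
import socket


def send_udp(message: bytes, host: str, port: int) -> None:
    """Fire and forget: a single datagram, no handshake, no ACK."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


def send_tcp(message: bytes, host: str, port: int,
             timeout: float = 2.0) -> None:
    """The connection must be set up before the first byte can go out."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Assumption: a newline marks the end of one message on the stream.
        sock.sendall(message + b"\n")
```

A dying device can still get send_udp() out in one packet, whereas
send_tcp() has to wait for the connection setup unless a connection
is already open.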

Regards

Andrew



-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Rainer Gerhards
Sent: Monday, 14 July 2003 10:57 p.m.
To: Eric Fitzgerald
Cc: [EMAIL PROTECTED]
Subject: RE: Syslog Internationalization - Message size


Eric,

Thanks for the good pointer.

> Why not use Unicode UCS-2 with UTF-8 encoding?
> http://www.unicode.org/faq/utf_bom.html
> ftp://ftp.rfc-editor.org/in-notes/rfc2279.txt
>
> UTF-8 encoding would be backwards-compatible with ASCII in
> many (most?)
> cases for syslog.

I think there is one issue with UTF-8: it requires 8-bit bytes, and
syslog allows only 7-bit transport. Fortunately, there is UTF-7
(ftp://ftp.rfc-editor.org/in-notes/rfc2152.txt), which encodes in
plain 7 bit.

I was thinking of base64 because it would allow more than just Unicode
data to travel. E.g. in Japan, a lot of data is encoded not in Unicode
but in traditional encodings like JIS and EUC. If we use a Unicode-only
encoding, that would not be possible.

On the other hand, UTF-7 has the beauty that almost all plain US-ASCII
characters can be transmitted without any encoding. In contrast to
base64, this means the message is human readable even on the wire.
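
A small Python sketch of that trade-off, using the standard "utf-7"
and "euc_jp" codecs and the base64 module. The helper names are made
up for this example, and it assumes the receiver learns the original
character set of a base64 payload by some other means.

```python
import base64


def to_utf7(text: str) -> bytes:
    """Unicode text as 7-bit UTF-7; plain ASCII letters pass through as-is."""
    return text.encode("utf-7")


def to_base64(text: str, charset: str) -> bytes:
    """Any character set (e.g. EUC-JP) wrapped in 7-bit base64."""
    return base64.b64encode(text.encode(charset))


def from_base64(wire: bytes, charset: str) -> str:
    """Undo to_base64(); the charset must be known to the receiver."""
    return base64.b64decode(wire).decode(charset)


def is_seven_bit(wire: bytes) -> bool:
    """True if every byte fits in 7 bits."""
    return all(byte < 0x80 for byte in wire)
```

For example, to_utf7("user logged in") is still the readable bytes
"user logged in", while to_base64("user logged in", "ascii") is not
readable without decoding. Both outputs pass is_seven_bit(), including
for Japanese text.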

I will note both choices down for now ;)

Rainer



