RE: syslog-int
Hi Anton,

Now the second response ;)

First of all, I will revise the ID in respect to ftp://ftp.rfc-editor.org/in-notes/rfc3536.txt, which I initially did not pay proper attention to. This may bring a number of changes. I cannot do it this week, but will most probably have done so by the end of next week.

> 1. I think it was a good idea to put language headers into the MSG part as opposed to other syslog fields. This would allow me, for example, storing the message on disk in the same format and/or re-use the language header even when bypassing syslog.

This is one of the intentions. It leaves one problem, though - the hostname may include international fields. But I think we can solve this with DNS internationalization. I didn't manage to get this into the -00 ID, but it may be worth talking a little bit about the header. As of now, it is left out, which could cause some issues. On the other hand, the current format makes syslog-international highly compatible with, and easy to integrate into, an existing syslog infrastructure.

> 2. The LANGUAGE field. Is it necessary? Can it be omitted? I can easily see a situation where an application supporting foreign languages does not know the source language. For example, my application takes usernames in any language (in Unicode). Now, I need to produce a syslog message which has the username in it. What am I going to put in the LANGUAGE field?

Good point. As you mentioned, it is in because of RFC2277. I think it must be a user-configurable option. Maybe it should be - if the emitting syslog client does not know... Any more comments anyone?

> Also, there can be situations (bizarre as they are) when a message contains multiple languages in it. With Unicode-based encodings, it is not a problem. What happens in this case? I'd propose removing the LANGUAGE field.

If I think about it, it is actually not bizarre - think of e.g. Japanese or French names inside an English message (or substrings). What does the rest of the WG think?

> 3. RFC1766 referenced in the LANGUAGE field spec is officially obsolete. Need to address it one way or another if the LANGUAGE field stays?

Section 4.3 of RFC2277 says that it should be used - this is why it is in.

> 4. I am concerned about attempting to support ALL encodings and ALL charsets. Is it necessary? If this is a new standard, why do we need to provide for support of all legacy encodings? Why can't we just standardize on Unicode encoded as straight binary UTF-8 or UTF-7?

I know there are some issues with Unicode acceptance and there may also be some missing Unicode libs... As such, I would like to see (for Japan) at least EUC and probably (S)JIS supported. In European languages, it is the same thing.

> Maybe also provide for support of UTF-16 for more compact representation of Asian languages. Either one will cover all languages and would not require getting into the business of interpreting locale/language specific charsets and many encodings.

Of course, we can also specify that all should be Unicode based and the syslog client/collector needs to do the conversion. I just have the feeling that this puts a lot of burden on the syslog side, which again will hinder implementation. But of course I am more than willing to change this if there is enough support in this WG for plain Unicode... If we go Unicode, we can only specify the UTF-7 encoding for the reasons stated above. As of now, we would have a single encoding allowed.

> So, can we do away with the CHARSET field and only use Unicode-based and US-ASCII encodings in the ENCODING field?
>
> 5. Even if we support multiple encodings, it would probably help to specify a few that MUST be supported. Otherwise, how do we ensure interoperability if each implementation can choose its own set of supported encodings/charsets?
>
> 6. Can we specify that the absence of the header should be interpreted as a UTF-8 encoded message? A UTF-8 compliant syslog daemon will be backwards compatible as it would be able to receive messages sent in plain US-ASCII, as this range of characters is encoded identically.

See above for UTF-8. I really think a lot of the discussion has focused, and will focus, on 8-bit character transport. I would like to not re-specify the syslog frame format, as this removes a lot of the ease of implementation. I also think that it really does not belong in here - but I may be wrong.

I also see a lot of issues if a syslog-international enabled syslog client needs to talk to, e.g., an RFC3195 syslogd that does not support syslog-international. There is no defined way of downgrading... Mmmhhh... Well, we could specify that it must send in UTF-7 in this case (should be readable in the old syslogd), but if it knows it will be talking to a syslog-international enabled server, it will use UTF-8. Does this sound like a solution? I still have the difficulty that I am somehow mixing up the layers to solve this issue, but this may not be a real concern...

Comments, anyone?
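
To make the downgrade idea above concrete, here is a minimal sketch in Python using only its built-in codecs. The function names `encode_msg` and `is_seven_bit` and the `receiver_is_i18n` flag are hypothetical illustrations, not anything defined in the draft; how a client would actually learn the receiver's capabilities is left open, as it is in the discussion.

```python
def encode_msg(text: str, receiver_is_i18n: bool) -> bytes:
    """Encode MSG text as UTF-8 for a syslog-international receiver,
    or as UTF-7 for a legacy receiver (hypothetical selection rule)."""
    return text.encode("utf-8" if receiver_is_i18n else "utf-7")


def is_seven_bit(data: bytes) -> bool:
    """True if every byte has its high bit clear."""
    return all(b < 0x80 for b in data)


ascii_text = "su: login failed"
name_text = "login failed for user J\u00fcrgen"  # contains U+00FC

# Plain US-ASCII text is byte-identical when encoded as UTF-8, which is
# the backwards-compatibility argument made in point 6.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

utf8_bytes = encode_msg(name_text, receiver_is_i18n=True)
utf7_bytes = encode_msg(name_text, receiver_is_i18n=False)

# UTF-7 stays within 7 bits, so a legacy syslogd sees only ASCII bytes;
# UTF-8 needs 8-bit transport for the non-ASCII character.
assert is_seven_bit(utf7_bytes)
assert not is_seven_bit(utf8_bytes)

# Both encodings decode back to the same text.
assert utf7_bytes.decode("utf-7") == utf8_bytes.decode("utf-8") == name_text
```

The UTF-7 form remains printable on an old syslogd, but the non-ASCII parts show up as base64-like runs rather than readable characters, so "readable" only holds fully for the ASCII portion of the message.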
RE: syslog-int
Hi!

I assume this is the forum to provide the feedback on this draft.

Title : Syslog-international Protocol
Author(s) : R. Gerhards
Filename : draft-ietf-syslog-international-00.txt
Pages : 13
Date : 2003-8-1

http://www.ietf.org/internet-drafts/draft-ietf-syslog-international-00.txt

First of all, thanks for putting the draft together!

1. I think it was a good idea to put language headers into the MSG part as opposed to other syslog fields. This would allow me, for example, storing the message on disk in the same format and/or re-use the language header even when bypassing syslog.

2. The LANGUAGE field. Is it necessary? Can it be omitted? I can easily see a situation where an application supporting foreign languages does not know the source language. For example, my application takes usernames in any language (in Unicode). Now, I need to produce a syslog message which has the username in it. What am I going to put in the LANGUAGE field? Also, there can be situations (bizarre as they are) when a message contains multiple languages in it. With Unicode-based encodings, it is not a problem. What happens in this case? I'd propose removing the LANGUAGE field.

3. RFC1766 referenced in the LANGUAGE field spec is officially obsolete. Need to address it one way or another if the LANGUAGE field stays?

4. I am concerned about attempting to support ALL encodings and ALL charsets. Is it necessary? If this is a new standard, why do we need to provide for support of all legacy encodings? Why can't we just standardize on Unicode encoded as straight binary UTF-8 or UTF-7? Maybe also provide for support of UTF-16 for more compact representation of Asian languages. Either one will cover all languages and would not require getting into the business of interpreting locale/language specific charsets and many encodings. So, can we do away with the CHARSET field and only use Unicode-based and US-ASCII encodings in the ENCODING field?

5. Even if we support multiple encodings, it would probably help to specify a few that MUST be supported. Otherwise, how do we ensure interoperability if each implementation can choose its own set of supported encodings/charsets?

6. Can we specify that the absence of the header should be interpreted as a UTF-8 encoded message? A UTF-8 compliant syslog daemon will be backwards compatible as it would be able to receive messages sent in plain US-ASCII, as this range of characters is encoded identically.

7. Just a minor thing. The spec and examples show the special prefix as @#i18n, but there is a note in the spec that the i should be capital I. Why not change it in the examples then?

8. With internationalization of messages, the length of messages will inevitably grow. Should we provide for multi-part messages to overcome the 1024 byte limit? Multi-part messages could have the same standard syslog header along with some additional part-seq-number. I guess this also requires a msg-seq-number. If we don't deal with multi-part messages as part of the syslog RFC, applications will invent their own mechanisms to identify messages that belong together. A standard mechanism would be nice.

Thanks!
Anton.
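
As a rough illustration of point 8, the sketch below splits a long UTF-8 message into parts that each fit the 1024 byte limit. The wire layout (`header [msg-seq/part-seq] text`) and the names `split_message`, `msg_seq`, and `MAX_LEN` are assumptions made up for this example; neither the draft nor this thread defines such a format.

```python
from typing import List

MAX_LEN = 1024  # total packet size limit mentioned in point 8


def split_message(header: str, msg_seq: int, text: str,
                  max_len: int = MAX_LEN) -> List[bytes]:
    """Split text into parts whose encoded size never exceeds max_len.

    Hypothetical layout of each part: '<header> [<msg-seq>/<part-seq>] <chunk>'.
    Splitting happens between code points, so no UTF-8 sequence is cut in
    half (combining sequences may still be separated).
    """
    def build(part_seq: int, chunk: str) -> bytes:
        return f"{header} [{msg_seq}/{part_seq}] {chunk}".encode("utf-8")

    parts: List[bytes] = []
    chunk = ""
    part_seq = 1
    for ch in text:
        candidate = chunk + ch
        if len(build(part_seq, candidate)) <= max_len:
            chunk = candidate
            continue
        if not chunk:
            raise ValueError("max_len too small for header plus one character")
        # Current part is full: emit it and start the next one with ch.
        parts.append(build(part_seq, chunk))
        part_seq += 1
        chunk = ch
        if len(build(part_seq, chunk)) > max_len:
            raise ValueError("max_len too small for header plus one character")
    parts.append(build(part_seq, chunk))
    return parts


# 1000 CJK characters take 3000 bytes in UTF-8, so several parts are needed.
long_text = "\u65e5" * 1000
packets = split_message("<13>Aug 12 11:30:00 host app:", 7, long_text)
assert len(packets) > 1
assert all(len(p) <= MAX_LEN for p in packets)
```

A receiver would group parts by the msg-seq value and order them by part-seq; repeating the standard header in every part, as suggested above, keeps each part a valid syslog message on its own.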
RE: syslog-int
RFC2277, "IETF Policy on Character Sets and Languages", could be of use for the syslog-international draft:

ftp://ftp.isi.edu/in-notes/rfc2277.txt

Of note:
- It clearly favors UTF-8. Does not mention UTF-7.
- It seems to argue that language must be identified.

Anton.

-----Original Message-----
From: Anton Okmianski [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 12, 2003 11:30 AM
To: [EMAIL PROTECTED]
Cc: Nathan Sowatskey; David Bainbridge
Subject: RE: syslog-int