Re: [Syslog] #5 - character encoding (was: Consensus?)
Rainer, I think I detect an approach I do not agree with, in this and perhaps other issues. You seem to be saying that the (e.g. POSIX) syslogd must emit perfect syslog messages and is responsible for anything that is wrong with them, no matter what it received from the application (I exaggerate slightly). I would say that if the application passes incomprehensible garbage, or something criminal or illegal, then it is the application that is at fault; syslogd can only be held responsible if it produces messages that are invalid in the parts over which it has control, e.g. header syntax. So if syslogd has no idea what the transfer encoding is because the rest of the system does not tell it, then syslogd cannot be held responsible for the absence of a field saying what the transfer encoding actually is. Or, put differently, if our RFC specifies what the application MUST or SHOULD do, as well as what syslogd must do, then that is OK with me. What syslogd would be responsible for, IMO, is ensuring that characters with a special meaning in the syntax (e.g. NUL as end of message) do not appear without being escaped (or otherwise encoded). Whether we have such problems depends on the resolution of other issues; I am not saying that we have any at present. Tom Petch

- Original Message - From: Rainer Gerhards [EMAIL PROTECTED] To: Chris Lonvick [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, November 30, 2005 2:48 PM Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Chris, I fully agree - thanks ;) Rainer

-Original Message- From: Chris Lonvick [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 30, 2005 2:39 PM To: Rainer Gerhards Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer, I believe that we are saying the same thing. :) If there is no indicator of encoding or language, then a receiver will not know what it is receiving - just like receivers don't know what they are receiving today.
They MAY make an assumption that it is something in US-ASCII (but may be disappointed). If there is an indicator of the encoding and language, then the receiver will know exactly what it is. Having an indicator should be RECOMMENDED but not REQUIRED, for ease of migration. Is that what we're all saying? Thanks, Chris

On Wed, 30 Nov 2005, Rainer Gerhards wrote: Chris, Let's use this email as an example. :) There is no indication that I'm using US-ASCII encoding or that I'm writing in English. I think there actually is. If I am right, the SMTP RFCs require mail text to be US-ASCII. Only via MIME and/or escape characters can you include 8-bit data. For example, Müller and Möller might create some problems in some mailers (but I guess my mail system will encode them with =hexval). Dropping messages with octets > 127 in the subject is a common spam protection setting... However, you're able to receive this and read it. Similarly, you could write an email in German and send it to me. I would still be able to receive it, but I'd have a difficult time parsing the meaning. I'm suggesting that same approach for the transmission of the syslog content. If I really wanted you to know what encoding and language I'm using in an email, I would specify a MIME header. Syslog senders will continue to pump out whatever encoding and language they've been using, and receivers will continue to do their best to parse them. If a vendor wants to get very specific about that, then they will have to use an SD-ID to identify the contents of the message. Here I agree with you. What I was saying is that IF the header says it is US-ASCII, only then should we assume it actually is. If there is no enc SD-ID, then we do not know what it is, but can assume ... whatever we assume. Let me phrase it this way: if the message contains [enc=us-ascii lang=en], then the receiver can honestly expect it to be US-ASCII.
But if it does not contain any enc, the receiver does not know exactly and may assume anything it finds useful (may be ASCII, may not). Does this clarify? I somehow have the impression we mean the same thing and I simply do not manage to convey what I intend to ;) Rainer Mit Aufrichtigkeit [With sincerity], Chris

On Wed, 30 Nov 2005, Rainer Gerhards wrote: Andrew, Hi Rainer, Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc=utf-8 lang=en] I like that idea too. So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly?? I think not. If it is not present, we know that we do not know it. If it is US-ASCII, I would expect something like [enc
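Tom's point is that syslogd is answerable only for the syntax it controls, such as making sure an octet with a special meaning (NUL as end of message, in his example) never reaches the wire unescaped. A minimal sketch of that duty follows. The escape scheme (a backslash followed by three octal digits) and the set of special octets are assumptions for illustration; the thread does not define either. Because the sketch works on raw octets, it needs no knowledge of the application's character encoding.

```python
ESCAPE = 0x5C                 # backslash; assumed escape octet (hypothetical scheme)
SPECIAL = frozenset({0x00})   # NUL, "end of message" in Tom's example; assumed set


def escape_msg(raw: bytes) -> bytes:
    """Write special octets, and the escape octet itself, as backslash + 3 octal digits."""
    out = bytearray()
    for b in raw:  # iterating over bytes yields ints
        if b in SPECIAL or b == ESCAPE:
            out += b"\\%03o" % b
        else:
            out.append(b)
    return bytes(out)


def unescape_msg(escaped: bytes) -> bytes:
    """Inverse of escape_msg; assumes well-formed input produced by escape_msg."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == ESCAPE:
            out.append(int(escaped[i + 1:i + 4], 8))  # three octal digits
            i += 4
        else:
            out.append(escaped[i])
            i += 1
    return bytes(out)


wire = escape_msg(b"user\x00name")
print(wire)                # b'user\\000name'
print(unescape_msg(wire))  # b'user\x00name'
```

A round trip through `escape_msg` and `unescape_msg` returns the original octets, so nothing the application passed in is lost, whatever its encoding.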
RE: [Syslog] #5 - character encoding (was: Consensus?)
Sheran, I also want to clarify that you suggest that if the message is in ASCII, it will not require an SD-ID, but for all other encodings an SD-ID will be required. Unfortunately, we cannot do this. If we knew the encoding, we could translate it to UTF-8, as syslog-protocol so far requires. However, we often do not know which encoding it is. The reason is that the POSIX syslog API does not tell us. So if we want to support POSIX (which I think we must), we must allow a syslog sender to send messages without stating the encoding - simply because it has no way to obtain that knowledge. A syslog sender embedded in, e.g., a device probably does not have this restriction. So it SHOULD encode in UTF-8. That will ensure the receiver can understand it. If the sender has absolutely no idea how to do that, but knows the encoding, then (and only then) it SHOULD specify the encoding. Rainer

Note that most other encoding methods already imply the language used. For example, Chinese has several encoding methods: Traditional Chinese, used in Taiwan and Hong Kong, is encoded in Big5, and Simplified Chinese, used in Mainland China, is encoded in GBK. So if the message is in Traditional Chinese characters, it will be shown as [enc=Big5, lang=Traditional Chinese], which is a little bit redundant. Big5 also includes all English characters, so a message can be a mix of Chinese and English. Regards, Sheran

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick) Sent: Tuesday, November 29, 2005 10:22 AM To: Rainer Gerhards Cc: [EMAIL PROTECTED] Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer, Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc=utf-8 lang=en] Thoughts? All: Let's discuss this and close this issue.
Thanks, Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote: Chris WG, #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question of whether we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;) Not even discussed yet. I haven't reviewed that yet. However, I'll note that allowing different encodings can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work. I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread: http://www.syslog.cc/ietf/autoarc/msg00127.html This discussion led to RFC 3164 saying other encodings MAY be used. While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places a restriction on us: we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively. I propose that we RECOMMEND UTF-8 and that, if UTF-8 is used, the MSG MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be in any encoding, just as in RFC 3164. I do not see any alternative to this. Rainer

___ Syslog mailing list Syslog@lists.ietf.org https://www1.ietf.org/mailman/listinfo/syslog
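Rainer's proposal (UTF-8 is RECOMMENDED and, when used, the MSG starts with the BOM; without a BOM the MSG may be in any encoding, as in RFC 3164) can be sketched from both ends as follows. This illustrates the proposal as stated in his message, not any final specification, and `build_msg` / `read_msg` are hypothetical names. Text whose encoding the sender knows goes out as BOM plus UTF-8, matching his SHOULD for senders such as embedded devices; octets from an opaque source such as the POSIX `syslog()` call pass through untouched.

```python
import codecs
from typing import Optional, Tuple, Union

# UTF-8 encoding of U+FEFF: the three octets EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def build_msg(payload: Union[str, bytes]) -> bytes:
    """Sender side under the proposal (hypothetical helper).

    str   -> the sender knows the text, so it encodes it as UTF-8 and prefixes the BOM.
    bytes -> octets of unknown encoding (e.g. what a POSIX syslogd gets from
             syslog(3)); passed through unchanged, with no BOM.
    """
    if isinstance(payload, str):
        return UTF8_BOM + payload.encode("utf-8")
    return payload


def read_msg(msg: bytes) -> Tuple[Optional[str], Union[str, bytes]]:
    """Receiver side: a leading BOM means UTF-8; otherwise the encoding is unknown."""
    if msg.startswith(UTF8_BOM):
        return "utf-8", msg[len(UTF8_BOM):].decode("utf-8")
    # No BOM: any encoding, as in RFC 3164. Keep the raw octets rather than guess.
    return None, msg


print(read_msg(build_msg("Müller")))  # ('utf-8', 'Müller')
print(read_msg(b"M\xfcller"))         # (None, b'M\xfcller')
```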
RE: [Syslog] #5 - character encoding (was: Consensus?)
Hi Rainer, Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc=utf-8 lang=en] I like that idea too. So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly?? Cheers Andrew
RE: [Syslog] #5 - character encoding (was: Consensus?)
Hi Sheran,

On Tue, 29 Nov 2005, Shyyunn Lin (sheranl) wrote: Chris: I think having SD-ID with [enc=utf-8 lang=English] may be a good approach. If different languages use UTF-8 encoding, then lang= can distinguish them. We _should_ be using language codes from RFC 3066. That specifies ISO 639 language tags. ISO 639-1 has 2-character codes (en is English) and ISO 639-2 has 3-character codes (eng is English). RFC 3066 will likely be replaced by the work of the Language Tag Registry Update (ltru) Working Group. http://www.ietf.org/html.charters/ltru-charter.html They have IDs in the works. Until those become RFCs, we should continue to reference RFC 3066. I also want to clarify that you suggest that if the message is in ASCII, it will not require an SD-ID, but for all other encodings an SD-ID will be required. Yes - that's my suggestion. Note that most other encoding methods already imply the language used. For example, Chinese has several encoding methods: Traditional Chinese, used in Taiwan and Hong Kong, is encoded in Big5, and Simplified Chinese, used in Mainland China, is encoded in GBK. So if the message is in Traditional Chinese characters, it will be shown as [enc=Big5, lang=Traditional Chinese], which is a little bit redundant. Big5 also includes all English characters, so a message can be a mix of Chinese and English. Good point. As far as I can tell, Big5 is not recognized by any accredited standards developing organization. It is recognized by the Ideographic Rapporteur Group (IRG), which reports to the Unicode Consortium. The recognized way to represent Chinese characters, traditional and simplified, is through ISO 639-2 with the subcodes to indicate traditional and simplified for the zh _language_. The ID on Tags for Identifying Languages http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt identifies simplified Chinese as zh-Hans and traditional Chinese as zh-Hant. Additional subtags could identify a locale, such as zh-Hant-TW for Taiwan Chinese in traditional script.
This is from the Initial Language Subtag Registry ID. http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-06.txt I think that we should specify encoding and language tags as straightforwardly as possible and let others augment syslog-protocol (in the future) with other encoding mechanisms. We can RECOMMEND that encoding be in UTF-8 and that language tags come from RFC 3066. We can allow other encoding and language identifications as acceptable. In the worst case, a vendor will have the option of [EMAIL PROTECTED]something [EMAIL PROTECTED]piglatin]. Does this work for you? Thanks, Chris Regards, Sheran

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick) Sent: Tuesday, November 29, 2005 10:22 AM To: Rainer Gerhards Cc: [EMAIL PROTECTED] Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer, Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc=utf-8 lang=en] Thoughts? All: Let's discuss this and close this issue. Thanks, Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote: Chris WG, #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question of whether we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;) Not even discussed yet. I haven't reviewed that yet. However, I'll note that allowing different encodings can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work. I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII.
Follow this thread: http://www.syslog.cc/ietf/autoarc/msg00127.html This discussion led to RFC 3164 saying other encodings MAY be used. While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places a restriction on us: we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively. I propose that we RECOMMEND UTF-8 and that, if UTF-8 is used, the MSG MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be in any encoding, just as in RFC 3164. I do not see any alternative to this. Rainer
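Chris's language-tag suggestion can be made concrete by splitting a tag such as zh-Hant-TW into its parts. The sketch below follows only the language-script-region layout described in the LTRU draft he cites. It is a simplification that ignores extended-language, variant, extension and private-use subtags, and `split_tag` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List, Optional, Union


def split_tag(tag: str) -> Dict[str, Union[str, None, List[str]]]:
    """Split a tag like "zh-Hant-TW" into language, script and region.

    Simplified sketch: only language[-script][-region] is recognised;
    any remaining subtags are returned unparsed under "other".
    """
    parts = tag.split("-")
    rest = parts[1:]
    script: Optional[str] = None
    region: Optional[str] = None

    # Script subtag: four letters, conventionally title-cased (Hans, Hant).
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        script = rest[0].title()
        rest = rest[1:]

    # Region subtag: two letters (TW) or three digits (419), upper-cased.
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        region = rest[0].upper()
        rest = rest[1:]

    return {"language": parts[0].lower(), "script": script,
            "region": region, "other": rest}


print(split_tag("zh-Hant-TW"))
# {'language': 'zh', 'script': 'Hant', 'region': 'TW', 'other': []}
```

A plain RFC 3066 tag such as en (ISO 639-1) or eng (ISO 639-2) comes back with only the language field set.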
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris, I agree with all but one point - only that one is quoted here... I also want to clarify that you suggest that if the message is in ASCII, it will not require an SD-ID, but for all other encodings an SD-ID will be required. Yes - that's my suggestion. I am sorry, we cannot do this. The whole issue is rooted in the POSIX APIs. You need to look at why it is such a problem. On Windows, you know what character encodings you are dealing with. On Unix, you actually just get a bunch of octets - and nobody tells you what they are. So the poor Unix syslogd actually has no idea what it handles, and likewise does not know what to place in that field ;) If it knew it was this or that encoding, I would be very tempted to request that it convert to UTF-8. But the need behind allowing other encodings is *NOT* to allow the multitude of whatever is currently in existence, but rather to provide a way to let a syslogd that needs to emit a bunch of octets do that. Does this clarify? I can provide code if that would be helpful... Rainer
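Rainer's observation that a Unix syslogd just gets "a bunch of octets" can be illustrated with the Müller example from elsewhere in the thread. The sketch assumes two applications, one writing ISO-8859-1 and one writing UTF-8; `interpretations` is a hypothetical helper that decodes the same octets under each assumed encoding. ISO-8859-1 maps every octet to some character, so a wrong guess never raises an error; it just yields the wrong text.

```python
from typing import Dict, Optional, Sequence

name = "Müller"
latin1_octets = name.encode("iso-8859-1")  # what a Latin-1 application would hand over
utf8_octets = name.encode("utf-8")         # what a UTF-8 application would hand over


def interpretations(octets: bytes,
                    encodings: Sequence[str] = ("utf-8", "iso-8859-1")
                    ) -> Dict[str, Optional[str]]:
    """Decode the same octets under several assumed encodings; None marks a decode error."""
    result: Dict[str, Optional[str]] = {}
    for enc in encodings:
        try:
            result[enc] = octets.decode(enc)
        except UnicodeDecodeError:
            result[enc] = None
    return result


print(interpretations(latin1_octets))  # {'utf-8': None, 'iso-8859-1': 'Müller'}
print(interpretations(utf8_octets))    # {'utf-8': 'Müller', 'iso-8859-1': 'MÃ¼ller'}
```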
RE: [Syslog] #5 - character encoding (was: Consensus?)
Hi Rainer, I believe that we are saying the same thing. :) If there is no indicator of encoding or language, then a receiver will not know what it is receiving - just like receivers don't know what they are receiving today. They MAY make an assumption that it is something in US-ASCII (but may be disappointed). If there is an indicator of the encoding and language, then the receiver will know exactly what it is. Having an indicator should be RECOMMENDED but not REQUIRED, for ease of migration. Is that what we're all saying? Thanks, Chris

On Wed, 30 Nov 2005, Rainer Gerhards wrote: Chris, Let's use this email as an example. :) There is no indication that I'm using US-ASCII encoding or that I'm writing in English. I think there actually is. If I am right, the SMTP RFCs require mail text to be US-ASCII. Only via MIME and/or escape characters can you include 8-bit data. For example, Müller and Möller might create some problems in some mailers (but I guess my mail system will encode them with =hexval). Dropping messages with octets > 127 in the subject is a common spam protection setting... However, you're able to receive this and read it. Similarly, you could write an email in German and send it to me. I would still be able to receive it, but I'd have a difficult time parsing the meaning. I'm suggesting that same approach for the transmission of the syslog content. If I really wanted you to know what encoding and language I'm using in an email, I would specify a MIME header. Syslog senders will continue to pump out whatever encoding and language they've been using, and receivers will continue to do their best to parse them. If a vendor wants to get very specific about that, then they will have to use an SD-ID to identify the contents of the message. Here I agree with you. What I was saying is that IF the header says it is US-ASCII, only then should we assume it actually is. If there is no enc SD-ID, then we do not know what it is, but can assume ... whatever we assume.
Let me phrase it this way: if the message contains [enc=us-ascii lang=en], then the receiver can honestly expect it to be US-ASCII. But if it does not contain any enc, the receiver does not know exactly and may assume anything it finds useful (may be ASCII, may not). Does this clarify? I somehow have the impression we mean the same thing and I simply do not manage to convey what I intend to ;) Rainer Mit Aufrichtigkeit [With sincerity], Chris

On Wed, 30 Nov 2005, Rainer Gerhards wrote: Andrew, Hi Rainer, Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc=utf-8 lang=en] I like that idea too. So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly?? I think not. If it is not present, we know that we do not know it. If it is US-ASCII, I would expect something like [enc=us-ascii lang=en]. Of course, we could also say that if it is not present, we can assume US-ASCII. But then we would need to introduce [enc=unknown] for the (common) case where we simply do not know it (again: think POSIX). I find this somewhat confusing. Rainer
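Rainer's reading, that an absent enc means "unknown" rather than an implicit US-ASCII, spares receivers an [enc=unknown] value. A receiver-side sketch follows. It parses a leading element written the way this thread writes it ([enc=utf-8 lang=en]), ignores any quoting or escaping of values, and uses hypothetical function names; it is not the syslog-protocol grammar.

```python
import re
from typing import Dict, Optional

# A leading "[...]" element, in the loose form used in this thread's examples.
SD_ELEMENT = re.compile(rb"^\[([^\]]*)\]")


def sd_params(msg: bytes) -> Dict[str, str]:
    """Return the key=value pairs of a leading [...] element, or {} if there is none."""
    m = SD_ELEMENT.match(msg)
    if m is None:
        return {}
    params: Dict[str, str] = {}
    for item in m.group(1).decode("ascii", errors="replace").split():
        key, sep, value = item.partition("=")
        if sep:
            params[key] = value
    return params


def declared_encoding(msg: bytes) -> Optional[str]:
    """The declared encoding, or None meaning "unknown" (NOT an implicit US-ASCII)."""
    return sd_params(msg).get("enc")


print(declared_encoding(b"[enc=us-ascii lang=en] disk full"))  # us-ascii
print(declared_encoding(b"disk full"))                         # None
```

A `None` result is deliberately not mapped to "us-ascii"; what the receiver then does with the octets is left to local policy, as Rainer suggests.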
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris, I fully agree - thanks ;) Rainer

-Original Message- From: Chris Lonvick [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 30, 2005 2:39 PM To: Rainer Gerhards Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer, I believe that we are saying the same thing. :) If there is no indicator of encoding or language, then a receiver will not know what it is receiving - just like receivers don't know what they are receiving today. They MAY make an assumption that it is something in US-ASCII (but may be disappointed). If there is an indicator of the encoding and language, then the receiver will know exactly what it is. Having an indicator should be RECOMMENDED but not REQUIRED, for ease of migration. Is that what we're all saying? Thanks, Chris

On Wed, 30 Nov 2005, Rainer Gerhards wrote: Chris, Let's use this email as an example. :) There is no indication that I'm using US-ASCII encoding or that I'm writing in English. I think there actually is. If I am right, the SMTP RFCs require mail text to be US-ASCII. Only via MIME and/or escape characters can you include 8-bit data. For example, Müller and Möller might create some problems in some mailers (but I guess my mail system will encode them with =hexval). Dropping messages with octets > 127 in the subject is a common spam protection setting... However, you're able to receive this and read it. Similarly, you could write an email in German and send it to me. I would still be able to receive it, but I'd have a difficult time parsing the meaning. I'm suggesting that same approach for the transmission of the syslog content. If I really wanted you to know what encoding and language I'm using in an email, I would specify a MIME header. Syslog senders will continue to pump out whatever encoding and language they've been using, and receivers will continue to do their best to parse them.
If a vendor wants to get very specific about that, then they will have to use an SD-ID to identify the contents of the message. Here I agree with you. What I was saying is that IF the header says it is US-ASCII, only then should we assume it actually is. If there is no enc SD-ID, then we do not know what it is, but can assume ... whatever we assume. Let me phrase it this way: if the message contains [enc=us-ascii lang=en], then the receiver can honestly expect it to be US-ASCII. But if it does not contain any enc, the receiver does not know exactly and may assume anything it finds useful (may be ASCII, may not). Does this clarify? I somehow have the impression we mean the same thing and I simply do not manage to convey what I intend to ;) Rainer Mit Aufrichtigkeit [With sincerity], Chris

On Wed, 30 Nov 2005, Rainer Gerhards wrote: Andrew, Hi Rainer, Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc=utf-8 lang=en] I like that idea too. So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly?? I think not. If it is not present, we know that we do not know it. If it is US-ASCII, I would expect something like [enc=us-ascii lang=en]. Of course, we could also say that if it is not present, we can assume US-ASCII. But then we would need to introduce [enc=unknown] for the (common) case where we simply do not know it (again: think POSIX). I find this somewhat confusing. Rainer
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris: I agree with all your points. Recommend an encoding and a standard lang tag, and accept all other encoding and lang specifications. Regards, Sheran

-Original Message- From: Chris Lonvick (clonvick) Sent: Wednesday, November 30, 2005 5:06 AM To: Shyyunn Lin (sheranl) Cc: [EMAIL PROTECTED] Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Sheran, On Tue, 29 Nov 2005, Shyyunn Lin (sheranl) wrote: Chris: I think having SD-ID with [enc=utf-8 lang=English] may be a good approach. If different languages use UTF-8 encoding, then lang= can distinguish them. We _should_ be using language codes from RFC 3066. That specifies ISO 639 language tags. ISO 639-1 has 2-character codes (en is English) and ISO 639-2 has 3-character codes (eng is English). RFC 3066 will likely be replaced by the work of the Language Tag Registry Update (ltru) Working Group. http://www.ietf.org/html.charters/ltru-charter.html They have IDs in the works. Until those become RFCs, we should continue to reference RFC 3066. I also want to clarify that you suggest that if the message is in ASCII, it will not require an SD-ID, but for all other encodings an SD-ID will be required. Yes - that's my suggestion. Note that most other encoding methods already imply the language used. For example, Chinese has several encoding methods: Traditional Chinese, used in Taiwan and Hong Kong, is encoded in Big5, and Simplified Chinese, used in Mainland China, is encoded in GBK. So if the message is in Traditional Chinese characters, it will be shown as [enc=Big5, lang=Traditional Chinese], which is a little bit redundant. Big5 also includes all English characters, so a message can be a mix of Chinese and English. Good point. As far as I can tell, Big5 is not recognized by any accredited standards developing organization. It is recognized by the Ideographic Rapporteur Group (IRG), which reports to the Unicode Consortium.
The recognized way to represent Chinese characters, traditional and simplified, is through ISO 639-2 with the subcodes to indicate traditional and simplified for the zh _language_. The ID on Tags for Identifying Languages http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt identifies simplified Chinese as zh-Hans and traditional Chinese as zh-Hant. Additional subtags could identify a locale, such as zh-Hant-TW for Taiwan Chinese in traditional script. This is from the Initial Language Subtag Registry ID. http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-06.txt I think that we should specify encoding and language tags as straightforwardly as possible and let others augment syslog-protocol (in the future) with other encoding mechanisms. We can RECOMMEND that encoding be in UTF-8 and that language tags come from RFC 3066. We can allow other encoding and language identifications as acceptable. In the worst case, a vendor will have the option of [EMAIL PROTECTED]something [EMAIL PROTECTED]piglatin]. Does this work for you? Thanks, Chris Regards, Sheran

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick) Sent: Tuesday, November 29, 2005 10:22 AM To: Rainer Gerhards Cc: [EMAIL PROTECTED] Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer, Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc=utf-8 lang=en] Thoughts? All: Let's discuss this and close this issue. Thanks, Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote: Chris WG, #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question of whether we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX.
So we can't easily argue them away by wishful thinking ;) Not even discussed yet. I haven't reviewed that yet. However, I'll note that allowing different encodings can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work. I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread: http://www.syslog.cc/ietf/autoarc/msg00127.html This discussion led to RFC 3164 saying other encodings MAY be used. While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places a restriction on us: we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively. I propose that we RECOMMEND UTF-8 and that, if UTF-8 is used, the MSG MUST start with the Unicode Byte Order Mark (BOM). If the MSG
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris: I think having SD-ID with [enc=utf-8 lang=English] may be a good approach. If different languages use UTF-8 encoding, then lang= can distinguish them. I also want to clarify that you suggest that if the message is in ASCII, it will not require an SD-ID, but for all other encodings an SD-ID will be required. Note that most other encoding methods already imply the language used. For example, Chinese has several encoding methods: Traditional Chinese, used in Taiwan and Hong Kong, is encoded in Big5, and Simplified Chinese, used in Mainland China, is encoded in GBK. So if the message is in Traditional Chinese characters, it will be shown as [enc=Big5, lang=Traditional Chinese], which is a little bit redundant. Big5 also includes all English characters, so a message can be a mix of Chinese and English. Regards, Sheran

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick) Sent: Tuesday, November 29, 2005 10:22 AM To: Rainer Gerhards Cc: [EMAIL PROTECTED] Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer, Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc=utf-8 lang=en] Thoughts? All: Let's discuss this and close this issue. Thanks, Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote: Chris WG, #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question of whether we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;) Not even discussed yet. I haven't reviewed that yet. However, I'll note that allowing different encodings can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work.
I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread: http://www.syslog.cc/ietf/autoarc/msg00127.html This discussion led to RFC 3164 saying other encodings MAY be used. While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places a restriction on us: we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively. I propose that we RECOMMEND UTF-8 and that, if UTF-8 is used, the MSG MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be in any encoding, just as in RFC 3164. I do not see any alternative to this. Rainer
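Sheran's Big5/GBK example shows why an encoding label carries information even when the language is the same. The sketch below uses Python's built-in `big5` and `gbk` codecs; the sample text and the helper name `legacy_chinese_octets` are illustrative only.

```python
from typing import Tuple


def legacy_chinese_octets(text: str) -> Tuple[bytes, bytes]:
    """Encode text with the two legacy encodings named in the thread (hypothetical helper)."""
    return text.encode("big5"), text.encode("gbk")


big5_octets, gbk_octets = legacy_chinese_octets("中文")
print(big5_octets != gbk_octets)  # True: same characters, different octets

# Decoding Big5 octets as GBK does not give back the original text
# (errors="replace" so that any undecodable octets cannot raise).
print(big5_octets.decode("gbk", errors="replace") == "中文")  # False

# Both encodings pass plain ASCII through unchanged, which is why a
# Big5 or GBK message can freely mix Chinese and English.
print(legacy_chinese_octets("disk full"))  # (b'disk full', b'disk full')
```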