RE: [Syslog] #5 - character encoding (was: Consensus?)

Shyyunn Lin (sheranl) Wed, 30 Nov 2005 12:15:47 -0800

Chris:

I agree with all your points: recommend one encoding and a standard
language tag, and accept all other encoding and language specifications.

Regards,

Sheran

-----Original Message-----
From: Chris Lonvick (clonvick) 
Sent: Wednesday, November 30, 2005 5:06 AM
To: Shyyunn Lin (sheranl)
Cc: [EMAIL PROTECTED]
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Sheran,

On Tue, 29 Nov 2005, Shyyunn Lin (sheranl) wrote:

> Chris:
>
> I think having SD-ID with [enc="utf-8" lang="English"] may be a good 
> approach. If different languages use UTF-8 encoding, then "lang=" can 
> distinguish them.

We _should_ be using language codes from RFC 3066.  That specifies ISO
639 language tags.  639-1 has 2 character codes ("en" is English) and
639-2 has 3 characters ("eng" is English).  RFC 3066 will likely be
replaced by the works of the Language Tag Registry Update (ltru) Working
Group.
   http://www.ietf.org/html.charters/ltru-charter.html
They have IDs in the works.  Until those become RFCs we should continue
to reference RFC 3066.

>
> Also want to clarify that you suggest that if the message is in ASCII,
> it will not require SD-ID, but for all other encodings, SD-ID will be
> required.

Yes - that's my suggestion.
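
To make the rule concrete, here is a rough Python sketch of the check a
sender could apply.  It is only an illustration of the suggestion above,
not text from the draft; the function name needs_encoding_sd is made up
for this example.

```python
def needs_encoding_sd(msg: bytes) -> bool:
    """Return True if MSG holds any byte outside 7-bit US-ASCII.

    Hypothetical helper: under the suggestion in this thread, a pure
    ASCII message needs no encoding element, anything else does.
    """
    return any(byte > 0x7F for byte in msg)


if __name__ == "__main__":
    print(needs_encoding_sd(b"link down on eth0"))         # ASCII only
    print(needs_encoding_sd("链路中断".encode("utf-8")))    # non-ASCII bytes
```

The check works on raw bytes on purpose: the sender may not know which
encoding the application used, only whether the bytes stay within ASCII.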

>
> Note most other encoding methods already imply the language used, for 
> example, in Chinese, there are several encoding methods, Traditional 
> Chinese used in Taiwan and Hong Kong is Big5, and simplified Chinese 
> used in Mainland China is GBK, so if the message is in traditional 
> Chinese char, it will be shown as [enc="Big5", lang="Traditional 
> Chinese"], a little bit redundant. The Big5 also includes all English 
> char so it can be a mix of Chinese and English.

Good point.  As far as I can tell, "Big5" is not recognized by any
accredited standards developing organization.  It is recognized by the
Ideographic Rapporteur Group (IRG) which reports to the Unicode
consortium.  The recognized way to represent Chinese characters,
traditional and simplified, is through ISO 639-2 with the subcodes to
indicate traditional and simplified for the "zh" _language_.  The ID on
"Tags for Identifying Languages"

   http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt

identifies simplified Chinese as "zh-Hans" and traditional Chinese as
"zh-Hant".  Additional subtags could identify a locale such as
"zh-Hant-TW" for Taiwan Chinese in traditional script.  This is from the
"Initial Language Subtag Registry" ID.

http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-06.txt
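
As an illustration of how tags such as "zh-Hans" and "zh-Hant-TW" break
down, here is a small Python sketch.  It handles only the
language-script-region shape mentioned above and is not a full
implementation of the ltru drafts; the names LangTag and split_lang_tag
are invented for this example.

```python
import re
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str            # e.g. "zh" or "eng"
    script: Optional[str]    # e.g. "Hant", None if absent
    region: Optional[str]    # e.g. "TW", None if absent


def split_lang_tag(tag: str) -> LangTag:
    """Split a tag of the form language[-script][-region].

    Simplified sketch: any further subtags are ignored.
    """
    parts = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", parts[0]):
        raise ValueError(f"not a 2- or 3-letter language code: {parts[0]!r}")
    language = parts[0].lower()
    rest = parts[1:]
    script = region = None
    # A 4-letter subtag right after the language is taken as the script.
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        script = rest.pop(0).title()
    # A 2-letter or 3-digit subtag is taken as the region.
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        region = rest.pop(0).upper()
    return LangTag(language, script, region)


if __name__ == "__main__":
    for example in ("en", "zh-Hans", "zh-Hant-TW"):
        print(example, split_lang_tag(example))
```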

I think that we should specify encoding and language tags as
straightforwardly as possible and let others augment syslog-protocol (in
the future) with other encoding mechanisms.  We can RECOMMEND that encoding
be in UTF-8 and language tags come from RFC 3066.  We can allow that
other encoding and language identifications are acceptable.  In the
worst case, a vendor will have the option of [EMAIL PROTECTED]"something"
[EMAIL PROTECTED]"piglatin"].
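
For illustration, the recommended default written elsewhere in this
thread as [enc="utf-8" lang="en"] could be produced with something like
the Python sketch below.  The function name format_enc_element is
hypothetical, and rejecting values that contain a quote, a closing
bracket or a backslash is a simplifying assumption of this example, not
a rule taken from syslog-protocol.

```python
def format_enc_element(enc: str = "utf-8", lang: str = "en") -> str:
    """Build an element of the form [enc="..." lang="..."].

    Defaults follow the recommendation in this thread (UTF-8, and a
    language tag such as "en" from RFC 3066).  Other values are accepted
    as long as they cannot break the bracketed, quoted syntax.
    """
    for name, value in (("enc", enc), ("lang", lang)):
        if not value or any(ch in value for ch in '"]\\'):
            raise ValueError(f"unsupported {name} value: {value!r}")
    return f'[enc="{enc}" lang="{lang}"]'


if __name__ == "__main__":
    print(format_enc_element())                   # recommended default
    print(format_enc_element("Big5", "zh-Hant"))  # another accepted pair
```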

Does this work for you?

Thanks,
Chris

>
>
>
> Regards,
>
> Sheran
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick
> (clonvick)
> Sent: Tuesday, November 29, 2005 10:22 AM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Hi Rainer,
>
> Why don't we look at it from the other direction?  We could state that
> any encoding is acceptable - for ease-of-use/migration with existing 
> syslog implementations.  It is RECOMMENDED that UTF-8 be used.  When 
> it is used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8"
> lang="en"]
>
> Thoughts?
>
> All:  Let's discuss this and close this issue.
>
> Thanks,
> Chris
>
> On Tue, 29 Nov 2005, Rainer Gerhards wrote:
>
>> Chris & WG,
>>
>>>> #5 Character encoding in MSG: due to my proof-of-concept
>>>>   implementation, I have raised the (ugly) question if we need
>>>>   to allow encodings other than UTF-8. Please note that this
>>>>   question arises from needs introduced by e.g. POSIX. So we
>>>>   can't easily argue them away by wishful thinking ;)
>>>>
>>>> Not even discussed yet.
>>>
>>> I haven't reviewed that yet.  However, I'll note that allowing 
>>> different encoding can be accomplished in the future as long as we 
>>> establish a default encoding and a way to identify it in our current
>>> work.
>>
>> I have read a little in the mailing archive. Please note that in 2000
>> it was consensus that the MSG part may contain encodings other than 
>> US-ASCII. Follow this thread:
>>
>> http://www.syslog.cc/ietf/autoarc/msg00127.html
>>
>> This discussion led to RFC 3164 saying "other encodings MAY be used".
>> While this was observed behaviour, we still need to be aware that the
>> POSIX (and glibc) API places the restriction on us that we simply do
>> not know the character encoding used by the application. As such, no 
>> *nix syslogd can be programmed to be compliant to syslog-protocol if 
>> we demand UTF-8 exclusively.
>>
>> I propose that we RECOMMEND UTF-8 that MUST start with the Unicode 
>> Byte Order Mark (BOM) if used. If the MSG part does not start with 
>> the BOM, it may be any encoding just as in RFC 3164. I do not see any 
>> alternative to this.
>>
>> Rainer
>>
>> _______________________________________________
>> Syslog mailing list
>> [email protected]
>> https://www1.ietf.org/mailman/listinfo/syslog
>>
>

