At 5:36 pm +0100 2/11/03, Lars Marius Garshol wrote:  

>> True. However, some software will ignore hyphens in charset names in
>> order to make bad encoding declarations like "utf8" work properly. Web
>> browsers are one example of this.

First of all, software that ignores the hyphens in one charset name does not necessarily ALSO ignore the hyphens in a different charset name. For example, in Mozilla/Netscape, I built in http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetalias.properties#459

which maps both "iso-8859-1" and "iso88591" to "ISO-8859-1". But we didn't map "eucjp" to "euc-jp". You may THINK it is an "ignore hyphens" rule, but it is really just extra table entries.

Second, even if one program, or several programs, support such a name, that does not make it a valid charset name. It only means the software accepts that name in addition to the valid charset names.
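
To make the first point concrete, here is a minimal sketch of the difference between an explicit alias table and a real "ignore hyphens" rule. The table and both function names are hypothetical and only mirror the entries described above; the "euc-jp" entry is an assumption for illustration, and none of this is copied from Mozilla's charsetalias.properties.

```python
# Hypothetical alias table, modeled on the entries described above.
# It is NOT a copy of Mozilla's charsetalias.properties.
ALIASES = {
    "iso-8859-1": "ISO-8859-1",
    "iso88591": "ISO-8859-1",  # extra entry, added on purpose
    "euc-jp": "EUC-JP",        # assumed entry; note there is no "eucjp"
}


def resolve_by_table(name):
    """Explicit table lookup: only names with an entry are accepted."""
    return ALIASES.get(name.strip().lower())


def resolve_by_stripping(name):
    """What a real 'ignore hyphens' rule would do instead."""
    key = name.strip().lower().replace("-", "")
    for alias, canonical in ALIASES.items():
        if alias.replace("-", "") == key:
            return canonical
    return None


print(resolve_by_table("iso88591"))   # has its own table entry
print(resolve_by_table("eucjp"))      # no entry, so not accepted
print(resolve_by_stripping("eucjp"))  # a hyphen-stripping rule would accept it
```

With the table approach, "iso88591" resolves only because someone added that entry, and "eucjp" does not resolve at all. A hyphen-stripping rule would accept both, which is not what the table does.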

John Delacour ([EMAIL PROTECTED]) wrote:

>Yes. If there were any sort of consistency in the way charset are
>"officially" named, it might be reasonable to stick to the letter
>of the law, but there is not,

There IS consistency in the way charsets are "officially" named: the official names are whatever is listed under

http://www.iana.org/assignments/character-sets

>and the use of "utf8" (either case)
>is so commonly used and allowed for in all sorts of programs (cf.
>Encode.pm) that it would seem sensible to accept it.

We are talking about charset values for Internet protocols here. That is a special, narrow class of charset names. The values used by Internet protocols are defined by a well-defined process: http://www.faqs.org/rfcs/rfc2278.html  RFC 2278 - IANA Charset Registration Procedures

and have no direct relationship with the charset names used by programming languages or operating systems. Programming languages and operating systems can choose whatever names they want to use for charsets, whether or not they adopt what is registered with IANA. But that does not mean IANA will or should take those names automatically. IANA will take those names if someone submits them through the RFC 2278 process and they meet the criteria stated in RFC 2278.

There are some good reasons for this. We don't want browsers or other Internet software to support every charset name that any software produces. We want to reduce the supported list to a finite set in a common place that all vendors can refer to. A particular Perl programmer can choose to use a particular charset name for Perl. That is perfectly fine for his/her Perl. But he/she should not expect INTERNET developers to follow his/her usage unless someone bothers to go through the INTERNET way: RFC 2278.

>If "l1" is
>acceptable for "ISO-8859-1", as it is, though it is not in
>Apple's TEC listing, then "utf8" etc. ought to be fairly
>predictable anomalies.

"L1" is accepted because it is a valid charset name listed in http://www.iana.org/assignments/character-sets
see

 
Name: ISO_8859-1:1987                                    [RFC1345,KXS2]
MIBenum: 4
Source: ECMA registry
Alias: iso-ir-100
Alias: ISO_8859-1
Alias: ISO-8859-1 (preferred MIME name)
Alias: latin1
Alias: l1
Alias: IBM819
Alias: CP819
Alias: csISOLatin1
 
"utf8" is not a valid charset name simply because it is not listed
under http://www.iana.org/assignments/character-sets
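
As a rough sketch of that rule: a name is valid only if it appears in the registry, compared without regard to upper and lower case. The list below holds just the names from the ISO_8859-1:1987 record quoted above, plus "UTF-8", which is registered but is not part of that quoted record. The variable and function names are hypothetical, and a real check would need the whole registry.

```python
# Names from the ISO_8859-1:1987 record quoted above.
ISO_8859_1_NAMES = [
    "ISO_8859-1:1987", "iso-ir-100", "ISO_8859-1", "ISO-8859-1",
    "latin1", "l1", "IBM819", "CP819", "csISOLatin1",
]

# Tiny stand-in for the registry: lower-cased name -> preferred MIME name.
REGISTERED = {name.lower(): "ISO-8859-1" for name in ISO_8859_1_NAMES}
# "UTF-8" is registered too, but it is not in the record quoted above.
REGISTERED["utf-8"] = "UTF-8"


def registered_mime_name(label):
    """Return the preferred MIME name if label is listed, else None.

    Only case is ignored; hyphens are never added or removed.
    """
    return REGISTERED.get(label.lower())


print(registered_mime_name("L1"))     # listed as an alias
print(registered_mime_name("UTF-8"))  # listed
print(registered_mime_name("utf8"))   # not listed
```

The lookup is exact apart from case, so "L1" is accepted because it is listed, while "utf8" and "iso88591" are rejected because they are not.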

>JD

 
==================================
Frank Yung-Fong Tang
System Architect, International Development, AOL Interactive Services
AIM:yungfongta mailto:[EMAIL PROTECTED] Tel:650-937-2913
Yahoo! Msg: frankyungfongtan

John 3:16 "For God so loved the world that he gave his one and only Son, that whoever believes in him shall not perish but have eternal life."

Does your software display Thai language text correctly for Thailand users?
-> Basic Concept of Thai Language, linked from Frank Tang's Internationalization Secrets
Want to translate your English text to something Thailand users can understand ?
-> Try English-to-Thai machine translation at http://c3po.links.nectec.or.th/parsit/
