Re: Sun's Java encodings vs IANA's character set registry

Markus Scherer Fri, 13 Apr 2001 12:43:10 -0700

It looks to me like the "Cp" names might be IBM CCSIDs. For those, have a look at the 
"ibm-" names in ICU's alias table at 
http://oss.software.ibm.com/cvs/icu/~checkout~/icu/data/convrtrs.txt

Note that ICU uses "cp" to mean Microsoft codepage numbers.

Note also that even IBM changes some of its tables over time, and in a few dozen 
cases it has multiple Unicode<->codepage tables for a single CCSID (see our entries 
for ibm-943 and ibm-1363).
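
As a small illustration of the same kind of divergence outside ICU, the sketch below 
decodes one double-byte code with two Python codecs that are both commonly treated as 
"Shift-JIS". The codec names "shift_jis" and "cp932" are Python's own and are only 
stand-ins here; they are not the ibm-943 or ibm-1363 tables, and the helper name 
decode_with is made up for this example.

```python
# Sketch only: Python's "shift_jis" and "cp932" codecs stand in for
# "several mapping tables behind one nominal charset". They are NOT
# ICU's ibm-943 / ibm-1363 tables.

RAW = b"\x81\x60"  # one double-byte code (the JIS X 0208 wave dash position)


def decode_with(codec_names, raw):
    """Decode the same bytes with each codec; return {codec name: text}."""
    return {name: raw.decode(name) for name in codec_names}


results = decode_with(["shift_jis", "cp932"], RAW)

# Print the resulting code points so the two tables can be compared.
for name, text in results.items():
    print(name, [f"U+{ord(ch):04X}" for ch in text])

# True if the two codecs map the same bytes to different Unicode text.
tables_disagree = results["shift_jis"] != results["cp932"]
print("tables disagree:", tables_disagree)
```

If the two codecs print different code points for the same bytes, a round trip through 
the "wrong" table will change or lose the character, which is why a charset name alone 
is often not enough to identify a mapping.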

"Haphazard" is a good description of the situation...
It is easy to have "repertoires" - the hard part is to have "one repertoire". The 
situation is beyond repair, although we (ICU) are still collecting and publishing 
data. Use Unicode, UTFs, SCSU.

markus

Mike Brown wrote:
...
> I should not be surprised by your statement, but I am. It is distressing to
> think that something that by definition should not be rocket science --
> repertoires of abstract characters mapped directly to specific bit patterns
> -- would be subject to such haphazard definition and even more haphazard
> implementation.
