RE: Code pages and Unicode

Erkki I Kolehmainen Wed, 24 Aug 2011 23:59:18 -0700

+1

I'm also guilty of pushing through one particular proposal (much to Ken's
dislike) that I most certainly would no longer even try, but, alas, times
were different.

Sincerely, Erkki 

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
On Behalf Of Asmus Freytag
Sent: August 25, 2011 9:00
To: Richard Wordingham
Cc: Ken Whistler; [email protected]
Subject: Re: Code pages and Unicode

On 8/24/2011 7:45 PM, Richard Wordingham wrote:
>
> Which earlier coding system supported Welsh?  (I'm thinking of 'W WITH
> CIRCUMFLEX', U+0174 and U+0175.)  How was the use of the canonical
> decompositions incompatible with the character encodings of legacy
> systems?  Unicode's Latin-1 range (U+0000 to U+00FF) has the same codes
> as ISO-8859-1, but that's as far as having the same codes goes. Was the
> use of combining jamo
> incompatible with legacy Hangul encodings?
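To make the length question concrete: Unicode encodes Ŵ and ŵ as single precomposed code points (U+0174, U+0175), and each also has a canonical decomposition into a base letter plus U+0302 COMBINING CIRCUMFLEX ACCENT. The minimal Python sketch below uses only the standard `unicodedata` module to show that the decomposed form takes two code points per character, which a one-to-one, byte-per-character table cannot produce.

```python
import unicodedata

# Welsh W/w WITH CIRCUMFLEX, as precomposed code points.
for ch in ("\u0174", "\u0175"):
    nfd = unicodedata.normalize("NFD", ch)   # canonical decomposition
    nfc = unicodedata.normalize("NFC", nfd)  # canonical recomposition
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print("  NFD:", " ".join(f"U+{ord(c):04X}" for c in nfd), f"(length {len(nfd)})")
    print("  NFC round-trips:", nfc == ch)
```

For U+0174 this prints the decomposition U+0057 U+0302 with length 2, and NFC restores the single precomposed code point.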

See how time flies.

Early adopters were interested in 1:1 transcoding, using a single 256-entry
table for an 8-bit character set, with guaranteed predictable
length. Early designs of Unicode (and 10646) attempted to address these
concerns, because ignoring them would have posed severe impediments to
migration.
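The "single 256-entry table" approach can be sketched roughly as follows. This is a minimal Python illustration, not any early adopter's actual code: the helper names `build_table` and `transcode` are hypothetical, and ISO-8859-15 is an arbitrary example code page whose table is built here from Python's own codec. Because every byte maps to exactly one code point, output length always equals input length.

```python
# Sketch of table-driven 1:1 transcoding from an 8-bit code page to Unicode.
# Assumption: ISO-8859-15 is used purely as an example code page.

def build_table(codec: str) -> list[str]:
    """Return a 256-entry list mapping each byte value to one Unicode character."""
    table = [bytes([b]).decode(codec) for b in range(256)]
    # A 1:1 code page must map every byte to exactly one code point.
    assert all(len(ch) == 1 for ch in table)
    return table


def transcode(data: bytes, table: list[str]) -> str:
    """Convert bytes to text with one table lookup per byte."""
    return "".join(table[b] for b in data)


LATIN1 = build_table("latin-1")
LATIN9 = build_table("iso8859_15")

# ISO-8859-1 maps byte n to U+00nn, so its table is the identity on 0..255.
assert all(ord(LATIN1[b]) == b for b in range(256))

text = transcode(b"\xa4 100", LATIN9)
print(repr(text))        # '€ 100'
print(len(text) == 5)    # True: output length equals input length
```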

Some characters were included as part of the merger without the same
rigorous process that is in force for characters today. At that time,
scuttling the deal over a few characters here or there would not have
been a reasonable action. So you will always find some "exceptions" to
many of the principles, which doesn't make the principles any less valid.

> Obviously <D800 D800 000E DC00> is non-conformant with current UTF-16.
> Remembering that there is a guarantee that there will be no more
> surrogate code points, an extension form has to be non-conformant with
> current UTF-16!

And that's the reason why there's no interest in this part of the
discussion. Nobody will need an extension next Tuesday, or in a decade,
or even in several decades, or ever. I haven't seen an upgrade to Morse
code recently to handle Unicode, for example. Technology has a way of
moving on.

So the best thing is to drop this silly discussion and let those future
people who might be facing a real *requirement* use their good judgment
to come to a technical solution appropriate to their time, instead of
wasting collective cycles discussing how to make 1990s technology
work for an unknown future requirement. It's just bad engineering.

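For reference, the UTF-16 pairing rule behind the quoted example is short: a high surrogate (D800 to DBFF) must be immediately followed by a low surrogate (DC00 to DFFF), and a low surrogate may not appear on its own. A pair reaches exactly U+10000 to U+10FFFF, so with no further surrogate code points available, any form that goes beyond U+10FFFF has to break this rule. A minimal Python sketch; the function names `is_well_formed_utf16` and `decode_pair` are hypothetical.

```python
def is_well_formed_utf16(units: list[int]) -> bool:
    """Check a sequence of 16-bit code units against the UTF-16 pairing rules."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate without a preceding high surrogate.
            return False
        else:
            i += 1
    return True


def decode_pair(high: int, low: int) -> int:
    """Combine a surrogate pair into a supplementary code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(is_well_formed_utf16([0xD800, 0xD800, 0x000E, 0xDC00]))  # False
print(hex(decode_pair(0xDBFF, 0xDFFF)))  # 0x10ffff, the largest reachable code point
```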
> Everyone should know how to extend UTF-8 and UTF-32 to cover the 31-bit
> range.

I disagree (as would anyone with a bit of long-term perspective). Nobody 
needs to look into this for decades, so let it rest.
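As background to the quoted remark (not as a proposal): the original UTF-8 definition in RFC 2279 used five- and six-byte sequences to cover values up to 0x7FFFFFFF, and RFC 3629 later restricted UTF-8 to U+10FFFF and at most four bytes. UTF-32 needs no new mechanism for 31 bits, since such a value already fits in one 32-bit unit. A minimal Python sketch of the older scheme; the function name `encode_utf8_31bit` is hypothetical, and it deliberately does not reject surrogate values.

```python
def encode_utf8_31bit(cp: int) -> bytes:
    """Encode a value in 0..0x7FFFFFFF using the pre-RFC 3629 UTF-8 scheme (up to 6 bytes).

    Note: surrogate values (D800..DFFF) are not rejected here; modern UTF-8 rejects them.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])  # 1 byte: 0xxxxxxx
    # (sequence length, exclusive upper bound, lead-byte marker)
    forms = ((2, 0x800, 0xC0), (3, 0x10000, 0xE0), (4, 0x200000, 0xF0),
             (5, 0x4000000, 0xF8), (6, 0x80000000, 0xFC))
    for length, bound, lead in forms:
        if cp < bound:
            trail = []
            for _ in range(length - 1):
                trail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
                cp >>= 6
            # Remaining high bits go into the lead byte; trail was built low-first.
            return bytes([lead | cp] + trail[::-1])
    raise AssertionError("unreachable")


print(encode_utf8_31bit(0x7FFFFFFF).hex())  # fdbfbfbfbfbf
```

Under RFC 3629 a conforming decoder must reject sequences such as F4 90 80 80, which this sketch produces for 0x110000, so it illustrates the historical scheme rather than anything usable today.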

A./
