On 05/12/2003 14:01, Philippe Verdy wrote:

...

It's just a shame that what was considered as equivalent in the Korean
standards is considered as canonically distinct (and even compatibility
distinct) in Unicode. This means that the same exact abstract Korean text
can have two distinct representations in Unicode and there's no way to match
these Unicode representations together. And also that when mapping Korean
charsets to Unicode, care must be taken, before making the mapping, that all
compound jamos are used wherever possible.


Agreed.
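
To make the point concrete, here is a minimal sketch, assuming Python's standard unicodedata module and taking the doubled initial consonant as the example: two U+1100 HANGUL CHOSEONG KIYEOK against the single compound jamo U+1101 HANGUL CHOSEONG SSANGKIYEOK. The helper name same_under is my own, chosen only for illustration.

```python
import unicodedata

SIMPLE_PAIR = "\u1100\u1100"  # two HANGUL CHOSEONG KIYEOK
COMPOUND = "\u1101"           # HANGUL CHOSEONG SSANGKIYEOK

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def same_under(form: str, a: str, b: str) -> bool:
    """Return True if a and b become identical under the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    for form in FORMS:
        # U+1101 has no decomposition mapping, so no form unifies the two spellings.
        print(form, same_under(form, SIMPLE_PAIR, COMPOUND))
```

None of the four normalization forms treats the two spellings as the same string, which is the mismatch described above.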

If now the text is stored and handled entirely in Unicode without returning
to the KSC standard, you won't have any other tool than just UCA to collate
strings (but collation does not produce strings, just collation weights,
and there's currently no tool to reverse a list of weights back to a
Unicode string)...

...

I note the following which is part of the text explaining C10:

All processes and higher-level protocols are required to abide by C10 as a minimum.
However, higher-level protocols may define additional equivalences that do not
constitute modifications under that protocol. For example, a higher-level protocol
may allow a sequence of spaces to be replaced by a single space.

Presumably a higher-level protocol could transform Korean text into a standardised form, doing what (in your opinion and mine at least) Unicode normalisation ought to have done.
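
As a rough illustration of what such a protocol might do, here is a sketch in Python. The table COMPOUND_CHOSEONG and the function fold_korean are hypothetical names of my own, and the table covers only the five doubled initial consonants; a real protocol would need a complete, agreed list of compound jamos (including vowels and final consonants) and a rule for ambiguous runs. The sketch folds each pair of simple jamos into its compound jamo and then applies ordinary NFC.

```python
import unicodedata

# Hypothetical, deliberately incomplete table: doubled leading consonants only.
COMPOUND_CHOSEONG = {
    "\u1100\u1100": "\u1101",  # KIYEOK + KIYEOK -> SSANGKIYEOK
    "\u1103\u1103": "\u1104",  # TIKEUT + TIKEUT -> SSANGTIKEUT
    "\u1107\u1107": "\u1108",  # PIEUP + PIEUP   -> SSANGPIEUP
    "\u1109\u1109": "\u110A",  # SIOS + SIOS     -> SSANGSIOS
    "\u110C\u110C": "\u110D",  # CIEUC + CIEUC   -> SSANGCIEUC
}


def fold_korean(text: str) -> str:
    """Fold listed jamo pairs into compound jamos (left to right), then apply NFC."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in COMPOUND_CHOSEONG:
            out.append(COMPOUND_CHOSEONG[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    spelled_out = "\u1100\u1100\u1161"  # KIYEOK, KIYEOK, vowel A
    compound = "\u1101\u1161"           # SSANGKIYEOK, vowel A
    # After folding, both spellings yield the same precomposed syllable.
    print(fold_korean(spelled_out) == fold_korean(compound))
```

With plain NFC alone the first spelling keeps a stray leading KIYEOK in front of a precomposed syllable, so the two inputs stay distinct; with the extra folding step both come out as the same single syllable.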


--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




