Re: Compression through normalization

Peter Kirk Fri, 05 Dec 2003 16:34:47 -0800

On 05/12/2003 14:01, Philippe Verdy wrote:

...

It's just a shame that what was considered equivalent in the Korean standards is considered canonically distinct (and even compatibility distinct) in Unicode. This means that the same abstract Korean text can have two distinct representations in Unicode, and there is no way to match these representations with each other. It also means that when mapping Korean charsets to Unicode, care must be taken, before making the mapping, that compound jamos are used wherever possible.

Agreed.
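
To make the point concrete, here is a minimal Python sketch using the standard unicodedata module. The choice of characters is an illustration and is not taken from the thread: U+1101 (the compound initial jamo SSANGKIYEOK) is compared against a sequence of two U+1100 (KIYEOK) under all four normalization forms. It assumes, as the quoted text states, that the compound jamo has neither a canonical nor a compatibility decomposition.

```python
import unicodedata

# Two encodings of the same initial double consonant (illustrative choice).
compound = "\u1101"        # HANGUL CHOSEONG SSANGKIYEOK
sequence = "\u1100\u1100"  # HANGUL CHOSEONG KIYEOK, twice


def equivalent_under(form: str, a: str, b: str) -> bool:
    """Return True if a and b are identical after normalizing to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Check every normalization form; each entry is expected to be False
# if the compound jamo has no decomposition mapping.
results = {
    form: equivalent_under(form, compound, sequence)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(results)
```

If every value comes out False, none of the standard normalization forms brings the two spellings together, which is the situation described above.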

If the text is now stored and handled entirely in Unicode without returning
to the KSC standard, you won't have any tool other than UCA to collate
strings (but collation does not produce strings, just collation weights,
and there's currently no tool to reverse a list of weights back into a
Unicode string)...

...

I note the following which is part of the text explaining C10:

All processes and higher-level protocols are required to abide by C10 as a minimum. However, higher-level protocols may define additional equivalences that do not constitute modifications under that protocol. For example, a higher-level protocol may allow a sequence of spaces to be replaced by a single space.

Presumably a higher level protocol could transform Korean text into a standardised form, doing what (in your opinion and mine at least) Unicode normalisation ought to have done.
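
A rough sketch of what such a higher-level protocol step might look like, in Python. Everything named here is hypothetical: the JAMO_FOLDS table, fold_jamo and standardise are invented for illustration, the table covers only the five doubled initial consonants, and the assumption that folding a sequence into its compound jamo is always the desired result is mine, not something Unicode or the Korean standards are quoted as saying in this thread. A real protocol would need a complete, vetted table.

```python
import unicodedata

# Hypothetical and deliberately incomplete table: sequences of two simple
# initial jamos mapped to the corresponding compound initial jamo.
JAMO_FOLDS = {
    "\u1100\u1100": "\u1101",  # KIYEOK + KIYEOK -> SSANGKIYEOK
    "\u1103\u1103": "\u1104",  # TIKEUT + TIKEUT -> SSANGTIKEUT
    "\u1107\u1107": "\u1108",  # PIEUP + PIEUP   -> SSANGPIEUP
    "\u1109\u1109": "\u110A",  # SIOS + SIOS     -> SSANGSIOS
    "\u110C\u110C": "\u110D",  # CIEUC + CIEUC   -> SSANGCIEUC
}


def fold_jamo(text: str) -> str:
    """Replace listed jamo pairs with compound jamos, scanning left to right."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in JAMO_FOLDS:
            out.append(JAMO_FOLDS[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def standardise(text: str) -> str:
    """Fold jamo sequences first, then apply ordinary NFC on top."""
    return unicodedata.normalize("NFC", fold_jamo(text))


# Both spellings of the syllable are expected to end up as the same string.
from_sequence = standardise("\u1100\u1100\u1161")
from_compound = standardise("\u1101\u1161")
print(from_sequence == from_compound)
```

The folding is done before NFC so that the algorithmic Hangul composition in NFC sees the compound jamo and can build a precomposed syllable from it. Since this goes beyond the equivalences Unicode defines, it only makes sense inside a protocol that explicitly declares the two spellings equivalent.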

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
