Mark Davis writes:
> Doug Ewell writes:
> > OK. So it's Mark, not me, who is unilaterally extending C10.
>
> Where on earth do you get that? I did say that, in practice, NFC should be
> produced, but that is simply a practical guideline, independent of C10.

I also think that NFC is not required for the result of decompression to respect clause C10. So if your intent is to create a compressor/decompressor that respects canonical equivalence, NFC is not required.

Of course, clause C10 cannot be fully respected for charset mappings. Non-Unicode Korean charsets are one example where canonical equivalence cannot be guaranteed, and where Unicode canonical equivalence is in fact a pollution: mappings to and from a non-Unicode charset do not need to respect Unicode canonical equivalence when that charset has its own canonical equivalence rules.

It's just a shame that what was considered equivalent in the Korean standards is considered canonically distinct (and even compatibility distinct) in Unicode. This means that the exact same abstract Korean text can have two distinct representations in Unicode, and there's no way to match these Unicode representations together. It also means that when mapping Korean charsets to Unicode, care must be taken, before making the mapping, that compound jamos are used wherever possible.

If the text is then stored and handled entirely in Unicode, without returning to the KSC standard, you won't have any tool other than UCA to collate strings. But collation does not produce strings, just collation weights, and there's currently no tool to reverse a list of weights back into a Unicode string...

... unless the table of UCA collation weights is built as if it were a bidirectional mapping to a legacy charset, which would then become reversible and usable to perform various Unicode algorithms, including case folding or many other similar foldings defined in UTR... If someone ventures to define such a collation charset and map it to Unicode, he will effectively create as many charsets as there are collation orders tailored for a particular language.
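
To make the jamo point concrete, here is a minimal sketch using Python's standard `unicodedata` module. The code points are my own illustrative picks (U+AC00 and its jamo sequence U+1100 U+1161, then the compound jamo U+1101 against two U+1100), and the helper names are hypothetical. It only shows how normalization treats these sequences; it is not a charset mapper.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def matches_under_any_form(a: str, b: str) -> bool:
    """True if any of the four normalization forms maps both strings to the same result."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )

# A precomposed syllable and its conjoining-jamo spelling: canonically equivalent.
syllable = "\uAC00"         # HANGUL SYLLABLE GA
jamo_pair = "\u1100\u1161"  # CHOSEONG KIYEOK + JUNGSEONG A

# A compound jamo and the same letter written as two simple jamos:
# no normalization form relates them (illustrative pair, chosen for this sketch).
compound = "\u1101"         # CHOSEONG SSANGKIYEOK
doubled = "\u1100\u1100"    # CHOSEONG KIYEOK twice

print(canonically_equivalent(syllable, jamo_pair))  # True
print(matches_under_any_form(compound, doubled))    # False
```

The second result is the problem described above: once both spellings exist in Unicode text, normalization alone will not bring them together, so the choice has to be made at mapping time.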

