From: Asmus Freytag <[EMAIL PROTECTED]>
>
> At 05:33 PM 4/24/2004, Ernest Cline wrote:
> >There are problems. Suppose, we define a new variation selector that
> >will stay with the preceding mark under normalization.
> >
> >Now consider what happens when implementations conforming to
> >a standard of Unicode that does not know about the new character
> >normalizes the sequence BC CM180 CM160 NVS
> >  BC = Base Character
> >  CM# = Combining Mark of ccc #
> >  NVS = New Variation Selector.
> >
> >As far as it knows, the new variation selector is an undefined character
> >with a ccc of 0, so when normalizing this it will reorder it as:
> >BC CM160 CM180 NVS
> >Now lets have this "normalized" string be passed on to an
> >implementation which knows about this NVS, There were two
> >schemes I proposed for implementing this NVS. Both have problems,
> >as I will point out below.
>
> No implementation supporting version X can normalize all data
> containing characters from a later release. If that was a requirement
> we could never add combining characters. What is required is that
> all later implementation normalize any data from version X
> the same way a version X implementation would have done and to
> not change already normalized data. I think it's strictly speaking
> the latter aspect that's guaranteed.

The best that an implementation can do with an unknown character is to treat it as a non-decomposable character with a ccc of 0. Given the way normalization works, at worst this preserves the character so that an implementation that does know it can normalize it correctly.

My point was that adding a category of characters that is tightly bound to the preceding character, without using the existing combining class mechanism, would cause unavoidable problems for normalization. As such, it is impossible to add a variation selector for combining marks unless the selector has the same canonical combining class as the mark it modifies. Any proposal for such variation selectors would therefore have to add one selector for each canonical combining class, which increases the cost of implementing the proposal.

It might make sense to relax the restriction on allowable variation sequences to include combining marks of class 0, and maybe even to provide variation selectors for the two big classes of combining characters, 220 and 230, given that those two classes are far and away the largest non-0 classes at present and are likely to remain so.
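
To make the reordering problem from the quoted example concrete, here is a minimal sketch of canonical reordering as an older implementation would apply it. The names BC, CM180, CM160 and NVS are just the placeholder labels used above, not real code points, and the function canonical_order is a hypothetical toy written for this illustration, not any library's API. The only assumption is the one already stated: an unknown character is treated as having a ccc of 0.

```python
def canonical_order(seq, ccc):
    """Toy canonical reordering: stable-sort each run of non-zero-ccc marks.

    seq is a list of placeholder labels; ccc maps label -> combining class.
    Labels missing from ccc are unknown to this version and get ccc 0.
    """
    out, run = [], []
    for ch in seq:
        if ccc.get(ch, 0) == 0:
            # A ccc-0 character ends the current run of combining marks.
            out.extend(sorted(run, key=lambda c: ccc[c]))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=lambda c: ccc[c]))
    return out


def mark_before(seq, label):
    """Return the item immediately preceding label in seq."""
    return seq[seq.index(label) - 1]


# Combining classes as known to the OLD version; NVS is not in its table.
OLD_CCC = {"BC": 0, "CM180": 180, "CM160": 160}

original = ["BC", "CM180", "CM160", "NVS"]
normalized = canonical_order(original, OLD_CCC)

print(normalized)                      # ['BC', 'CM160', 'CM180', 'NVS']
print(mark_before(original, "NVS"))    # CM160
print(mark_before(normalized, "NVS"))  # CM180
```

The selector was written after CM160 but ends up after CM180, and nothing in the output records which mark it originally followed. If NVS instead carried the same class as its mark (160 here), the sort would keep it in the same group as that mark, which is why one selector per combining class would be needed.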

