> Yes, problems do arise if there is more than one combining character
> between the base character and the VS and they are not in canonical
> order. But this is a marginal case which can be avoided by ensuring that
> canonical order is always used.

If data is always encoded in canonical order, then having a VS within the combining mark sequence wouldn't create any normalization problems, that's true. But you well know that people do not want their Hebrew data in canonical order. Even if they did, it couldn't be guaranteed.

There's a problem not only in cases of the form B M1 M2 VS, but also in cases of the form B M1 VS M2. Of course, the issues are different. The first may normalize to B M2 M1 VS; the second perhaps *ought* to normalize to B M2 M1 VS, but that won't happen.

The only way to accommodate VSs within combining mark sequences would be to define a set of VSs that pick up their canonical combining class from the immediately preceding character. But since VSs can only be used in explicitly-specified combinations, it might be less hassle to simply add specific variation modifiers for specific combining marks (said modifiers being combining characters with the same combining class); but if you get to that point, you start to wonder whether adding a new combining mark would meet the need just as well without architecting entirely new encoding mechanisms.

> An alternative of course would be to define a special VS with the same
> combining class as the character it applies to, so that the two will
> always remain together. Thus there would potentially be the need for a
> considerable set of VSs. But I don't think this is really necessary.

I think that would be better than having general VSs used with combining marks.

Peter Constable
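
The two cases discussed above (B M1 M2 VS and B M1 VS M2) can be checked with a minimal sketch using Python's standard `unicodedata` module. The concrete characters are assumptions chosen only for illustration and do not come from the message: BET as B, DAGESH (combining class 21) as M1, PATAH (combining class 17) as M2, and VARIATION SELECTOR-1 (combining class 0) as the VS. The helper names `nfd` and `show` are likewise hypothetical.

```python
import unicodedata

# Illustrative characters (assumed for this sketch, not from the message).
B = "\u05D1"   # HEBREW LETTER BET, combining class 0
M1 = "\u05BC"  # HEBREW POINT DAGESH, combining class 21
M2 = "\u05B7"  # HEBREW POINT PATAH, combining class 17
VS = "\uFE00"  # VARIATION SELECTOR-1, combining class 0


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def show(s: str) -> str:
    """Render a string as a space-separated list of code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Case 1: B M1 M2 VS. The marks are out of canonical order, so they are
# swapped, and the VS ends up following M1 instead of M2.
case1 = B + M1 + M2 + VS

# Case 2: B M1 VS M2. The VS has combining class 0, so it blocks
# reordering and the marks stay in non-canonical order.
case2 = B + M1 + VS + M2

for label, text in (("B M1 M2 VS", case1), ("B M1 VS M2", case2)):
    print(f"{label}: {show(text)} -> {show(nfd(text))}")
```

With these characters, the first string comes out as B M2 M1 VS and the second is left unchanged, which matches the behaviour described in the message. Other marks or selectors should behave the same way as long as the combining classes stand in the same relation.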

