2016-11-08 9:30 GMT+01:00 Richard Wordingham < [email protected]>:
> TUS Section 2.11 says, "If the combining characters can interact
> typographically—for example, U+0304 combining macron and U+0308
> combining diaeresis — then the order of graphic display is
> determined by the order of coded characters (see Table 2-5).
> By default, the diacritics or other combining characters are
> positioned from the base character’s glyph outward".

"If the combining characters can interact typographically" would be better read as "if the combining characters have the same non-zero combining class, or any one of them has a zero combining class".

The combining classes were historically intended to track these possible graphic interactions, so as to allow or forbid reordering and to detect canonical equivalences. But normalization is now everywhere, and any pair that does not meet the condition above (two different non-zero combining classes) is freely reordered, or decomposed and recomposed, meaning that its encoding order is NOT significant at all.

It turned out, however, that some diacritics are positioned differently depending on their base character. Take the cedilla, which attaches below: it is not supposed to interact with combining characters that normally sit above, so reordering to a canonical equivalent is permitted and in fact happens automatically when documents are encoded or decoded. But with some Latin letters these interactions do occur. The only way to block the reordering, if you don't want the positions inferred from the encoding order of the normalized string, is to insert a joiner with combining class zero between the marks: the COMBINING GRAPHEME JOINER (CGJ, U+034F).

This sentence in TUS should have been updated long ago, because TUS does not really know how characters will be positioned, and Unicode permits reordering pairs of diacritics that do not block each other for normalization.
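To make the reordering concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper names (`codepoints`, `nfd`) and the sample strings are my own illustrations, not anything taken from TUS. It only shows what normalization does to the code point order, not how any font will place the marks.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def codepoints(text):
    """Return each character as a 'U+XXXX(ccc)' label for inspection."""
    return [f"U+{ord(ch):04X}({unicodedata.combining(ch)})" for ch in text]


def nfd(text):
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", text)


# Different non-zero classes: macron (230) typed before cedilla (202).
# Canonical ordering sorts the run by class, so the cedilla moves first.
reordered = nfd("e\u0304\u0327")

# Same marks with a CGJ between them: the class-0 character splits the
# run, so each mark stays where it was typed.
blocked = nfd("e\u0304" + CGJ + "\u0327")

# Same non-zero class (230 and 230): the typed order is preserved, so the
# two orders are distinct strings that are not canonically equivalent.
macron_diaeresis = nfd("a\u0304\u0308")
diaeresis_macron = nfd("a\u0308\u0304")

for label, value in [
    ("macron + cedilla", reordered),
    ("macron + CGJ + cedilla", blocked),
    ("macron + diaeresis", macron_diaeresis),
    ("diaeresis + macron", diaeresis_macron),
]:
    print(label, codepoints(value))
```

NFD is used rather than NFC so that the result is not additionally changed by composition into precomposed letters.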
This is important for the cedilla, but even more important for Hebrew diacritics, whose combining classes do not correctly track their relative positioning (as discussed on this list years ago, and known as the "Hebrew points bug"). This will never change: the combining classes are assigned permanently. They continue to work for simple cases, but cause problems with some pairs, which need a CGJ inserted between them.

This is also important for several Indic scripts, which have complex positioning rules if you use combining characters with non-zero combining classes (initially intended for simple cases in Latin/Greek/Cyrillic). Thankfully, the most critical diacritics in Indic scripts for such complex cases have a combining class of zero, meaning that they block each other and their relative encoding order is not affected by normalization. But there are still many cases where CGJ is needed.
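The same effect can be checked for Hebrew points with Python's standard `unicodedata` module. This is only a sketch: the lamed + patah + hiriq sequence and the Devanagari sample are illustrative choices of mine, and the variable names are hypothetical.

```python
import unicodedata

CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0
LAMED = "\u05DC"   # HEBREW LETTER LAMED
PATAH = "\u05B7"   # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ, combining class 14


def nfd(text):
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", text)


# Each Hebrew point has its own fixed class, so normalization sorts them
# by class number whatever order the author typed them in.
typed = LAMED + PATAH + HIRIQ
hebrew_normalized = nfd(typed)

# A CGJ between the two points keeps the typed order through normalization.
hebrew_protected = nfd(LAMED + PATAH + CGJ + HIRIQ)

# Devanagari anusvara (U+0902) and vowel sign I (U+093F) both have class 0,
# so their relative order is never touched by normalization.
indic_typed = "\u0915\u0902\u093F"
indic_normalized = nfd(indic_typed)

print([f"U+{ord(ch):04X}" for ch in hebrew_normalized])
print([f"U+{ord(ch):04X}" for ch in hebrew_protected])
print(indic_normalized == indic_typed)
```

Whether the swapped order actually renders differently depends on the font and shaping engine; the sketch only shows that the stored order changes unless a CGJ is present.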

