David Hopwood <david dot hopwood at zetnet dot co dot uk> wrote:

> OTOH, there can be more than one way to represent composites that
> include two or more diacritics in different combining classes (e.g.
> <e with circumflex and dot below>). Technically, that would mean that
> strict byte-for-byte round-tripping of X -> NFD -> X would not be
> guaranteed in every case (unless X also requires that all data is
> normalised). This doesn't apply to T.61, but it does apply to other
> standards such as TIS620 (ISO-Latin-11 / Thai), which have combining
> marks in more than one class.
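
The round-trip concern in the quoted paragraph can be illustrated with a minimal sketch. It assumes Python's standard `unicodedata` module and uses the quoted example, e with circumflex and dot below; the variable names and the `round_trips` helper are made up for illustration and come from neither post.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT, combining class 230
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, combining class 220


def nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def round_trips(text: str) -> bool:
    """True if text is already in NFD, so X -> NFD -> X is lossless."""
    return nfd(text) == text


# Three canonically equivalent spellings of the same composite.
circumflex_first = "e" + CIRCUMFLEX + DOT_BELOW
dot_first = "e" + DOT_BELOW + CIRCUMFLEX
precomposed = "\u1ec7"  # LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW

# NFD sorts the marks by combining class, so all three spellings
# collapse to the same sequence and the original order is lost.
for label, text in [("circumflex first", circumflex_first),
                    ("dot below first", dot_first),
                    ("precomposed", precomposed)]:
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in nfd(text))
    print(f"{label:17} NFD = {code_points}  round-trips: {round_trips(text)}")
```

Only the spelling with the dot below first survives the trip unchanged, because canonical ordering puts the lower combining class (220) ahead of the higher one (230).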

As you mentioned, this does not apply to T.61 or ISO 6937, because they
do not permit multiple diacritics to be applied to a single base
character.

> Users have basically ignored (if they are even aware of) any
> admonitions from standards institutions to treat U+005E, U+0060 or
> U+007E as spacing accents, and continued to use them for the purposes
> listed below:

Programming languages, notably C and its offspring, have appropriated
these characters for their own purposes. You can't really blame "users"
for that.

> So, there would have been no practical problem with disunifying
> spacing circumflex, grave, and tilde from the above US-ASCII
> characters, so that the preferred representation of all spacing
> diacritics would have been the combining diacritic applied to U+0020.

Except, of course, for any additional user confusion that might have
arisen from encoding three more lookalike "spoof buddies." Unicode is
already taking a lot of heat on the IDN list for not unifying all
"lookalike" pairs.

-Doug Ewell
 Fullerton, California

