On Sun, 10 Feb 2013 16:18:23 -0800 David Starner <[email protected]> wrote:
> On Sun, Feb 10, 2013 at 3:46 PM, Costello, Roger L.
> <[email protected]> wrote:
> > Hi Folks,
> >
> > Can the combining diacritical marks combine with any base character?
>
> Yes.
>
> > If yes, wouldn't normalizing this:
> >
> > <comment>(U+0303)
> >
> > to NFC result in converting the XML start tag into non-well-formed
> > XML? (It is not well-formed because there is no longer a '>'
> > character after the tag name; rather, there is a '>' character with
> > a tilde on top.)
>
> Normalizing it to NFC would change nothing, since there's no
> precomposed '>' + diacritic characters.

The problem sequence is <U+003E GREATER-THAN SIGN, U+0338 COMBINING LONG
SOLIDUS OVERLAY> which is canonically equivalent to <U+226F NOT
GREATER-THAN>.

The short answer is that XML shall not do canonical equivalence, at
least, not on data; so doing would corrupt some of the CLDR definitions,
e.g. exemplar characters (TR 35 Section 5.6). The XML specification
addresses the solution for avoiding inadvertent ≯.

Richard.
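
For illustration only, the two cases discussed in this thread can be checked with a minimal Python sketch. It assumes the standard-library `unicodedata` module; the names `nfc`, `tilde_tag` and `solidus_tag` are hypothetical and do not come from the thread. It shows NFC applied to raw text, not what any XML processor does.

```python
import unicodedata

TILDE = "\u0303"    # COMBINING TILDE
SOLIDUS = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY


def nfc(text: str) -> str:
    """Return the NFC (canonical composition) form of text."""
    return unicodedata.normalize("NFC", text)


# The start tag from the question: '>' followed by a combining tilde.
tilde_tag = "<comment>" + TILDE
# The problem sequence: '>' followed by a combining long solidus overlay.
solidus_tag = "<comment>" + SOLIDUS

# There is no precomposed '>' + tilde, so NFC leaves this string as it is.
tilde_unchanged = nfc(tilde_tag) == tilde_tag

# '>' + U+0338 composes to U+226F NOT GREATER-THAN, so no literal '>' remains.
solidus_nfc = nfc(solidus_tag)
gt_survives = ">" in solidus_nfc

print("tilde case unchanged:", tilde_unchanged)
print("'>' survives in solidus case:", gt_survives)
print("last code point:", f"U+{ord(solidus_nfc[-1]):04X}")
```

Under these assumptions the tilde case comes back unchanged, while the solidus case loses its literal '>' to U+226F, which is why blind normalization of markup is the hazard.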

