On Sun, 10 Feb 2013 16:18:23 -0800
David Starner <[email protected]> wrote:

> On Sun, Feb 10, 2013 at 3:46 PM, Costello, Roger L.
> <[email protected]> wrote:
> > Hi Folks,
> >
> > Can the combining diacritical marks combine with any base character?
> 
> Yes.
> 
> > If yes, wouldn't normalizing this:
> >
> >         <comment>(U+0303)
> >
> > to NFC result in converting the XML start tag into non-well-formed
> > XML? (It is not well-formed because there is no longer a '>'
> > character after the tag name; rather, there is a '>' character with
> > a tilde on top.)
> 
> Normalizing it to NFC would change nothing, since there's no
> precomposed '>' + diacritic characters.

The problem sequence is <U+003E GREATER-THAN SIGN, U+0338 COMBINING LONG
SOLIDUS OVERLAY>, which is canonically equivalent to <U+226F NOT
GREATER-THAN>.  The short answer is that XML shall not apply canonical
equivalence, at least not to data; doing so would corrupt some of the
CLDR definitions, e.g. exemplar characters (TR 35 Section 5.6).  The XML
specification addresses how to avoid an inadvertent ≯.
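
For anyone who wants to check the two cases, here is a minimal sketch
using Python's standard unicodedata module.  The variable names are only
illustrative, and it assumes nothing beyond the stock NFC tables shipped
with the interpreter.

```python
import unicodedata

# '>' followed by U+0303 COMBINING TILDE: no precomposed form exists.
tilde = ">\u0303"
# '>' followed by U+0338 COMBINING LONG SOLIDUS OVERLAY: the problem case.
solidus = ">\u0338"

nfc_tilde = unicodedata.normalize("NFC", tilde)
nfc_solidus = unicodedata.normalize("NFC", solidus)

# The tilde sequence survives NFC unchanged, so the '>' is still there.
print(nfc_tilde == tilde)

# The solidus sequence composes into a single code point, U+226F,
# so the literal '>' disappears from the text.
print([f"U+{ord(c):04X}" for c in nfc_solidus])
print(">" in nfc_solidus)
```

So a tag terminator followed by U+0338 would be lost if a whole document
were blindly normalized to NFC, whereas the U+0303 case is harmless.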

Richard.


