The same would be true if you wrote a combining character after an apostrophe or a double quote within an XML or HTML attribute: NFC will not combine it with these syntactic delimiters. However, this would not be true if you wrote some combining characters after an equals sign (for example, U+0338 COMBINING LONG SOLIDUS OVERLAY composes with "=" into U+2260 NOT EQUAL TO). For HTML and XML, the solution is to write these combining characters (which are part of an attribute value) between quotation marks, which are mandatory in XML.
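To make the difference between the delimiters concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name nfc and the choice of U+0303 and U+0338 as sample marks are only illustrative assumptions.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the NFC form of text (illustrative helper)."""
    return unicodedata.normalize("NFC", text)


TILDE = "\u0303"    # COMBINING TILDE
SOLIDUS = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY

# Quote delimiters have no precomposed forms, so NFC leaves them alone.
print(nfc('"' + TILDE) == '"' + TILDE)  # True
print(nfc("'" + TILDE) == "'" + TILDE)  # True

# '=' followed by U+0338 composes into U+2260 NOT EQUAL TO,
# so an unquoted attribute value starting with that mark loses its '='.
print(nfc("=" + SOLIDUS) == "\u2260")   # True

# Writing the leading mark as a numeric character reference keeps it
# out of reach of normalization, because the source text is pure ASCII.
print(nfc("=&#x338;") == "=&#x338;")    # True
```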
The problem, however, lies in text editors, and there it is probably better to encode a leading combining character as a numeric character reference. This is only needed when editing the XML/HTML file manually; an HTML or XML generator whose output is meant to be parsed by a machine and not re-edited by a human may safely ignore it. This has a consequence, however: you cannot blindly normalize an HTML/XML source file as if it were "flat" plain text. Normalization of these files should be performed on the parsed DOM: on individual text elements, individual attribute values, individual element or attribute names, or individual comments (and it should be avoided on processing instructions). Similar considerations apply to source files in other programming languages (such as JavaScript, PHP or C++ source files containing literal strings), which should not be normalized without knowledge of the syntax of these languages. Syntactic problems created by normalization may be more serious in some file formats, such as data files where spaces are used as field separators: these are not really flat plain-text files either, and normalization is only safe within each individual field, i.e. on the parsed elements of the document. In addition, there may be data validity constraints in those languages even if, for the Unicode standard itself, these texts are still valid: these extra constraints are out of scope of the standard.
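The contrast between normalizing the flat source and normalizing the parsed document can be sketched as follows, using Python's standard xml.etree.ElementTree. The function names normalize_tree and is_well_formed are hypothetical, and this sketch only covers text nodes and attribute values: ElementTree's default parser drops comments and processing instructions, and element and attribute names are left untouched here.

```python
import unicodedata
import xml.etree.ElementTree as ET


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def normalize_tree(root: ET.Element) -> ET.Element:
    """Apply NFC to each text node and attribute value separately."""
    for element in root.iter():
        if element.text:
            element.text = nfc(element.text)
        if element.tail:
            element.tail = nfc(element.tail)
        for name, value in list(element.attrib.items()):
            element.attrib[name] = nfc(value)
    return root


def is_well_formed(source: str) -> bool:
    """Return True if the string parses as XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


# Element content that starts with U+0338, followed by 'e' + U+0301.
source = "<c>\u0338e\u0301</c>"

# Flat normalization fuses '>' with U+0338 into U+226F and breaks the tag.
flat = nfc(source)
print("\u226f" in flat, is_well_formed(flat))    # True False

# Normalizing after parsing keeps the markup and still composes the e-acute.
root = normalize_tree(ET.fromstring(source))
print(root.text == "\u0338\u00e9")               # True
```

Because the tree is normalized node by node, a leading combining mark can never reach the delimiter that preceded it in the source text.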
However, to help define validity rules for programming languages, the Unicode standard suggests rules that allow programming languages to define identifiers that can be "safely" internationalized, by adding sufficient constraints so that normalization should not be a problem for parsing those languages. But Unicode does not define how these identifiers will collate and match: some programming languages may ignore some case differences, but most of them will not treat identifiers that are canonically equivalent in Unicode as equivalent in their own parsers, so these languages will still define their own rules for valid and unique identifiers.

2013/2/11 David Starner <[email protected]>:
> On Sun, Feb 10, 2013 at 3:46 PM, Costello, Roger L. <[email protected]>
> wrote:
>> Hi Folks,
>>
>> Can the combining diacritical marks combine with any base character?
>
> Yes.
>
>> If yes, wouldn't normalizing this:
>>
>> <comment>(U+0303)
>>
>> to NFC result in converting the XML start tag into non-well-formed XML? (It
>> is not well-formed because there is no longer a '>' character after the tag
>> name; rather, there is a '>' character with a tilde on top.)
>
> Normalizing it to NFC would change nothing, since there's no
> precomposed '>' + diacritic characters.

