Philippe Verdy scripsit:

> When in doubt, don't perform any normalization of XML _files_ as they are
> NOT plain text: you need a XML parser to do it safely only in relevant
> sections of this file. All you could do safely is to possibly reencode XML
> files (for example from UTF-8 to UTF-16 encoding schemes).

This is wildly overstated. XML files most certainly are plain text, though they may be interpreted as fancy text in contexts that understand XML. With the insignificant exception of a markup ">" immediately followed by a U+0338 character, it is entirely safe to normalize XML files according to any normalization form. (It is true that the NFK* normalization forms may lose information, but XML document authors are discouraged from using compatibility decomposables in any case.)

What is not allowed, and this makes XML technically non-conformant to the Unicode Standard, is to make arbitrary and unsystematic replacements of one canonically equivalent form with another. For example, if an element name is "hétérogénéité" (a favorite word of mine), decomposing the start-tag while leaving the end-tag composed would make the document no longer well-formed XML. In my opinion, this is a corner case that may be safely ignored.

-- 
John Cowan  www.reutershealth.com  www.ccil.org/~cowan
'Tis the Linux rebellion / Let coders take their place,
The Linux-nationale / Shall Microsoft outpace,
We can write better programs / Our CPUs won't stall,
So raise the penguin banner of / The Linux-nationale.
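
For anyone who wants to check the two corner cases mentioned above, here is a minimal sketch in Python. It assumes the standard-library unicodedata and xml.etree.ElementTree modules behave as documented; the helper is_well_formed and the sample documents are made up for illustration and are not taken from any real system.

```python
import unicodedata
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


name = "h\u00e9t\u00e9rog\u00e9n\u00e9it\u00e9"  # "hétérogénéité", precomposed
composed = unicodedata.normalize("NFC", name)
decomposed = unicodedata.normalize("NFD", name)

# Normalizing the whole file systematically keeps the tags matched.
whole_nfc = unicodedata.normalize("NFC", f"<{decomposed}>x</{composed}>")
whole_nfd = unicodedata.normalize("NFD", f"<{composed}>x</{decomposed}>")

# Unsystematic replacement: start-tag decomposed, end-tag left composed.
mixed = f"<{decomposed}>x</{composed}>"

# The ">" + U+0338 exception: NFC fuses the tag delimiter with the
# combining overlay into U+226F, so the start-tag is never closed.
before = "<a>\u0338</a>"
after = unicodedata.normalize("NFC", before)

for label, doc in [("whole NFC", whole_nfc), ("whole NFD", whole_nfd),
                   ("mixed", mixed), ("'>' + U+0338 before", before),
                   ("'>' + U+0338 after NFC", after)]:
    print(f"{label}: well-formed = {is_well_formed(doc)}")
```

The sketch only exercises the canonical forms (NFC and NFD) on toy documents; it says nothing about what the compatibility forms do to real content.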

