Peter Kirk scripsit:

> I have heard it mentioned in general terms that W3C has specified that
> text should be normalised according to NFC. What actually is the scope
> of this specification? Does it apply to all XML, HTML etc? Is it
> mandatory or just a recommendation?
It is not mandatory. It is a SHOULD, which is between MUST (mandatory)
and MAY (permissive); it means that "there may exist valid reasons in
particular circumstances to ignore a particular item, but the full
implications must be understood and carefully weighed before choosing
a different course."

XML 1.0 is silent on the subject. XML 1.1 (not yet finalized) says that
XML parsers SHOULD (in the sense above) verify that their input is
normalized, and explains exactly what "normalized" means in connection
with various XML constructs; for example, the character just after a
start-tag SHOULD not be a combining character.

> I would also like to know if this is actually applied or enforced by
> products such as OpenOffice and Microsoft Office 2003 which use XML as
> one of their native document formats. Will text saved in these formats
> be normalised to NFC? Should it be?

Output SHOULD be normalized; input SHOULD be verified as normalized,
but not forcibly normalized (doing so is a security hole). Whether any
particular product does this is up to the people who make the product,
and I have no information on either of those.

-- 
One art / There is      John Cowan <[EMAIL PROTECTED]>
No less / No more       http://www.reutershealth.com
All things / To do      http://www.ccil.org/~cowan
With sparks / Galore            -- Douglas Hofstadter
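
To make the "verify, don't forcibly normalize" distinction concrete, here
is a minimal sketch in Python using only the standard unicodedata module.
The function names (is_nfc, starts_with_combining, check_input) are
hypothetical, chosen for illustration. The combining-character test is
only an approximation based on the canonical combining class; XML 1.1
defines its own normalization rules in more detail, and this sketch does
not claim to implement them.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    # Normalizing a copy and comparing leaves the input itself untouched.
    return unicodedata.normalize("NFC", text) == text


def starts_with_combining(text: str) -> bool:
    """Approximate check: does text begin with a combining character?

    Assumption: "combining" is taken to mean a nonzero canonical
    combining class, which is a simplification.
    """
    return bool(text) and unicodedata.combining(text[0]) != 0


def check_input(text: str) -> str:
    """Verify input rather than repair it.

    Non-normalized input is rejected with an error; it is never
    silently rewritten into NFC.
    """
    if not is_nfc(text):
        raise ValueError("input is not NFC-normalized")
    return text
```

For example, is_nfc("\u00e9") holds for the precomposed e-acute, while
is_nfc("e\u0301") does not, because NFC would compose the base letter and
the combining acute into a single code point. A producer would apply
unicodedata.normalize("NFC", ...) to its own output; a consumer would call
something like check_input and report the problem rather than change the
data it received.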

