This was an issue for us when we were converting SGML documents to XML. Our solution was to expand the character entities into Unicode characters in the XML DTD.
-----Original Message-----
From: Joseph Kesselman/CAM/Lotus [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 20, 2002 10:42 AM
To: Steve Carton
Cc: [EMAIL PROTECTED]
Subject: Re: Character Entities

The DOM has no concept of "character entities" per se.

Named references to characters (such as &lt;) are treated as predefined Parsed Entity References, just as if you had defined them yourself in the DTD. However, the DOM spec allows parsed entities to be "fully expanded", and leaves the question of which (if any) are treated that way up to the parser; most parsers I've seen _do_ fully expand these predefined entities, but that's optional.

Numeric character references (such as &#160;) are always expanded into their corresponding Unicode characters.

And the DOM's requirement of text normalization means that if expansion was done, the resulting character will be merged with any adjacent text node(s).

Question: Why would you _want_ to stop their expansion? These references are used where it would otherwise be impossible to insert the character directly, and they are intended to be read as the character when processing the document's contents. When you serialize the DOM back into XML format, it's the serializer's responsibility to convert them back into their escaped form.
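
The behavior described above can be tried out with a small sketch. This one assumes Python's standard xml.dom.minidom as the DOM implementation, which is not the parser discussed in this thread; the sample document and variable names are made up for illustration, and other parsers may legitimately keep entity reference nodes instead of expanding them.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: one numeric character reference (&#233;)
# and one predefined named reference (&lt;).
source = "<p>caf&#233; &lt; bar</p>"

doc = parseString(source)
p = doc.documentElement

# Merge any adjacent text nodes, as DOM text normalization requires.
p.normalize()

# With this parser both references are expanded into plain characters
# and end up in a single text node alongside the surrounding text.
text = p.firstChild.data
child_count = len(p.childNodes)

# On the way back out, the serializer re-escapes the markup-significant
# character; the expanded letter is written as an ordinary character.
serialized = p.toxml()

print(repr(text))
print(child_count)
print(serialized)
```

Note that the numeric reference does not come back as a numeric reference on output: once expanded, the DOM no longer knows how the character was originally written, so only characters that must be escaped are escaped.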
