Artur Tomusiak wrote:
I am trying to convert a String with XML content in it into an org.w3c.dom.Document object to do some modifications and then to convert it back to a String. However, even if I do not do any modifications to the object, I am still getting back a different String than what I have provided as an input. The problem is with the numeric XML entities. For example, if my input String is: <?xml version="1.0" encoding="UTF-8"?> <xml> &#169; &#38; </xml>
Hi Artur,

to be pedantic, these are neither entities nor entity references; they are numeric character references, which just happen to use a syntax similar to that of general entity references. (See the XML spec if interested.) As keshlam said, a character reference and the character it stands for are 100% identical as far as XML is concerned.

It's not clear to me whether you use XSLT at all or only the DOM. I'm assuming you're using XSLT. When transforming to a DOM target, an XSLT serialization instruction like <xsl:output encoding="US-ASCII"/> is disregarded. If all you want is a string, there is no point in transforming to the DOM. In that case, simply specify <xsl:output encoding="US-ASCII"/> in your stylesheet and transform directly to a stream or string. That forces numeric character references for non-ASCII characters.

But for characters that are already ASCII, such as the & in your example, I do not know of a way to have them serialized as numeric character references in XSLT 1.0. Use Perl or AWK or some other general text processing tool to postprocess your output.

Michael Ludwig
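For the DOM-only case, the same effect can probably be had without a stylesheet. The following is a minimal sketch, not the original poster's code: it assumes the JAXP identity transformer that ships with the JDK, and the class and method names (AsciiRoundTrip, parse, serializeAsAscii) are made up for illustration. Setting the output encoding to US-ASCII should make the serializer write non-ASCII characters as numeric character references, while an ASCII character such as & still comes back as &amp;.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
final class AsciiRoundTrip {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Serializes the Document to a String using US-ASCII as output encoding. */
    static String serializeAsAscii(Document doc) {
        try {
            // newTransformer() without a stylesheet gives an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // Equivalent in spirit to <xsl:output encoding="US-ASCII"/>.
            identity.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");

            // Serialize to bytes (not to a DOM), so the encoding is honored.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            identity.transform(new DOMSource(doc), new StreamResult(bytes));
            return new String(bytes.toByteArray(), StandardCharsets.US_ASCII);
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<xml>&#169; &#38;</xml>";
        System.out.println(serializeAsAscii(parse(input)));
    }
}
```

The result is equivalent XML rather than a byte-for-byte copy of the input: the parser resolves every character reference before the DOM is built, so the serializer cannot know how a character was originally written.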