Artur Tomusiak wrote:

I am trying to convert a String with XML content in it into the
org.w3c.dom.Document object to do some modifications and then to
convert it back to the String. However, even if I do not do any
modifications to the object, I am still getting back a different
String than what I have provided as an input. The problem is with
the numeric XML entities. For example, if my input String is:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
   &#169;
   &#38;
</xml>

Hi Artur,

In fact, and to be pedantic, these are neither entities nor entity
references; they're numerical character references, which just happen
to use a syntax similar to that of general entity references. (See
section 4.1, "Character and Entity References", of the XML spec if
interested.)

As keshlam said, a character reference and the character it stands for
are 100% identical as far as XML is concerned.
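
To see that equivalence from Java, here is a minimal sketch using only
the JAXP parser that ships with the JDK. The class and method names
(CharRefDemo, rootText) are made up for illustration. It parses one
string written with character references and one written with the
literal characters, then compares the text the DOM reports.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefDemo {
    // Parse an XML string and return the text content of its root element.
    public static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Same content, once with character references, once with literals.
        String withRefs = "<xml>&#169; &#38;</xml>";
        String withLiterals = "<xml>\u00A9 &amp;</xml>";
        // The parser resolves the references, so the DOM text is the same.
        System.out.println(rootText(withRefs).equals(rootText(withLiterals)));
    }
}
```

Once parsed, the DOM no longer knows which spelling the input used, so
the serializer is free to pick either one on the way out.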

It's not clear to me whether you use XSLT at all or only the DOM.
I'm assuming you're using XSLT.

When transforming to a DOM target, XSLT serialization instructions
such as <xsl:output encoding="US-ASCII"/> are disregarded, because the
result tree is never serialized.

If all you want is a string, there is no point in transforming to the
DOM. In that case, simply specify <xsl:output encoding="US-ASCII"/> in
your stylesheet. That would force numerical character references for
non-ASCII characters.
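
A minimal sketch of that approach in Java, under some assumptions: it
uses the JAXP identity transformer rather than a stylesheet, and sets
the encoding through OutputKeys.ENCODING, which should have the same
effect as <xsl:output encoding="US-ASCII"/>. The class and method names
(AsciiSerializer, toAscii) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AsciiSerializer {
    // Re-serialize an XML string with US-ASCII as the output encoding.
    public static String toAscii(String xml) throws Exception {
        // newTransformer() without a stylesheet is an identity transformation.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        // Write to a byte stream so the serializer applies the encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toAscii("<xml>\u00A9 &#38;</xml>"));
    }
}
```

With the transformer bundled in the JDK I would expect the copyright
sign to come out as a numerical character reference and the ampersand
as &amp;, but the exact output (XML declaration, decimal or hexadecimal
references) is up to the serializer.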

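But of the two characters in your example, only the copyright sign
(&#169;) is non-ASCII. The ampersand (&#38;) is an ASCII character and
will be serialized as &amp;, and I do not know of a way to have ASCII
characters serialized as numerical character references in XSLT 1.0.
Use Perl or AWK or some other general text processing tool to
postprocess your output.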
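
If you would rather stay in Java than reach for Perl or AWK, the
postprocessing can be a plain string replacement on the serialized
output. This is only a sketch with made-up names
(AmpersandPostprocessor, ampToNumeric), and it assumes the output
contains no CDATA sections or comments, where the text &amp; would not
be an escaped ampersand.

```java
public class AmpersandPostprocessor {
    // Replace the predefined entity reference &amp; with the numerical
    // character reference &#38; in already-serialized XML.
    public static String ampToNumeric(String serializedXml) {
        return serializedXml.replace("&amp;", "&#38;");
    }

    public static void main(String[] args) {
        System.out.println(ampToNumeric("<xml>&#169; &amp;</xml>"));
    }
}
```

Both spellings mean the same thing to any XML parser, so this only
matters if something downstream compares the strings byte for byte.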

Michael Ludwig
