If you want the HTML serializer to write out <foo>&mdash;</foo>, put a 
genuine Unicode em dash character (U+2014) into the text and let the 
serializer convert it to the correct format, just as the parser converts 
it the other way, yielding a text node or character event containing that 
Unicode character.

Let the tool do what it was designed to do. Don't try to second-guess it.

(Of course, the serializer may decide to output the character as a numeric 
character reference instead of the human-readable entity name. That's OK; 
it's still a correct representation of your document, and any software 
that cares about the distinction between those two renderings is, to put 
it simply, broken.)
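A minimal sketch of that advice with the standard JAXP API. The class name 
MdashAsCharacter and the method serialize() are made up for illustration, 
and the US-ASCII output encoding is an assumption chosen so the em dash 
cannot be written raw and the serializer has to escape it one way or the 
other:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MdashAsCharacter {
    // Builds <div>(U+2014)</div> and serializes it with the "html" output method.
    static String serialize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element div = doc.createElement("div");
        doc.appendChild(div);
        // Store the character itself, not the seven characters of markup.
        div.appendChild(doc.createTextNode("\u2014"));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");
        // Assumption: an encoding that cannot represent U+2014 forces an escape.
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Either &mdash; or &#8212; (the serializer's choice), never &amp;mdash;
        System.out.println(serialize());
    }
}
```

Which of the two escaped forms comes out depends on the serializer; both 
mean the same thing to any conforming HTML or XML reader.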


______________________________________
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)



From: Nathan Nadeau <n...@gleim.com>
To: Aeris <ae...@imirhil.fr>
Cc: xalan-j-users@xml.apache.org
Date: 10/12/2011 03:29 PM
Subject: Re: Disable escaping on transformer



Nicolas,

It seems you are not using anything specific to Xalan in your code at 
http://pastebin.com/LfGpWMai, though I may be missing something.

Given your code, this behavior is actually expected. You are creating a 
text node with the value "&mdash;" and then writing it out to an XML 
file. To do this, the '&' must be escaped as "&amp;" in the output XML 
file. So the output is correct, though it is probably not what you want. 
When read in by other XML parsers, your generated XML would contain an 
element called "div" with the text value "&mdash;" (which is what you 
told it to have).
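A small sketch of that round trip, assuming the JDK's built-in JAXP 
implementation (the class LiteralAmpersandRoundTrip and its methods are 
hypothetical names): the text node holds the literal characters "&mdash;", 
the serializer escapes the '&', and re-parsing recovers exactly that text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class LiteralAmpersandRoundTrip {
    // Serializes a text node whose value is the literal string "&mdash;".
    static String serialize() throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element div = doc.createElement("div");
        doc.appendChild(div);
        div.appendChild(doc.createTextNode("&mdash;"));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString(); // the '&' is written as "&amp;"
    }

    // Parses the serialized XML again and returns the root element's text.
    static String reparse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize();
        System.out.println(xml);          // <div>&amp;mdash;</div>
        System.out.println(reparse(xml)); // &mdash;
    }
}
```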

You can tell the class responsible for writing out the document to stop 
escaping special characters such as '&', though this is generally 
discouraged unless you have no other choice, at least according to the 
best practices I'm aware of. If you are reading in XML documents 
(instead of building a DOM from scratch as in your example), you should 
also be able to tell the XML parser not to expand entity references in 
the source document.

-----------

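// This outputs <div>&mdash;</div>. The disable-output-escaping processing
// instruction in the source DOM tells the serializer to stop escaping
// text from that point on.
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class DisableEscapingExample {
    public static void main(String[] args) throws Exception {
        final DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        final Document document = builder.newDocument();
        final Node pi = document.createProcessingInstruction(
            StreamResult.PI_DISABLE_OUTPUT_ESCAPING, "");
        final Node div = document.createElement("div");
        document.appendChild(pi);
        document.appendChild(div);
        div.appendChild(document.createTextNode("&mdash;"));
        final Transformer transformer =
            TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        final Writer out = new StringWriter();
        StreamResult sr = new StreamResult(out);
        transformer.transform(new DOMSource(document), sr);
        System.out.println(out);
    }
}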

-----------

To keep entity references unexpanded when reading in the source XML 
document, see DocumentBuilderFactory.setExpandEntityReferences(false).
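A sketch of what setExpandEntityReferences(false) changes, assuming the 
entity is declared in an internal DTD subset (XML itself predefines only 
amp, lt, gt, apos and quot). The document string, the class 
KeepEntityReferences and the method firstChild() are illustrative only. 
The sketch inspects the DOM; whether a particular serializer then writes 
an EntityReference node back out as &mdash; is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class KeepEntityReferences {
    // Illustrative input: mdash is declared so the document is well-formed.
    static final String XML =
            "<!DOCTYPE div [<!ENTITY mdash \"&#8212;\">]><div>&mdash;</div>";

    // Parses XML with the given setting and returns the first child of <div>.
    static Node firstChild(boolean expand) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(expand);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        Node kept = firstChild(false);
        // With expansion off, an EntityReference node named "mdash" remains.
        System.out.println(kept.getNodeType() == Node.ENTITY_REFERENCE_NODE);
        System.out.println(kept.getNodeName());
        // With the default (true), the reference is replaced by a text node.
        System.out.println(firstChild(true).getNodeType() == Node.TEXT_NODE);
    }
}
```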

Entities and entity references can be quite tricky to work with, and you 
must understand what is happening at each level of the XML processing, 
from reading in the source XML, to running a transform on the XML, to 
outputting the final result.

Aeris wrote:
> Hi,
>
> I have a little problem with Xalan.
>
> I use a Transformer to create an HTML file from a Document.
> But in the generated HTML, every "&" in the document that is part of
> an already escaped HTML entity like "&nbsp;" is re-escaped by Xalan.
>
> See this sample: http://pastebin.com/LfGpWMai
> Instead of the expected
>                <div>&mdash;</div>
> I get
>                <div>&amp;mdash;</div>
>
> I searched the docs and Google, but found nothing to disable escaping.
> How can I do this?
>
> Thanks
> -- 
> Nicolas VINOT

-- 
Nathan Nadeau
n...@gleim.com
Software Development
Gleim Publications, Inc.
http://www.gleim.com

