If you want the HTML serializer to write out <foo>&mdash;</foo>, put a genuine Unicode em dash character (U+2014) into the text and let the serializer deal with converting it to the correct format, just as the parser converts it the other way, yielding a text node or character event containing that Unicode character.
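A minimal sketch of the advice above, assuming the JDK's default JAXP TransformerFactory and the "html" output method. The text node holds the real character U+2014 (written "\u2014" in Java source), not the seven characters "&mdash;". Depending on the serializer and output encoding, the result may contain the entity name, a numeric character reference, or the raw character. All of these are correct, and none of them should contain "&amp;". The class name RealMdashDemo is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class (not part of Xalan): store the real character in
// the DOM and let the HTML serializer decide how to write it out.
class RealMdashDemo {
    static String serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            // U+2014 EM DASH as an actual character, not the text "&mdash;"
            foo.appendChild(doc.createTextNode("\u2014"));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html"); // assumption: HTML output method
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Depending on serializer and encoding, <foo> contains &mdash;,
        // a numeric reference such as &#8212;, or the raw U+2014 character.
        System.out.println(serialize());
    }
}
```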
Let the tool do what it was designed to do. Don't try to second-guess it. (Of course the serializer may decide to output the character as a numeric character reference, such as &#8212;, instead of the human-readable entity name. But that's OK; it's still a correct representation of your document, and any software that cares about the distinction between those two renderings is, to put it simply, broken.)

______________________________________
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)

From: Nathan Nadeau <n...@gleim.com>
To: Aeris <ae...@imirhil.fr>
Cc: xalan-j-users@xml.apache.org
Date: 10/12/2011 03:29 PM
Subject: Re: Disable escaping on transformer

Nicolas,

It seems you are not using anything specific to Xalan in your code at http://pastebin.com/LfGpWMai, though I may be missing something.

This behavior, according to your code, is actually expected. You are creating a text node with the value "&mdash;" and want to output it in an XML file. To do this, the '&' must be escaped as "&amp;" in the output XML file. So the output is correct, though it is probably not what you want. When read back in by other XML parsers, your generated XML contains an element called "div" with the text value "&mdash;" (which is what you told it to have).

You can tell the class responsible for writing out the document to stop escaping special characters such as '&', though this is generally not recommended unless you have no other choice, at least according to the best practices I'm aware of. If you are reading in XML documents (instead of building a DOM from scratch as in your example), you should also be able to tell the XML parser not to resolve entities in the source document.
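The behavior described above can be checked with a round trip. This is a sketch that assumes the JDK's default JAXP implementation, and the class name EscapedAmpersandDemo is hypothetical. A text node containing the seven characters "&mdash;" is serialized with its '&' escaped as "&amp;". Parsing that output back yields the same seven characters, not an em dash.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: the text node holds the seven characters "&mdash;",
// not an em dash, so the serializer must escape the '&'.
class EscapedAmpersandDemo {
    static String serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element div = doc.createElement("div");
            doc.appendChild(div);
            div.appendChild(doc.createTextNode("&mdash;"));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString(); // the '&' is written as "&amp;"
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Parse the serialized XML again and return the text of the root element.
    static String reparse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = serialize();
        System.out.println(xml);          // <div>&amp;mdash;</div>
        System.out.println(reparse(xml)); // &mdash;
    }
}
```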
-----------
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// this outputs <div>&mdash;</div> by telling the serializer to disable output
// escaping via a processing instruction in the source DOM
final DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
final Document document = builder.newDocument();
final Node pi = document.createProcessingInstruction(Result.PI_DISABLE_OUTPUT_ESCAPING, "");
final Node div = document.createElement("div");
document.appendChild(pi);
document.appendChild(div);
div.appendChild(document.createTextNode("&mdash;"));
final Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
final StringWriter out = new StringWriter();
final StreamResult sr = new StreamResult(out);
transformer.transform(new DOMSource(document), sr);
System.out.println(out);
-----------

To disable entity resolution when reading in the source XML document, see DocumentBuilderFactory.setExpandEntityReferences().

Entities and entity references can be quite tricky to work with, and you must understand what is happening at each level of the XML processing: reading in the source XML, running a transform on it, and writing out the final result.

Aeris wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> I have a little problem with Xalan.
>
> I use Transformer to create a HTML file from a Document.
> But in generated HTML, all « & » in the document, which are parts of
> already escaped HTML entities like « », are re-escaped by Xalan.
>
> See this sample : http://pastebin.com/LfGpWMai
> Instead of expected
> <div>&mdash;</div>
> I get
> <div>&amp;mdash;</div>
>
> I search on doc and Google, but nothing found to disable escaping.
> How I can do this ?
>
> Thanks
> - --
> Nicolas VINOT
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJOkhk3AAoJEK8zQvxDY4P9iioIAL9v9bG/pbnhNA18iioMaLy6
> AwrQFRy7k3L1Y92jrUf54crvFUYWj9tNPH9W0tUA/SShvvDQI1h7hulX5ZL64ijL
> 2M70nwkvFhh06mDyNwkIXJfm01oBc3OSJRqID/NGgarThVzp2Wjwte6qqLKOQTJS
> REh8eVi8Ttu9DNnTR4VyH7GNbbyKDY0QjmNHZxD79LpLGEHRf9+ONxkn0SRvfAmJ
> dSAozRXxyb7Mx65+DtOGCmHlk0407gbo9B38nPSE2WBYwaLSf6i+N8dlBnWxdVDn
> xpuQnm0j3RRtuaTG/CRyWbEjO0es6EXK1dpg6oGyI0skiCglY1kX9OqGLiVYFZA=
> =VKkB
> -----END PGP SIGNATURE-----

--
Nathan Nadeau
n...@gleim.com
Software Development
Gleim Publications, Inc.
http://www.gleim.com