I'm finding that Xerces always escapes ampersands, even when they are
part of a character reference. For example, if I define a text element
like so: <someText>&#x20AC;</someText> (where "&#x20AC;" is the
hexadecimal character reference for the euro sign), then when Xerces
writes this out to a file I invariably get:
<someText>&amp;#x20AC;</someText>
That is, Xerces always escapes the ampersand into the entity reference
"&amp;".
Perhaps my confusion arises from a poor understanding of XML, but I should
think that Xerces would only escape ampersands that aren't part of a valid
character or entity reference, i.e., if an ampersand is immediately followed
by a pound (#) sign, it should leave it alone. Is there a more reliable way
to reference non-ASCII characters in XML, so that they will pass through
Xerces unmolested?
I use Castor and dom4j to manipulate the XML in my application, but both
use Xerces under the covers, if I am not mistaken. Some simple test
cases are below. Any guidance is very much appreciated.
Cheers,
Erskine
/***********************
* Castor example
*
************************/
import java.io.FileWriter;
import java.io.File;
import org.exolab.castor.xml.Marshaller;

public class CastorTest {
    public static void main(String [] args) {
        // populate an arbitrary data object with special characters,
        // written as character references inside the Java string
        Factsheet fs = new Factsheet();
        ContentSections cs = new ContentSections();
        Content c = new Content();
        c.addPara("&#xA3; &#xA9; &#xAE;");
        cs.addContent(c);
        fs.setContentSections(cs);
        // now use the Castor marshalling framework to write the data
        // object out to XML
        try {
            FileWriter fw = new FileWriter(new File("tmp.xml"));
            Marshaller m = new Marshaller(fw);
            m.setEncoding("iso-8859-1");
            m.marshal(fs);
            fw.close(); // make sure everything reaches the file
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The resulting xml file looks like:
<?xml version="1.0" encoding="iso-8859-1"?>
<factsheet>
  <content>
    <para>&amp;#xA3; &amp;#xA9; &amp;#xAE;</para>
  </content>
</factsheet>
/********************************
*
* Dom4J example
*
********************************/
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class Dom4jTest {
    public static void main(String [] args) {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement("root");
        // the text is given as character references inside the Java string
        Element test = root.addElement("test").addText("&#xA3;,&#xAE;");
        try {
            Writer w = new FileWriter("tmp.xml");
            document.write(w);
            w.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The result document is:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <test>&amp;#xA3;,&amp;#xAE;</test>
</root>
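
For comparison, here is a rough sketch that leaves Castor and dom4j out
and uses only the DOM and Transformer classes bundled with the JDK. The
class name CharRefDemo and the helper serialize() are made up for this
test, and I am assuming the JDK serializer treats text nodes the same way
Xerces does. It passes the same text twice: once as the literal string
"&#xA3;", and once as the actual pound character (the Java escape
"\u00A3"), asking for US-ASCII output in both cases. In the first case the
ampersand is just ordinary text as far as the DOM is concerned; in the
second I would expect the serializer to produce a character reference by
itself, since the character cannot be represented in the requested
encoding.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CharRefDemo {

    // Hypothetical helper: builds <test>text</test> and serializes it
    // in the requested output encoding.
    static String serialize(String text, String encoding) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element test = doc.createElement("test");
        // The string is stored as literal characters; nothing in it is
        // parsed as markup at this point.
        test.appendChild(doc.createTextNode(text));
        doc.appendChild(test);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString(encoding);
    }

    public static void main(String[] args) throws Exception {
        // Case 1: the text already looks like a character reference.
        System.out.println(serialize("&#xA3;", "US-ASCII"));
        // Case 2: the text is the real character.
        System.out.println(serialize("\u00A3", "US-ASCII"));
    }
}
```

If that reading is right, the escaping I am seeing is the serializer
treating my "&" as data, and the way to get a character reference in the
file would be to hand the API the real character and choose an output
encoding that forces it to be escaped.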