[Originally posted to comp.lang.xml, hoping for better luck here]
Hi everyone.
I'm doing a simple XSLT transformation in Java, using Xalan. When I do
the transformation, though, my non-Latin characters get converted to
question marks (?). Is there some option I have to set for it to
output these characters properly?
This is my test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<test>abc一xyz</test>
This is my test.xsl (just an identity transformation):
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
This is my test.java (the code to do the transformation):
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import java.io.StringWriter;
class test
{
    public static void main(String[] args)
        throws java.lang.Exception
    {
        // Get a DocumentBuilder
        DocumentBuilderFactory dFactory =
            DocumentBuilderFactory.newInstance();
        dFactory.setNamespaceAware(true);
        dFactory.setIgnoringElementContentWhitespace(false);
        DocumentBuilder dBuilder = dFactory.newDocumentBuilder();

        // Get the XSL
        Document xslDoc = dBuilder.parse("test.xsl");
        DOMSource xslDomSource = new DOMSource(xslDoc);
        xslDomSource.setSystemId("test.xsl");

        // Get the XML
        Document xmlDoc = dBuilder.parse("test.xml");
        DOMSource xmlDocSource = new DOMSource(xmlDoc);
        xmlDocSource.setSystemId("test.xml");

        // A Document for the output
        Document docResult = dBuilder.newDocument();

        // A Transformer to do the transformation
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer(xslDomSource);
        transformer.transform(xmlDocSource, new DOMResult(docResult));

        // Serialize the output XML
        StringWriter sw = new StringWriter();
        OutputFormat format = new OutputFormat(xmlDoc, "UTF-8", true);
        format.setIndent(2);
        XMLSerializer serializer = new XMLSerializer(sw, format);
        serializer.serialize(docResult);

        // Print out the serialized output XML
        System.out.print(sw.getBuffer().toString());
    }
}
And when I run the program like this:
$ java test > output.xml
my output.xml contains:
<?xml version="1.0" encoding="UTF-8"?>
<test>abc?xyz</test>
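One check I can think of, and this is only a guess at the cause: the serializer writes into a StringWriter, so the "UTF-8" in OutputFormat may only affect the XML declaration, and the actual bytes would be produced by System.out using the platform default encoding. The sketch below is not part of my program; the class name EncodingCheck and the helper printWith are made up for illustration. It pushes the same string through a PrintStream with an explicit charset to see whether a charset that cannot represent 一 (U+4E00) gives the same "?" I am seeing.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of test.java.
class EncodingCheck
{
    // Prints text through a PrintStream that encodes with the named charset
    // and returns the bytes it produced. System.out is also a PrintStream,
    // but it uses the platform default charset instead of one chosen here.
    static byte[] printWith(String text, String charsetName)
        throws UnsupportedEncodingException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes, true, charsetName);
        out.print(text);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception
    {
        // Same content as test.xml, with the CJK character escaped so the
        // encoding of this source file does not matter.
        String xml = "<test>abc\u4e00xyz</test>";

        // ISO-8859-1 cannot represent U+4E00, so the encoder substitutes '?'.
        String latin1 = new String(printWith(xml, "ISO-8859-1"),
                                   StandardCharsets.ISO_8859_1);
        // UTF-8 can represent it, so the text round-trips unchanged.
        String utf8 = new String(printWith(xml, "UTF-8"),
                                 StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 gives '?': "
                           + latin1.equals("<test>abc?xyz</test>"));
        System.out.println("UTF-8 round-trips:    " + utf8.equals(xml));
        // The default encoding the JVM picked up from the environment.
        System.out.println("file.encoding = "
                           + System.getProperty("file.encoding"));
    }
}
```

If that turns out to be what is happening, then writing the StringWriter's contents to output.xml through an OutputStreamWriter created with "UTF-8", rather than through System.out, might keep the character intact, but I have not confirmed this.";
assert new String(EncodingCheck.printWith(xml, "ISO-8859-1"), java.nio.charset.StandardCharsets.ISO_8859_1).equals("<test>abc?xyz</test>");
assert new String(EncodingCheck.printWith(xml, "UTF-8"), java.nio.charset.StandardCharsets.UTF_8).equals(xml);
</test>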
What's going on here?
Thanks,
Cameron
--
Cameron McCormack
// http://www.csse.monash.edu.au/~clm/
// icq 26955922