[Originally posted to comp.lang.xml, hoping for better luck here]

Hi everyone.

I'm doing a simple XSLT transformation in Java, using Xalan.  When I do
the transformation, though, my non-Latin characters get converted to
question marks (?).  Is there some option I have to set for it to
output these characters properly?

This is my test.xml:

  <?xml version="1.0" encoding="UTF-8"?>
  <test>abc&#x4e00;xyz</test>

This is my test.xsl (just an identity transformation):

  <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <xsl:copy-of select="."/>
    </xsl:template>
  </xsl:stylesheet>

This is my test.java (the code to do the transformation):

  import javax.xml.parsers.DocumentBuilder;
  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.transform.Transformer;
  import javax.xml.transform.TransformerFactory;
  import javax.xml.transform.dom.DOMResult;
  import javax.xml.transform.dom.DOMSource;
  import org.w3c.dom.Document;
  import org.apache.xml.serialize.OutputFormat;
  import org.apache.xml.serialize.XMLSerializer;
  import java.io.StringWriter;
  
  class test
  {
      public static void main(String args[])
          throws java.lang.Exception
      {
          // Get a DocumentBuilder
          DocumentBuilderFactory dFactory =
            DocumentBuilderFactory.newInstance();
          dFactory.setNamespaceAware(true);
          dFactory.setIgnoringElementContentWhitespace(false);
          DocumentBuilder dBuilder = dFactory.newDocumentBuilder();
  
          // Get the XSL
          Document xslDoc = dBuilder.parse("test.xsl");
          DOMSource xslDomSource = new DOMSource(xslDoc);
          xslDomSource.setSystemId("test.xsl");
  
          // Get the XML
          Document xmlDoc = dBuilder.parse("test.xml");
          DOMSource xmlDocSource = new DOMSource(xmlDoc);
          xmlDocSource.setSystemId("test.xml");
  
          // A Document for the output
          Document docResult = dBuilder.newDocument();
  
          // A Transformer to do the transformation
          TransformerFactory tFactory = TransformerFactory.newInstance();
          Transformer transformer = tFactory.newTransformer(xslDomSource);
          transformer.transform(xmlDocSource, new DOMResult(docResult));
  
          // Serialize the output XML
          StringWriter sw = new StringWriter();
          OutputFormat format = new OutputFormat(xmlDoc, "UTF-8", true);
          format.setIndent(2);
          XMLSerializer serializer = new XMLSerializer(sw, format);
          serializer.serialize(docResult);
  
          // Print out the serialized output XML
          System.out.print(sw.getBuffer().toString());
      }
  }

And when I run the program like this:

  $ java test > output.xml

my output.xml contains:

  <?xml version="1.0" encoding="UTF-8"?>
  <test>abc?xyz</test>
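
One thing I haven't ruled out is the last step: System.out turns
characters into bytes using the platform's default encoding, not the
"UTF-8" I gave to OutputFormat.  The sketch below is only a guess at the
cause, not a confirmed diagnosis.  The class and method names
(EncodingCheck, printWith, hex) are made up for this test; it prints the
same string through a PrintStream with two different encodings and
dumps the resulting bytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class EncodingCheck
{
    // Print the text through a PrintStream that uses the named charset
    // and return the bytes that would end up in the file or terminal.
    static byte[] printWith(String text, String charsetName)
        throws UnsupportedEncodingException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes, true, charsetName);
        out.print(text);
        out.flush();
        return bytes.toByteArray();
    }

    // Format bytes as space-separated two-digit hex values.
    static String hex(byte[] data)
    {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception
    {
        // Same text content as the <test> element in test.xml
        String text = "abc\u4e00xyz";

        // A charset that cannot represent U+4E00 (stand-in for a
        // Latin-only platform default; an assumption about my setup)
        System.out.println("US-ASCII: " + hex(printWith(text, "US-ASCII")));

        // The encoding I actually want in output.xml
        System.out.println("UTF-8:    " + hex(printWith(text, "UTF-8")));
    }
}
```

If the first line shows 3f (a question mark) where the second shows
e4 b8 80, then the serializer is probably fine and the character is
lost when System.out encodes it.  In that case, I assume the fix would
be to wrap System.out in a PrintStream created with "UTF-8", or to hand
the serializer an OutputStream instead of a StringWriter, but I haven't
verified either against my program.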

What's going on here?

Thanks,

Cameron

-- 
Cameron McCormack
  // [EMAIL PROTECTED]
  // http://www.csse.monash.edu.au/~clm/
  // icq 26955922
