I'm parsing a UTF-16 Japanese XML file with Xerces 2.4, using a simple class
that extends DefaultHandler. I am just trying to write certain CDATA
attribute values (these contain the Japanese characters) to a file: very
simple, supposedly.

The problem is that there is some sort of encoding mischief going on: the
UTF-16 Japanese characters in the CDATA attributes are coming out horribly
mangled.

This is how I am initiating the parse:

        XMLReader parser = XMLReaderFactory.createXMLReader(DEFAULT_PARSER_NAME);
        parser.setFeature(VALIDATION_FEATURE_ID, false);
        parser.setContentHandler(this);
        parser.setErrorHandler(this);
        parser.setEntityResolver(new DTDResolver());
        FileReader reader = new FileReader(tocFile);
        InputSource source = new InputSource(reader);
        source.setEncoding("UTF-16");
        source.setSystemId(tocFile.getAbsolutePath());
        parser.parse(source);

and this (simplified) is how I am grabbing the Japanese characters (I am
appending them to a StringBuffer):

        public void startElement(String uri, String local, String qname,
                                 Attributes attrs) throws SAXException {
            myStringBuffer.append(attrs.getValue("myattribute"));
        }

So two questions: should I be using a FileReader when I initiate the parse,
or some other object of the IO family?
And: is it naive to expect the characters to pop off the attrs parameter
without having to do some extra work?
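
One thing that may be relevant to the first question: FileReader decodes
bytes with the default charset rather than UTF-16, and the SAX documentation
for InputSource.setEncoding says the call has no effect when a character
stream is supplied. Below is a minimal sketch of the byte-stream alternative,
where the parser does the decoding itself. It is not a confirmed fix for the
setup above. The class name Utf16AttributeSketch, the helper methods, and the
sample document are all hypothetical, and it uses the JDK's built-in SAX
parser through SAXParserFactory instead of Xerces 2.4 and DEFAULT_PARSER_NAME.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class Utf16AttributeSketch {

    // Hypothetical sample value: three Japanese characters, written as escapes
    // so the source file's own encoding does not matter.
    static final String SAMPLE_VALUE = "\u65E5\u672C\u8A9E";

    // Writes a tiny UTF-16 document (with byte order mark) to stand in for the real file.
    static void writeSample(File file) throws Exception {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_16)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-16\"?>");
            w.write("<toc myattribute=\"" + SAMPLE_VALUE + "\"/>");
        }
    }

    // Parses from a byte stream, so the parser picks the encoding from the
    // byte order mark and XML declaration instead of a Reader's charset.
    static String readAttribute(File xmlFile, final String attrName) throws Exception {
        final StringBuilder collected = new StringBuilder();
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qname, Attributes attrs) {
                String value = attrs.getValue(attrName);
                if (value != null) {
                    collected.append(value);
                }
            }
        });
        try (InputStream in = new FileInputStream(xmlFile)) {
            InputSource source = new InputSource(in);
            source.setSystemId(xmlFile.toURI().toString());
            parser.parse(source);
        }
        return collected.toString();
    }

    // Writes the collected text with an explicit charset; a FileWriter would
    // fall back to the default charset on the way out as well.
    static void writeUtf8(File out, String text) throws Exception {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(out), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws Exception {
        File in = File.createTempFile("toc", ".xml");
        File out = File.createTempFile("values", ".txt");
        writeSample(in);
        String value = readAttribute(in, "myattribute");
        writeUtf8(out, value);
        System.out.println(value.equals(SAMPLE_VALUE));
    }
}
```

In this sketch the handler does no extra work on the attribute value; the
only changes are at the two ends, where bytes become characters on input and
characters become bytes on output.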

Any hints greatly appreciated,

Constantine Hondros
  


