Hello Constantine,
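It looks like your problem is with FileReader. It decodes the file using
your system's default character encoding, which may be UTF-8, EBCDIC, or
something else, but is almost certainly not UTF-16. When you pass a Reader
to the parser, any encoding information that would otherwise be available
(the byte order mark, the encoding declaration, and even your
setEncoding("UTF-16") call, which SAX ignores when a character stream is
supplied) goes unused, because the parser never reads from the underlying
byte stream. It only sees the characters FileReader has already transcoded,
so if that transcoding was wrong, the damage is done before parsing starts.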
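To see why that transcoding step matters, here is a small stand-alone sketch
that encodes three Japanese characters as UTF-16 and then decodes the bytes
twice: once as UTF-16 and once as ISO-8859-1. ISO-8859-1 is only an assumed
stand-in for whatever non-UTF-16 default your platform happens to use, and the
class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // "日本語" written as Unicode escapes so the source file's own
    // encoding does not matter when compiling.
    public static final String JAPANESE = "\u65E5\u672C\u8A9E";

    // Encode text as UTF-16 (Java writes a big-endian byte order mark first),
    // then decode those bytes with the given charset, much as a Reader
    // opened with that charset would.
    public static String roundTrip(String text, Charset decodeWith) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16Bytes, decodeWith);
    }

    public static void main(String[] args) {
        String right = roundTrip(JAPANESE, StandardCharsets.UTF_16);
        // Assumed stand-in for a wrong platform default encoding.
        String wrong = roundTrip(JAPANESE, StandardCharsets.ISO_8859_1);
        System.out.println("decoded as UTF-16 matches:     " + right.equals(JAPANESE));
        System.out.println("decoded as ISO-8859-1 matches: " + wrong.equals(JAPANESE));
        System.out.println("mangled length: " + wrong.length());
    }
}
```

Decoding with the wrong charset does not throw; it silently yields a
different string (8 one-byte characters here, the BOM plus 6 data bytes,
instead of 3), which is the same kind of mangling you are seeing.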
Unless you have a good reason not to, you should let the parser detect
the encoding itself. For instance, you could create a FileInputStream
instead and pass it to your InputSource, so the parser reads the raw bytes.
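A minimal sketch of that change, assuming a JAXP-supplied SAX parser
(SAXParserFactory) in place of the DEFAULT_PARSER_NAME constant from your
code. The class name ByteStreamParse, the temporary file, and the sample
document in main are hypothetical; "myattribute" is taken from your handler.
The system ID is set as a file URI rather than a plain path, since SAX
expects a URI there.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ByteStreamParse extends DefaultHandler {
    private final StringBuilder values = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qname, Attributes attrs) {
        // Attribute values arrive as ordinary Java Strings; no extra
        // decoding is needed once the parser read the bytes correctly.
        String v = attrs.getValue("myattribute");
        if (v != null) {
            values.append(v);
        }
    }

    public static String collect(File xmlFile)
            throws IOException, SAXException, ParserConfigurationException {
        ByteStreamParse handler = new ByteStreamParse();
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setContentHandler(handler);
        try (InputStream in = new FileInputStream(xmlFile)) {
            // Raw bytes, not a Reader: the parser can now use the byte order
            // mark and the XML declaration to choose the decoding itself,
            // so no setEncoding call is needed.
            InputSource source = new InputSource(in);
            source.setSystemId(xmlFile.toURI().toString());
            parser.parse(source);
        }
        return handler.values.toString();
    }

    public static void main(String[] args) throws Exception {
        String japanese = "\u65E5\u672C\u8A9E";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<toc><entry myattribute=\"" + japanese + "\"/></toc>";
        File tmp = File.createTempFile("toc", ".xml");
        tmp.deleteOnExit();
        // getBytes(UTF_16) writes a byte order mark, as a UTF-16 file normally has.
        Files.write(tmp.toPath(), xml.getBytes(StandardCharsets.UTF_16));
        System.out.println(collect(tmp).equals(japanese));
    }
}
```

When you write the collected text back out, the same rule applies in reverse:
choose the output encoding explicitly (for example with an OutputStreamWriter
and a named charset) rather than relying on FileWriter's default.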
Hope that helps.
On Wed, 25 Jun 2003, Hondros, Constantine wrote:
> I'm parsing a UTF-16 Japanese XML file with Xerces 2.4 with a simple class
> that extends DefaultHandler. I am just trying to write out certain CDATA
> attribute values (these are the Japanese characters) into a file: very
> simple, supposedly.
>
> Problem is, there is some sort of encoding mischief going on, as the UTF-16
> Japanese characters in the CDATA attributes are coming out horribly mangled.
>
> This is how I am initiating the parse:
>
> XMLReader parser =
> XMLReaderFactory.createXMLReader(DEFAULT_PARSER_NAME);
> parser.setFeature(VALIDATION_FEATURE_ID, false);
> parser.setContentHandler(this);
> parser.setErrorHandler(this);
> parser.setEntityResolver(new DTDResolver());
> FileReader reader = new FileReader(tocFile);
> InputSource source = new InputSource(reader);
> source.setEncoding("UTF-16");
> source.setSystemId(tocFile.getAbsolutePath());
> parser.parse(source);
>
> and this (simplified) is how I am grabbing the Japanese characters (I am
> appending them to a StringBuffer):
>
> public void startElement(String uri, String local, String qname,
> Attributes attrs) throws SAXException {
> myStringBuffer.append(attrs.getValue("myattribute"));
> }
>
> So two questions: should I be using a FileReader when I initiate the parse
> or some other object of the IO family?
> And: is it naive to expect the characters to pop off the attrs parameter
> without having to do some extra work?
>
> Any hints greatly appreciated,
>
> Constantine Hondros
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
--------------------
Michael Glavassevich
[EMAIL PROTECTED]