Hello Constantine,

It looks like your problem is with FileReader. It decodes the file using
the default character encoding for your system, which may be UTF-8, EBCDIC,
or something else, and is not necessarily UTF-16. When you pass a Reader to
the parser, the encoding information in the document isn't used because the
parser never reads the underlying byte stream. It only sees the already
transcoded characters, and the setEncoding("UTF-16") call on your
InputSource has no effect once a character stream has been supplied.

Unless you have a good reason not to, you should let the parser detect
the encoding itself. For instance, you could create a FileInputStream
instead and pass it to your InputSource as its byte stream.
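
Here is a minimal sketch of that approach, not a drop-in replacement for
your class. It assumes a plain JAXP parser obtained from SAXParserFactory
rather than your DEFAULT_PARSER_NAME setup, and the class name
ByteStreamParse and the helper collect() are made up for illustration. The
point is only that the InputSource wraps an InputStream, so the parser can
work out the encoding from the byte order mark and the XML declaration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the names are not from the original post.
public class ByteStreamParse {

    // Concatenates the values of the named attribute from every element.
    static String collect(File xmlFile, final String attrName)
            throws IOException, SAXException, ParserConfigurationException {
        final StringBuilder out = new StringBuilder();

        // Assumption: the default JAXP parser stands in for the
        // XMLReaderFactory call used in the question.
        XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qname,
                    Attributes attrs) {
                String value = attrs.getValue(attrName);
                if (value != null) {
                    out.append(value);
                }
            }
        });

        // Hand the parser raw bytes, not a Reader, and do not call
        // setEncoding(): the parser detects the encoding on its own.
        try (InputStream in = new FileInputStream(xmlFile)) {
            InputSource source = new InputSource(in);
            // A URI is used as the system ID so relative references resolve.
            source.setSystemId(xmlFile.toURI().toString());
            parser.parse(source);
        }
        return out.toString();
    }
}
```

With this in place the attribute values arrive in startElement as ordinary
Java strings, so no extra decoding should be needed on the way in. You
still have to pick an explicit encoding when you write them out to a file.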

Hope that helps.

On Wed, 25 Jun 2003, Hondros, Constantine wrote:

> I'm parsing a UTF-16 Japanese XML file with Xerces 2.4 with a simple class
> that extends DefaultHandler. I am just trying to write out certain CDATA
> attribute values (these are the Japanese characters) into a file: very
> simple, supposedly.
>
> Problem is, there is some sort of encoding mischief going on, as the UTF-16
> Japanese characters in the CDATA attributes are coming out horribly mangled.
>
> This is how I am initiating the parse:
>
>       XMLReader parser = XMLReaderFactory.createXMLReader(DEFAULT_PARSER_NAME);
>       parser.setFeature(VALIDATION_FEATURE_ID, false);
>       parser.setContentHandler(this);
>       parser.setErrorHandler(this);
>       parser.setEntityResolver(new DTDResolver());
>       FileReader reader = new FileReader(tocFile);
>       InputSource source = new InputSource(reader);
>       source.setEncoding("UTF-16");
>       source.setSystemId(tocFile.getAbsolutePath());
>       parser.parse(source);
>
> and this (simplified) is how I am grabbing the Japanese characters (I am
> appending them to a StringBuffer):
>
>       public void startElement(String uri, String local, String qname,
>                   Attributes attrs) throws SAXException {
>             myStringBuffer.append(attrs.getValue("myattribute"));
>
> So two questions: should I be using a FileReader when I initiate the parse
> or some other object of the IO family?
> And: is it naive to expect the characters to pop off the attrs parameter
> without having to do some extra work?
>
> Any hints greatly appreciated,
>
> Constantine Hondros
>
>
>
>
> --
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or disclose
> it to anyone else. If you received it in error please notify us immediately
> and then destroy it.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

--------------------
Michael Glavassevich
[EMAIL PROTECTED]
