RE: Parsing XML Containing Euro Sign

Christopher Ebert 18 Nov 2002 23:05:37 -0000

        Hi Mike,

        I can't tell exactly what the problem is, but I've dealt with some
character encoding issues...


        I ran into an 'unsupported encoding' problem because the
byte-to-character converter classes changed names. I built Xerces and printed
out the exception being thrown (a debugger that breaks on exceptions would have
been nice :) and found that the name for ISO-8859-1 had changed from something
like ISO8859_1 to ISO_8859_1: the converter I had used one name and Xerces was
looking for the other. I copied the other converter out of a recent Sun
distribution and everything works fine now.

        If you can't get the automatic mechanism to work, you can always try
opening your I/O streams with specific character encodings -- bear in mind that
the encoding name might be a little different from the ISO name.
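As a rough sketch of opening the stream with a specific encoding: wrap the byte stream in an InputStreamReader and hand the parser a character stream, so the parser does not have to detect the encoding itself. The class name ExplicitEncodingParse and the helper parseText are made up for illustration, and the sketch uses the standard JAXP/SAX classes rather than anything specific to Xerces 1.4.2. For a real file, a FileInputStream would take the place of the ByteArrayInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ExplicitEncodingParse {
    // Hypothetical helper: decodes the bytes with the given encoding name,
    // parses the result, and returns the concatenated text content.
    static String parseText(InputStream in, String encodingName) throws Exception {
        // The Reader does the byte-to-character conversion, not the parser.
        Reader reader = new InputStreamReader(in, encodingName);
        final StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(reader),
            new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document containing the Euro sign (U+20AC), encoded as UTF-8.
        byte[] xml = "<price>\u20AC 10</price>".getBytes("UTF-8");
        String text = parseText(new ByteArrayInputStream(xml), "UTF-8");
        System.out.println(text.equals("\u20AC 10")); // prints "true"
    }
}
```

The encoding name passed in has to be one the JVM recognizes, and it has to match what the file actually contains.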

        You might check the character encoding the file really uses. I've 
gotten files that do not use the encoding they say they do. There was a thread 
on the list a while ago about files from Japan that use a Windows encoding 
'differently' :).
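One way to check what a file really uses is to look at the raw bytes where the Euro sign should be and compare them with what each encoding would produce. The sketch below prints those byte sequences; the class name EuroBytes and the helper hex are made up for illustration, and the windows-1252 and ISO-8859-15 lines assume your JVM ships those charsets. Note that ISO-8859-1 has no Euro sign at all, so Java substitutes a question mark (0x3F) when encoding to it.

```java
import java.nio.charset.Charset;

public class EuroBytes {
    // Hypothetical helper: returns the bytes of s in the given charset as hex.
    static String hex(String s, String charsetName) {
        byte[] bytes = s.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        String[] names = { "UTF-8", "ISO-8859-1", "windows-1252", "ISO-8859-15" };
        for (String name : names) {
            // Skip charsets this JVM does not provide instead of failing.
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + hex(euro, name));
            }
        }
        // UTF-8 gives E2 82 AC; ISO-8859-1 gives 3F (the '?' replacement).
    }
}
```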


        Cheers,
                Chris


-----Original Message-----
From: Mike Lepine [mailto:[EMAIL PROTECTED]
Sent: Monday, November 18, 2002 8:38 AM
To: [EMAIL PROTECTED]
Subject: Parsing XML Containing Euro Sign


I searched the Xerces FAQ, tried to review the mailing list archives but they
appeared to be offline, and was not able to find much information on my
question. So, if this has been asked/answered before, I apologize for
reposting.

I am using Xerces version 1.4.2 to parse an XML document containing a Euro
sign character. I create a FileInputStream

            // create input stream from XML file
            FileInputStream inputStream = new FileInputStream(new File(fileName));

            // parse XML
            parser.parse(new InputSource(inputStream));

When the XML containing the Euro sign is parsed, the Euro sign is misread and
converted to a different character (in this case a question mark). I
tried to change the document encoding to UTF-16 instead of UTF-8 but this
generated an exception stating that UTF-16 was not supported.

In order to write the XML file (containing the Euro sign), I have to make
sure the data is written out as characters instead of bytes because when the
Euro sign is converted to a byte, it looks like the high order byte is
discarded resulting in the wrong character being written out.
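The truncation described above can be reproduced in a few lines. This is only a sketch, assuming UTF-8 as the target encoding; the class name EuroWrite and both helper methods are made up for illustration. Writing the character through OutputStream.write(int) keeps only the low eight bits, while an OutputStreamWriter encodes the whole character.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EuroWrite {
    // Writes the char as a single byte: only the low 8 bits survive.
    static byte[] writeAsByte(char c) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(c); // U+20AC becomes the single byte 0xAC
        return out.toByteArray();
    }

    // Writes the string through a Writer so it is encoded properly.
    static byte[] writeAsChars(String s, String encodingName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, encodingName);
        writer.write(s);
        writer.close(); // flushes the encoder into the byte stream
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAsByte('\u20AC').length);           // prints 1
        System.out.println(writeAsChars("\u20AC", "UTF-8").length); // prints 3
    }
}
```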

Finally, my question is whether I can use Xerces to parse an XML document
containing the Euro sign and if so, how do I do it?

I appreciate any help offered.

Thanks.

- Mike


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

