Thanks for the reply. This does help.
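
For anyone finding this thread later, here is a minimal sketch of the explicit-encoding approach Chris suggests in the message quoted below: write the XML through a Writer and parse it through a Reader, both opened with a named encoding. It is only an illustration under stated assumptions. It uses the JDK's JAXP DocumentBuilder as a stand-in for the Xerces 1.4.2 parser object in my snippet, the class and method names (EuroXmlSketch, writeXml, readRootText) are made up for the example, and it relies on language and DOM features (try-with-resources, getTextContent) that are newer than Xerces 1.4.2.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not part of Xerces.
class EuroXmlSketch {

    // Write the text through a Writer with an explicit encoding, so the
    // Euro sign (U+20AC) is encoded by the charset instead of being cut
    // down to a single byte.
    static void writeXml(File file, String xml, String encoding) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), encoding)) {
            out.write(xml);
        }
    }

    // Parse through a Reader opened with an explicit encoding, so the
    // parser receives characters rather than raw bytes and does not have
    // to detect the encoding itself. Returns the root element's text.
    static String readRootText(File file, String encoding) throws Exception {
        // Assumption: JAXP's default parser stands in for the Xerces parser.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader in = new InputStreamReader(new FileInputStream(file), encoding)) {
            Document doc = builder.parse(new InputSource(in));
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("euro", ".xml");
        file.deleteOnExit();

        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><price>\u20AC 10</price>";
        writeXml(file, xml, "UTF-8");

        String text = readRootText(file, "UTF-8");
        System.out.println("Euro sign survived: " + (text.charAt(0) == '\u20AC'));
    }
}
```

Because the InputSource wraps a character stream, the parser reads the characters the Reader produces and disregards the encoding named in the XML declaration, so the encoding passed to the Reader has to match the bytes actually on disk.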

----- Original Message -----
From: "Christopher Ebert" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, November 18, 2002 6:05 PM
Subject: RE: Parsing XML Containing Euro Sign

>
> Hi Mike,
>
> I can't tell exactly what the problem is, but I've dealt some with character encoding issues...
>
> I ran into an 'unsupported encoding' problem because the byte-to-character converter classes changed name. I built Xerces and printed out the exception being thrown (a debugger that breaks on exceptions would have been nice :) and found that the name for ISO-8859-1 changed from something like ISO8859_1 to ISO_8859_1 or something like that: the encoder I had had one name and Xerces was looking for the other. I copied the other out of a recent Sun distribution and everything works fine now.
>
> If you can't get the automatic mechanism to work, you can always try opening your I/O streams with specific character encodings -- bear in mind that the encoding name might be a little different from the ISO name.
>
> You might check the character encoding the file really uses. I've gotten files that do not use the encoding they say they do. There was a thread on the list a while ago about files from Japan that use a Windows encoding 'differently' :).
>
>
> Cheers,
> Chris
>
>
> -----Original Message-----
> From: Mike Lepine [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 18, 2002 8:38 AM
> To: [EMAIL PROTECTED]
> Subject: Parsing XML Containing Euro Sign
>
>
> I searched the Xerces FAQ and tried to review the mailing list archives, but they
> appeared to be offline, and I was not able to find much information on my
> question. So, if this has been asked/answered before, I apologize for
> reposting.
>
> I am using Xerces version 1.4.2 to parse an XML document containing a Euro
> sign character. I create a FileInputStream:
>
> // create input stream from XML file
> FileInputStream inputStream = new FileInputStream(new File(fileName));
>
> // parse XML
> parser.parse(new InputSource(inputStream));
>
> When the XML document containing the Euro sign is parsed, the sign is misread
> and converted to a different character (in this case a question mark). I
> tried to change the document encoding to UTF-16 instead of UTF-8, but this
> generated an exception stating that UTF-16 was not supported.
>
> In order to write the XML file (containing the Euro sign), I have to make
> sure the data is written out as characters instead of bytes, because when the
> Euro sign is converted to a byte, it looks like the high-order byte is
> discarded, resulting in the wrong character being written out.
>
> Finally, my question is whether I can use Xerces to parse an XML document
> containing the Euro sign and, if so, how do I do it?
>
> I appreciate any help offered.
>
> Thanks.
>
> - Mike
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
