Thanks for your info. How about my second question: if I escape the international character Ò (0xD2) with a character reference, shouldn't Xerces report the same error? It doesn't.
<FreeFormText>POSTBOKS 60 SKÒYEN</FreeFormText>
<FreeFormText>POSTBOKS 60 SKÒYEN</FreeFormText>

Thanks,
Benson.

-----Original Message-----
From: Andy Clark [mailto:[EMAIL PROTECTED]
Sent: Monday, December 02, 2002 11:49 PM
To: [EMAIL PROTECTED]
Subject: Re: UTF-8 encoding question

Benson Cheng wrote:
> Thanks for the info, the xerces 2.2.1 did report error
> (java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence) on
> the following line.
>
> <FreeFormText>POSTBOKS 60 SKÒYEN</FreeFormText>

You get this error when you use a character in your document but
incorrectly specify the file encoding. The first line of the XML
document (called the XMLDecl) specifies the encoding of the file.
For example:

<?xml version='1.0' encoding='ISO-8859-1'?>

If this line is missing, then the default encoding is UTF-8.
However, if you've created your document with a text editor like
Notepad, it will save the file with the default encoding of the
system -- usually Cp1252 (aka Windows-1252).

However, be aware that simply adding an XMLDecl line to your file
does *not* change the encoding. To do that, the program that
creates the file MUST save the contents in the appropriate encoding.
In Notepad under Win2K or XP, there is an encoding selection on the
Save dialog that allows you to select various Unicode encodings
like "UTF-8".

Hope this helps...

--
Andy Clark * [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
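
The behaviour discussed in this thread can be reproduced with a small sketch. It uses Python's built-in expat-based parser rather than Xerces, so the exact error message will differ, but the encoding rules being illustrated come from XML itself. The `&#xD2;` form of the escape is an assumption (the escape used in the original post did not survive), and `parse_text` is a hypothetical helper introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Raw byte 0xD2 with no XMLDecl: the parser assumes UTF-8.
# In UTF-8, 0xD2 announces a 2-byte sequence, but the next byte,
# 'Y' (0x59), is not a valid continuation byte.
raw_no_decl = b"<FreeFormText>POSTBOKS 60 SK\xd2YEN</FreeFormText>"

# Same bytes, but the XMLDecl names the encoding the bytes are really in.
raw_with_decl = b"<?xml version='1.0' encoding='ISO-8859-1'?>" + raw_no_decl

# Character reference (assumed form): the file holds only ASCII bytes,
# which are valid UTF-8, so no encoding error can occur.
escaped = b"<FreeFormText>POSTBOKS 60 SK&#xD2;YEN</FreeFormText>"


def parse_text(data):
    """Return the element text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


for label, data in [
    ("raw byte, no XMLDecl", raw_no_decl),
    ("raw byte, ISO-8859-1 XMLDecl", raw_with_decl),
    ("character reference", escaped),
]:
    print(label, "->", parse_text(data))
```

Under these assumptions, only the first document is rejected. The escaped version parses because a character reference is resolved by the parser after decoding, so the byte 0xD2 never appears in the file at all.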