Hi Ha,

There's no doubt that the attached file is not well-formed. The 0xb7 characters are definitely not properly encoded in UTF-8; if I've done the conversion correctly, they'd need to be encoded as 0xc2 0xb7. If no encoding declaration is specified, an XML parser is required to treat a document as UTF-8 (unless it can determine that it's actually UTF-16).
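To make the conversion concrete, here is a minimal sketch of how a code point below U+0800 maps to UTF-8 bytes. The helper name encodeUtf8Small is hypothetical and not part of Xerces-C; it only assumes that the ISO-8859-1 byte 0xb7 stands for the code point U+00B7.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode a code point in the range U+0000..U+07FF as UTF-8.
// Hypothetical helper for illustration; not taken from Xerces-C.
std::vector<std::uint8_t> encodeUtf8Small(std::uint32_t cp)
{
    assert(cp <= 0x7FF);  // this sketch only covers 1- and 2-byte forms

    if (cp < 0x80) {
        // ASCII range: the byte is the code point itself (0xxxxxxx).
        return { static_cast<std::uint8_t>(cp) };
    }

    // Two-byte form: 110xxxxx 10xxxxxx
    return {
        static_cast<std::uint8_t>(0xC0 | (cp >> 6)),    // lead byte, top 5 bits
        static_cast<std::uint8_t>(0x80 | (cp & 0x3F))   // continuation, low 6 bits
    };
}
```

Calling encodeUtf8Small(0xB7) yields the two bytes 0xc2 0xb7. A bare 0xb7 byte has the form 10xxxxxx, which UTF-8 only allows as a continuation byte, so it cannot start a character on its own.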
Note that all is well if you declare the document's encoding as encoding="ISO-8859-1", which is, I suspect, the actual encoding. I was not able to reproduce the behaviour you describe when the document is declared to be UTF-8: the parser still produced an error for me in this case. If you continue to observe this, please attach a test case declared to be UTF-8 that parses without error.

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]

"Huynh, Ha" <[EMAIL PROTECTED]com> wrote on 08/07/2003 06:12 PM (please respond to xerces-c-dev):

To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
Subject: UTFDataFormatException (bitwise AND error in XMLUTF8Transcoder)

I am getting a UTFDataFormatException when using the following xml doc (attached). It appears to be complaining about the "bullet" character. Note that the xml doc contains a hidden character (LATIN A with circumflex) right before the bullet. If I add encoding="UTF-8", there is no UTFDataFormatException. However, without specifying any encoding I get the following error. When I trace through the code, it looks like the default encoding for xerces 2.3 is UTF-8. The UTFDataFormatException is thrown in XMLUTF8Transcoder.cpp ln 222:

    if((gUTFByteIndicatorTest[trailingBytes] & *srcPtr) != gUTFByteIndicator[trailingBytes]) { throw error here}

I checked the values:

    gUTFByteIndicatorTest[trailingBytes] = 0
    *srcPtr = 183
    gUTFByteIndicator[trailingBytes] = 0

So we should not enter this block.
However, the expression on that line evaluates as follows:

    gUTFByteIndicatorTest[trailingBytes] & *srcPtr = 128   // This should be 0.

Another observation: if I use the xml doc without specifying an encoding AND move the bullet character and the hidden character to another element of the xml, this exception does not occur. Not sure what's going on.

Fatal Error at file C:\temp\SAXSchemaParser\Debug/personal.xml, line 1, char 22
Message: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 1 (╖) of a 1-byte sequence.

I am running xerces 2.3 compiled with MSVS 7.0. Any ideas?

#### personal.xml has been removed from this note on August 07 2003 by Neil Graham
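For readers following the bitwise test quoted in the original message, the check can be sketched as below. The table names and values here are hypothetical stand-ins derived from the UTF-8 bit layout, not copied from the Xerces-C source. They assume that the mask for a sequence with no trailing bytes is 0x80, which would be consistent with the reported result of 128 for the byte 183 (0xb7).

```cpp
#include <cstdint>

// Hypothetical mask/pattern pairs, indexed by the number of trailing
// bytes (only 0 and 1 are covered in this sketch). Assumed values that
// follow the UTF-8 layout; not taken from XMLUTF8Transcoder.cpp.
static const std::uint8_t kLeadMask[2]    = { 0x80, 0xE0 };
static const std::uint8_t kLeadPattern[2] = { 0x00, 0xC0 };

// Returns true when `lead` is a valid first byte for a sequence that has
// `trailingBytes` continuation bytes. `trailingBytes` must be 0 or 1.
bool isValidLeadByte(std::uint8_t lead, unsigned trailingBytes)
{
    return (kLeadMask[trailingBytes] & lead) == kLeadPattern[trailingBytes];
}
```

Under these assumptions, isValidLeadByte(0xB7, 0) is false because 0x80 & 0xb7 is 0x80 rather than 0x00, while isValidLeadByte(0xC2, 1) is true. That matches a parser rejecting a lone 0xb7 byte in a document treated as UTF-8.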