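In UTF-8, characters above 0x7F are encoded as multi-byte sequences. Your 0xD2 character (binary 11010010) should be encoded as the two bytes 11000011 10010010, or 0xC3 0x92. A raw 0xD2 byte is itself the lead byte of a two-byte sequence, so it is well-formed only if a continuation byte (10xxxxxx) follows it.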
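To make the bit layout concrete, here is a minimal Python sketch of the two-byte case. The function name utf8_encode_two_byte is made up for illustration and is not part of Xerces or any library; it covers only code points 0x80 through 0x7FF.

```python
def utf8_encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in 0x80..0x7FF as two UTF-8 bytes (illustrative helper)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte form covers only 0x80..0x7FF")
    lead = 0b11000000 | (code_point >> 6)            # 110xxxxx: upper 5 bits
    continuation = 0b10000000 | (code_point & 0x3F)  # 10xxxxxx: lower 6 bits
    return bytes([lead, continuation])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 under Python's built-in codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


encoded = utf8_encode_two_byte(0xD2)
print(encoded.hex())                 # c392
print(is_valid_utf8(encoded))        # True
print(is_valid_utf8(b"<a>\xd2</a>")) # False: 0xD2 is not followed by a continuation byte
```

The last line shows the situation a conforming parser should reject: a 0xD2 byte followed by an ASCII character rather than a 10xxxxxx continuation byte.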
See http://www.faqs.org/rfcs/rfc2279.html for the exact details.

As to why an ancient version of Xerces accepted it: it was a bug. Try a modern release of Xerces and see if it still accepts that byte; I'd bet it won't.

______________________________________
Joe Kesselman / IBM Research