Hi Matt,

I can reproduce the behaviour you experience, and the reason is this:

When the parser is reading UTF-8 from some source, it reads it in chunks to maximize performance as much as possible. The routines that look through the markup performing tokenization, well-formedness checking etc. operate on this internal buffer--where everything's already in UTF-16.

The error reporting routines work relative to the routines that are concerned with the XML markup, since those are where most problems arise and that's the natural specific domain of an XML parser.

When the markup routines have finished the XML declaration, they'll ask for more text, which will cause the transcoding routines to go merrily along their way to fill the requisite buffer. When the transcoder finds something it can't stomach it complains, but the error reporting logic only knows where the parser left off looking for markup.

So yes, this is a bug. But it wouldn't be all that easy to fix, especially for transcoders that we don't own. So I'm afraid the probability of this being addressed in the near future isn't high. You might want to file a bugzilla report to keep this on the radar scope, in case anyone ever has the cycles to give it a serious run.

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]

"Matt Nemenman" <[EMAIL PROTECTED]>
09/23/2003 08:23 PM
Please respond to xerces-c-dev

To: [EMAIL PROTECTED]
cc:
Subject: Possible bug: invalid byte 1 (...) of a 1-byte sequence.

Hi,

While trying to parse the file below (also in attachment), I got an error at line 1, position 40: "An exception occurred! Type:UTFDataFormatException, Message:invalid byte 1 () of a 1-byte sequence."

<?xml version="1.0" encoding="utf-8" ?>
<tag>
Temperature 90F
</tag>

The file indeed contains an invalid UTF-8 character (a Latin1 character), however this character is at line 3, position 15 (completely not where it is reported).
I have seen this problem quite often: an invalid-character error is reported at the very end of the XML declaration (line 1), even if the invalid character is a thousand lines down the file. Am I missing something, or is it a bug?

Thanks a lot,
-- Matt

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

#### f has been removed from this note on September 23 2003 by Neil Graham
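
Since the parser reports only where the markup scanner had stopped, one possible workaround is to scan the raw bytes yourself before parsing and find where the first malformed UTF-8 sequence really is. The sketch below is standalone C++ and is not part of Xerces-C; the names BadByte, continuationCount and findFirstInvalidUtf8 are made up for illustration. It checks only lead-byte ranges and continuation-byte markers, so it does not catch every ill-formed sequence (overlong three- and four-byte forms and encoded surrogates slip through), but it should be enough to locate a stray Latin-1 byte like the one in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type (not a Xerces-C class): where the first
// malformed sequence starts, using 1-based line/column like parser messages.
struct BadByte {
    std::size_t offset;   // 0-based byte offset into the input
    std::size_t line;     // 1-based line number
    std::size_t column;   // 1-based column, counted in characters
    unsigned char value;  // first byte of the malformed sequence
};

// Number of continuation bytes expected after lead byte b,
// or -1 if b cannot start a UTF-8 sequence at all.
inline int continuationCount(unsigned char b) {
    if (b < 0x80) return 0;               // ASCII
    if (b >= 0xC2 && b <= 0xDF) return 1; // 2-byte sequence
    if (b >= 0xE0 && b <= 0xEF) return 2; // 3-byte sequence
    if (b >= 0xF0 && b <= 0xF4) return 3; // 4-byte sequence
    return -1;                            // stray continuation byte, C0/C1, F5..FF
}

// Returns true and fills 'out' if a malformed sequence is found.
// Simplified check: overlong 3/4-byte forms and surrogates are NOT rejected.
inline bool findFirstInvalidUtf8(const std::string& data, BadByte& out) {
    std::size_t line = 1;
    std::size_t column = 1;
    std::size_t i = 0;
    while (i < data.size()) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        const int need = continuationCount(lead);
        assert(need <= 3);  // the table above never asks for more
        bool ok = (need >= 0);
        for (int k = 1; ok && k <= need; ++k) {
            // Each following byte must exist and look like 10xxxxxx.
            ok = (i + k < data.size()) &&
                 ((static_cast<unsigned char>(data[i + k]) & 0xC0) == 0x80);
        }
        if (!ok) {
            out = BadByte{i, line, column, lead};
            return true;
        }
        if (lead == '\n') {
            ++line;
            column = 1;
        } else {
            ++column;
        }
        i += static_cast<std::size_t>(need) + 1;
    }
    return false;
}
```

Reading the file into a std::string and calling findFirstInvalidUtf8 on it before handing it to the parser would give the line and column of the offending byte, independent of how far the transcoder had read ahead when it complained.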