http://issues.apache.org/bugzilla/show_bug.cgi?id=27583

Summary: Xerces throws IOExceptions that should be SAXExceptions for bad UTF-8 and similar
Product: Xerces2-J
Version: 2.6.2
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: Other
Component: SAX
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]

When Xerces (XMLReader.parse()) encounters malformed Unicode data, such as an invalid UTF-8 sequence, it throws an IOException, more specifically a UTFDataFormatException or a CharConversionException. However, according to the SAX and XML specifications, this should be a SAXException that is reported to the ErrorHandler's fatalError() method.

First, note that section 4.3.3 of the XML spec states:

    "It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process. It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding. Specifically, it is a fatal error if an entity encoded in UTF-8 contains any irregular code unit sequences, as defined in Unicode 3.1 [Unicode3]. Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16."

The SAX spec says of the fatalError() method:

    "This corresponds to the definition of 'fatal error' in section 1.2 of the W3C XML 1.0 Recommendation. For example, a parser would use this callback to report the violation of a well-formedness constraint."

At one point I thought it was OK to report this as an IOException.
However, since the XML spec is unambiguous that character encoding errors are fatal errors, and since the SAX spec does not limit fatal errors to well-formedness errors, I think character encoding errors should be reported as SAXExceptions rather than IOExceptions, and should be reported to the fatalError() method.
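The proposed behavior can be sketched in a few lines of Java. This is not Xerces code: the class EncodingErrorReporting and its method decodeEntity are hypothetical, and java.nio's strict UTF-8 CharsetDecoder stands in for Xerces's own UTF-8 reader. Under those assumptions, an illegal byte sequence surfaces as a CharacterCodingException (like UTFDataFormatException and CharConversionException, a subclass of IOException); the sketch wraps it in a SAXParseException, passes it to ErrorHandler.fatalError(), and throws it, so the caller sees a SAXException rather than an IOException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not Xerces code: report illegal UTF-8 as an XML fatal
// error through the SAX ErrorHandler instead of letting an IOException escape.
public class EncodingErrorReporting {

    // Decodes an entity as strict UTF-8. On an illegal byte sequence, wraps the
    // CharacterCodingException in a SAXParseException, hands it to
    // handler.fatalError(), and throws it so processing stops.
    public static String decodeEntity(byte[] bytes, String systemId, ErrorHandler handler)
            throws SAXException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // XML 1.0 section 4.3.3: byte sequences that are not legal in the
            // entity's encoding are a fatal error. Line/column are unknown here (-1).
            SAXParseException fatal = new SAXParseException(
                    "Invalid UTF-8 byte sequence: " + e.getMessage(),
                    null, systemId, -1, -1, e);
            handler.fatalError(fatal);
            // A handler may return normally, but the document is unusable after a
            // fatal error, so stop here either way.
            throw fatal;
        }
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence, but 0x28 ('(') is not a continuation byte.
        byte[] bad = {'<', 'a', '>', (byte) 0xC3, (byte) 0x28, '<', '/', 'a', '>'};
        try {
            // DefaultHandler.fatalError() rethrows the exception it is given.
            decodeEntity(bad, "example.xml", new DefaultHandler());
            System.out.println("decoded without error");
        } catch (SAXParseException e) {
            System.out.println("fatal error in " + e.getSystemId() + ": " + e.getMessage());
        } catch (SAXException e) {
            System.out.println("SAX error: " + e.getMessage());
        }
    }
}
```

Line and column are passed as -1 because the decoder does not track document positions; a real parser would take them from its Locator. The original IOException is kept as the wrapped exception, so applications that want the low-level detail can still reach it through getException().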