http://nagoya.apache.org/bugzilla/show_bug.cgi?id=24579

           Summary: [XML 1.0] - E27: Must reject non-shortest forms in UTF-8
           Product: Xerces2-J
           Version: 2.5.0
          Platform: All
               URL: http://www.w3.org/XML/xml-V10-2e-errata#E27
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Other
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

E27 [1] states that "it is a fatal error if an entity encoded in UTF-8
contains any irregular code unit sequences, as defined in Unicode 3.1".

I had a look at this erratum some time ago. In addition to treating
irregular code unit sequences as a fatal error, we should also reject
non-shortest forms. These non-shortest forms (such as C0 80 or E0 80 80,
which both correspond to code point 0) are not legal in Unicode 3.1.
See "UTF-8 Corrigendum" and "Table 3.1B. Legal UTF-8 Byte Sequences" of
Unicode 3.1 [3].

[1] http://www.w3.org/XML/xml-V10-2e-errata#E27
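
To make the requested check concrete, here is a minimal sketch of a byte-range validator. It is not Xerces code: the class and method names (Utf8ShortestFormCheck, firstIllFormedIndex, isWellFormed) are hypothetical. As an assumption, it uses the well-formed byte ranges from RFC 3629 and later Unicode versions, which exclude both non-shortest forms and encoded surrogates, rather than transcribing Table 3.1B itself.

```java
// Hypothetical helper, not part of Xerces2-J.
final class Utf8ShortestFormCheck {

    private Utf8ShortestFormCheck() {}

    /**
     * Returns the index of the first offending byte, or -1 if the whole
     * array is well-formed UTF-8. A truncated sequence is reported at the
     * index of its lead byte.
     */
    static int firstIllFormedIndex(byte[] in) {
        int i = 0;
        while (i < in.length) {
            int b0 = in[i] & 0xFF;
            if (b0 <= 0x7F) {          // single-byte (ASCII) sequence
                i++;
                continue;
            }
            int len;                   // total length of this sequence
            int lo = 0x80;             // allowed range for the SECOND byte;
            int hi = 0xBF;             // narrowed below for special lead bytes
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                len = 2;               // C0 and C1 would be non-shortest forms
            } else if (b0 == 0xE0) {
                len = 3; lo = 0xA0;    // E0 80..9F would be non-shortest forms
            } else if (b0 == 0xED) {
                len = 3; hi = 0x9F;    // ED A0..BF would encode surrogates
            } else if (b0 >= 0xE1 && b0 <= 0xEF) {
                len = 3;
            } else if (b0 == 0xF0) {
                len = 4; lo = 0x90;    // F0 80..8F would be non-shortest forms
            } else if (b0 >= 0xF1 && b0 <= 0xF3) {
                len = 4;
            } else if (b0 == 0xF4) {
                len = 4; hi = 0x8F;    // F4 90..BF would exceed U+10FFFF
            } else {
                return i;              // stray 80..BF, C0, C1, or F5..FF
            }
            if (i + len > in.length) {
                return i;              // sequence is cut off by end of input
            }
            int b1 = in[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) {
                return i + 1;
            }
            for (int k = 2; k < len; k++) {   // remaining continuation bytes
                int b = in[i + k] & 0xFF;
                if (b < 0x80 || b > 0xBF) {
                    return i + k;
                }
            }
            i += len;
        }
        return -1;
    }

    static boolean isWellFormed(byte[] in) {
        return firstIllFormedIndex(in) == -1;
    }
}
```

With this sketch, the two sequences mentioned above are both rejected: C0 80 fails at its lead byte, and E0 80 80 fails at its second byte. A real parser would raise a fatal error at that point instead of returning an index.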
