http://nagoya.apache.org/bugzilla/show_bug.cgi?id=24579

[XML 1.0] - E27: Must reject non-shortest forms in UTF-8

           Summary: [XML 1.0] - E27: Must reject non-shortest forms in UTF-8
           Product: Xerces2-J
           Version: 2.5.0
          Platform: All
               URL: http://www.w3.org/XML/xml-V10-2e-errata#E27
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Other
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


E27 [1] states that "it is a fatal error if an entity encoded in UTF-8 contains 
any irregular code unit sequences, as defined in Unicode 3.1". I looked at this 
erratum some time ago. In addition to treating irregular code unit sequences as 
a fatal error, we should also reject non-shortest forms. These non-shortest 
forms (such as C0 80 or E0 80 80, which both correspond to code point 0) are 
not legal in Unicode 3.1. See "UTF-8 Corrigendum" and "Table 3.1B. Legal UTF-8 
Byte Sequences" of Unicode 3.1 [3].

[1] http://www.w3.org/XML/xml-V10-2e-errata#E27
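
To illustrate the kind of check being asked for, here is a minimal sketch in 
Java. It is not Xerces code: the class name Utf8ShortestFormCheck and the 
method isShortestFormUtf8 are hypothetical, and it assumes the whole input is 
available as a byte array. It decodes each sequence and rejects it when the 
resulting code point would have fit in fewer bytes. It also rejects malformed 
input and values above U+10FFFF. Irregular (surrogate) sequences are 
deliberately left out of this sketch.

```java
final class Utf8ShortestFormCheck {
    // Smallest code point that genuinely needs 1, 2, 3 or 4 bytes in UTF-8.
    private static final int[] MIN_CODE_POINT = {0x0, 0x80, 0x800, 0x10000};

    /** Returns true if every sequence in data is well-formed and in shortest form. */
    static boolean isShortestFormUtf8(byte[] data) {
        int i = 0;
        while (i < data.length) {
            int lead = data[i] & 0xFF;
            int length;     // total number of bytes in this sequence
            int codePoint;  // starts with the payload bits of the lead byte
            if (lead < 0x80) {
                length = 1;
                codePoint = lead;
            } else if ((lead & 0xE0) == 0xC0) {
                length = 2;
                codePoint = lead & 0x1F;
            } else if ((lead & 0xF0) == 0xE0) {
                length = 3;
                codePoint = lead & 0x0F;
            } else if ((lead & 0xF8) == 0xF0) {
                length = 4;
                codePoint = lead & 0x07;
            } else {
                return false; // stray continuation byte, or a 5/6-byte lead byte
            }
            if (i + length > data.length) {
                return false; // sequence is truncated
            }
            for (int k = 1; k < length; k++) {
                int b = data[i + k] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    return false; // not a continuation byte (10xxxxxx)
                }
                codePoint = (codePoint << 6) | (b & 0x3F);
            }
            if (codePoint < MIN_CODE_POINT[length - 1]) {
                return false; // non-shortest form, e.g. C0 80 or E0 80 80
            }
            if (codePoint > 0x10FFFF) {
                return false; // beyond the Unicode code space
            }
            i += length;
        }
        return true;
    }
}
```

With this sketch, both C0 80 and E0 80 80 are rejected, while the single byte 
00 is accepted. A real fix would more likely live inside the parser's 
streaming UTF-8 reader and report a fatal error instead of returning a boolean.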
