Hi there,

Can you tell me whether libxml2 does complete validation of UTF-8 when input is provided in this character encoding? By complete validation I mean:
- Verifying that each character is represented by a byte sequence that matches one of the patterns described in section 3 of RFC 3629.
- Verifying that each character is represented by the shortest possible byte sequence (ruling out, for example, the use of 0xC0 0x80 for U+0000).
- Verifying that supplementary characters are represented by a 4-byte sequence, not by a pair of surrogate characters.
- Verifying that illegal code points, such as the noncharacters U+FFFE, U+FFFF, etc., do not occur.

Bug report 305333 implies that some of this validation occurs, but the references to the obsolete RFC 2044 in the documentation worry me a bit.

Thanks,
Norbert
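
P.S. To make the four checks concrete, here is a minimal sketch in C of what I mean by complete validation. It is an illustration only, not libxml2 code: the function name utf8_is_strictly_valid is hypothetical, and treating every Unicode noncharacter (U+FDD0..U+FDEF and the last two code points of each plane) as illegal is my assumption about the strictest reading of the last point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libxml2.
 * Returns 1 if buf[0..len) passes all four checks, 0 otherwise. */
int utf8_is_strictly_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        uint32_t cp;   /* decoded code point */
        uint32_t min;  /* smallest code point allowed for this length */
        size_t n;      /* number of continuation bytes expected */

        /* Check 1: lead byte must match an RFC 3629 pattern. */
        if (b < 0x80)                { cp = b;        n = 0; min = 0x0;     }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; n = 1; min = 0x80;    }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; n = 2; min = 0x800;   }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; n = 3; min = 0x10000; }
        else return 0;  /* stray continuation byte or 0xF8..0xFF */

        if (n > len - i - 1)
            return 0;   /* sequence truncated at end of input */

        /* Check 1 (continued): each trailing byte must be 10xxxxxx. */
        for (size_t k = 1; k <= n; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        /* Check 2: shortest form only (rejects 0xC0 0x80 and friends). */
        if (cp < min)
            return 0;

        /* Beyond the Unicode range (above U+10FFFF). */
        if (cp > 0x10FFFF)
            return 0;

        /* Check 3: no UTF-8-encoded surrogates (U+D800..U+DFFF). */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;

        /* Check 4 (assumed scope): reject noncharacters. */
        if ((cp & 0xFFFE) == 0xFFFE)        /* U+nFFFE, U+nFFFF */
            return 0;
        if (cp >= 0xFDD0 && cp <= 0xFDEF)
            return 0;

        i += n + 1;
    }
    return 1;
}
```

With this sketch, the two bytes 0xC0 0x80 are rejected by the shortest-form check, and the three bytes 0xED 0xA0 0x80 (an encoded U+D800) are rejected by the surrogate check.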
