Hi there,

Can you tell me whether libxml2 does complete validation of UTF-8  
when input is provided in this character encoding? By complete  
validation I mean:

- Verifying that each character is represented by a byte sequence  
that matches one of the patterns described in section 3 of RFC 3629.

- Verifying that each character is represented by the shortest
possible byte sequence (ruling out, for example, the use of 0xC0 0x80
for U+0000).

- Verifying that supplementary characters are represented by a 4-byte  
sequence, not by a pair of surrogate characters.

- Verifying that illegal code points, such as the noncharacters
U+FFFE and U+FFFF, do not occur.
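
To make the four checks concrete, here is a rough sketch of what I
mean. It is written in Python purely for illustration and is not
libxml2 code; the function name is_strict_utf8 is made up, and the
exact set of code points treated as illegal (the Unicode noncharacters)
is my own assumption rather than something RFC 3629 requires.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Illustrative check of the four conditions; not libxml2's logic."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Check 1: the lead byte selects the sequence length; `minimum`
        # is the smallest code point that needs that many bytes.
        if b0 < 0x80:
            length, cp, minimum = 1, b0, 0x0
        elif 0xC0 <= b0 <= 0xDF:
            length, cp, minimum = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, minimum = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF7:
            length, cp, minimum = 4, b0 & 0x07, 0x10000
        else:
            return False  # stray continuation byte, or 0xF8..0xFF
        if i + length > n:
            return False  # truncated sequence
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                return False  # not a 10xxxxxx continuation byte
            cp = (cp << 6) | (b & 0x3F)
        # Check 2: shortest form only (rejects 0xC0 0x80 for U+0000).
        if cp < minimum:
            return False
        # Beyond the Unicode range that RFC 3629 allows.
        if cp > 0x10FFFF:
            return False
        # Check 3: no UTF-8-encoded surrogates.
        if 0xD800 <= cp <= 0xDFFF:
            return False
        # Check 4 (assumption): reject noncharacters U+nFFFE, U+nFFFF
        # in every plane, and U+FDD0..U+FDEF.
        if (cp & 0xFFFE) == 0xFFFE or 0xFDD0 <= cp <= 0xFDEF:
            return False
        i += length
    return True
```

With this sketch, is_strict_utf8(b"\xC0\x80") and
is_strict_utf8(b"\xED\xA0\x80\xED\xB0\x80") would both be rejected,
while ordinary text encoded as UTF-8 would pass.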

Bug report 305333 implies that some of this validation occurs, but  
the references to the obsolete RFC 2044 in the documentation worry me  
a bit.

Thanks,
Norbert

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
