On 9/9/2011 05:37, Murray Cumming wrote:
Here is a simple test case that takes the text from an apparently-valid
UTF-8 file

Not all valid UTF-8 is valid in XML; only a subset is, as defined in
http://www.w3.org/TR/2008/REC-xml-20081126/#charsets

Note that form feed (U+000C, byte 0x0C) is not allowed. Your original input document contains a form feed character, and that is what ends up being invalid. It's not a matter of escaping: XML 1.0 does not allow form feed in any form, whether as a literal byte or as a numeric character reference such as &#xC;.
Stripping the form feed from the input allows it to serialize properly.
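
If it helps, here is a rough sketch of that stripping step, done before the text is handed to the XML library. It is in Python only for brevity. The function name strip_invalid_xml_chars is made up for illustration, and the character ranges are my transcription of the Char production in the spec linked above, so check them against it before relying on this.

```python
import re

# Matches anything OUTSIDE the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character XML 1.0 forbids removed.

    Hypothetical helper, not part of any library.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "page one\x0cpage two"  # contains a form feed (U+000C)
    print(repr(strip_invalid_xml_chars(sample)))
```

Whether silently dropping the characters is right depends on the data. Replacing a form feed with a newline or a space may preserve the intent better.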

Jason

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
