On Fri, 2011-09-09 at 10:21 -0400, Jason Viers wrote:
> On 9/9/2011 05:37, Murray Cumming wrote:
> > Here is a simple test case that takes the text from an apparently-valid
> > UTF-8 file
>
> Not all valid UTF-8 is valid in XML. Only a subset, as defined in
> http://www.w3.org/TR/2008/REC-xml-20081126/#charsets
>
> Note that Form Feed (0xC) is not allowed. Your original input document
> contains a formfeed character, and this is what ends up being invalid.
> It's not a matter of escaping; form feed as a literal byte, numeric
> reference, etc., is not allowed.
> Stripping the form feed from the input allows it to serialize properly.
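
For reference, the stripping step described above could look roughly like the sketch below. It is written in Python purely for illustration and is not part of libxml's API; the function name strip_invalid_xml_chars is made up for this example. The character ranges are taken from the Char production in the XML 1.0 spec linked above, and the sketch assumes the input has already been decoded to a Unicode string.

```python
import re

# Characters permitted by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything outside that set, which
# includes form feed (0xC) and the other C0 control characters.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character disallowed in XML 1.0 removed.

    Hypothetical helper for illustration only; not a libxml function.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "page one\x0cpage two"  # contains a literal form feed
    print(repr(strip_invalid_xml_chars(sample)))
```

Dropping the characters silently is only one possible policy; replacing them with a placeholder or rejecting the input outright may suit some applications better.
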
Ah, I didn't know that it couldn't be there even if escaped. Thanks.

Shouldn't libxml warn about that when serializing, at the same point where
it escapes characters such as & and <, rather than silently writing
invalid XML?

-- 
murr...@murrayc.com
www.murrayc.com
www.openismus.com