On Wed, 2011-09-14 at 16:10 +0800, Daniel Veillard wrote:
> On Fri, Sep 09, 2011 at 04:30:45PM +0200, Murray Cumming wrote:
> > On Fri, 2011-09-09 at 10:21 -0400, Jason Viers wrote:
> > > On 9/9/2011 05:37, Murray Cumming wrote:
> > > > Here is a simple test case that takes the text from an apparently valid
> > > > UTF-8 file
> > > 
> > > Not all valid UTF-8 is valid in XML.  Only a subset, as defined in
> > > http://www.w3.org/TR/2008/REC-xml-20081126/#charsets
> > > 
> > > Note that form feed (0xC) is not allowed. Your original input document
> > > contains a form feed character, and that is what makes the output invalid.
> > > It's not a matter of escaping: a form feed is not allowed as a literal
> > > byte, as a numeric character reference, or in any other form.
> > > Stripping the form feed from the input allows it to serialize properly.
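The check described above can be sketched on the client side as follows. This is a minimal illustration, not part of libxml2. It assumes the input has already been decoded to a Python `str` and tests each code point against the `Char` production of XML 1.0 (Fifth Edition), section 2.2: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. The helper names `is_xml_char`, `find_invalid_xml_chars` and `strip_invalid_xml_chars` are hypothetical.

```python
from typing import List, Tuple


def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 (Fifth Edition) Char production."""
    return (
        cp in (0x9, 0xA, 0xD)              # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF            # excludes other C0 controls such as form feed (0xC)
        or 0xE000 <= cp <= 0xFFFD          # skips the surrogate block and U+FFFE/U+FFFF
        or 0x10000 <= cp <= 0x10FFFF       # supplementary planes
    )


def find_invalid_xml_chars(text: str) -> List[Tuple[int, int]]:
    """Return (index, code point) for every character that may not appear in XML 1.0."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if not is_xml_char(ord(ch))]


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that is not allowed in XML 1.0, escaped or not."""
    return "".join(ch for ch in text if is_xml_char(ord(ch)))


if __name__ == "__main__":
    sample = "page one\fpage two"               # contains a literal form feed
    print(find_invalid_xml_chars(sample))       # [(8, 12)]
    print(strip_invalid_xml_chars(sample))      # page onepage two
```

Whether to strip, replace, or reject such characters is an application decision; the sketch simply drops them before the text would be handed to a serializer.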
> > 
> > Ah, I didn't know that it couldn't be there even if escaped. Thanks.
> > 
> > Shouldn't libxml warn about that, rather than silently writing invalid
> > XML, at the same point where it escapes characters such as & and <?
> 
>   It's a choice: either you make all APIs validate all input strings,
> or you rely on the client to do it. In libxml2 I took the second path,
> and that was decided 10+ years ago. The parser, on the other hand, is
> strict, because following the spec requires it.

OK. Thanks. Is that documented?

-- 
murr...@murrayc.com
www.murrayc.com
www.openismus.com

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
