On Tue, Sep 04, 2007 at 10:19:18AM -0400, Liam R E Quin wrote:
> On Tue, 2007-09-04 at 07:01 -0400, Daniel Veillard wrote:
> > On Tue, Sep 04, 2007 at 06:39:01AM -0300, Bruno Dilly wrote:
> > > Hi people,
> > > 
> > > I'm trying to parse RSS with html entities, but I'm having the
> > > following errors when it tries to parse the rss file:
> > > Entity 'ntilde' not defined;
> > > Entity 'iacute' not defined;
> [...]
> >   are the HTML entities defined in the RSS DTD ? if yes then you
> > need to ask to load the DTD. If no, then using them there is an error.
> 
> It's worse than that :-)
> 
> RSS requires HTML markup to be escaped in descriptions, so you have
> to write things like
>     &amp;ntilde;
> and the same for elements, &lt;i&gt;...&lt;/i&gt; to get <i>...</i> into
> an RSS feed.
> 
> A lot of RSS feeds are invalid.

  By default the libxml2 XML parser does not fetch the external subset of
DTDs, so the undeclared entities are reported as an error but not a fatal
one (since of course these feeds never set standalone="yes"). The fact that
the feeds are invalid doesn't bother me that much, but in practice they are
often not even well-formed, and a lot of RSS readers avoid XML parsers
altogether to work around this. One thing is sure, I won't encourage this
trend...
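
  To illustrate the escaping Liam describes, here is a minimal sketch in
Python using only the standard library (expat via xml.etree, not libxml2).
The two feed fragments and the helper names are made up for illustration.
It shows a bare &ntilde; being rejected by the XML parser, while the
escaped form parses cleanly and yields HTML that can be decoded in a
second step.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical feed fragments, invented for this example.
# BAD_ITEM uses an HTML entity that no DTD declares.
BAD_ITEM = "<item><description>Espa&ntilde;a</description></item>"
# GOOD_ITEM escapes the HTML markup, as RSS descriptions require.
GOOD_ITEM = (
    "<item><description>"
    "Espa&amp;ntilde;a &lt;i&gt;hoy&lt;/i&gt;"
    "</description></item>"
)


def description_html(item_xml):
    """Parse one <item> and return the HTML string in <description>."""
    item = ET.fromstring(item_xml)
    return item.findtext("description")


def parse_error(item_xml):
    """Return the parser's error message, or None if the item parses."""
    try:
        description_html(item_xml)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # The undeclared entity makes the parse fail.
    print("bad item:", parse_error(BAD_ITEM))

    # The XML parser undoes one level of escaping, leaving HTML text.
    escaped_html = description_html(GOOD_ITEM)
    print("html:", escaped_html)

    # A separate HTML-aware step resolves the HTML entity itself.
    print("text:", html.unescape(escaped_html))
```

  The point of the two steps is that the XML layer never needs to know
about HTML entities: it only sees &amp;, &lt; and &gt;, which are always
predefined.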

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
