I'm also for HTML5 support. My only hope is that you guys can also reduce
the parsing time and memory usage when parsing HTML documents.
Don't get me wrong, libxml2 is fast, but making it even faster would be great :)


On Tue, Apr 2, 2013 at 6:51 PM, Liam R E Quin <l...@holoweb.net> wrote:

> On Tue, 2013-04-02 at 09:06 +0200, Amandine Piguel wrote:
> > Hello,
> >
> > I would like to know if libxml2 is able to parse HTML5 files, and if
> > not, if it will be supported in the future.
>
> Note, libxml's HTML parser is really good at making sense of HTML input,
> but it is not a formal HTML parser - the tree you get is not guaranteed
> to be the same as the one a Web browser would make, and even with HTML 4
> there are differences, e.g. in when a "tbody" element is inferred. This
> isn't a bad thing - often it's exactly what you want.
>
> I'd guess that patches to provide an option to use the HTML 5 parsing
> algorithm would be plausible.
>
> Example: try the following input, and compare with a Web browser in the
> DOM...
>
> <body>
>     <table><th>a</th><td>b</td>
> </body>
>
> Again, this isn't saying anything bad about libxml - I'm trying to give
> examples so you can understand what it's doing. I don't actually know of
> a good HTML 5 parser that can replace libxml2; I don't follow these
> things, and in any case I think I'd rather see it folded into libxml2 in
> some way.
>
> Liam
>
> --
> Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
> Pictures from old books: http://fromoldbooks.org/
> Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
>
> _______________________________________________
> xml mailing list, project page  http://xmlsoft.org/
> xml@gnome.org
> https://mail.gnome.org/mailman/listinfo/xml
>
