On Tue, 2013-04-02 at 09:06 +0200, Amandine Piguel wrote:
> Hello,
>
> I would like to know if libxml2 is able to parse HTML5 files, and if
> not, if it will be supported in the future.

Note, libxml's HTML parser is really good at making sense of HTML
input, but it is not a formal HTML parser - the tree you get is not
guaranteed to be the same as the one a Web browser would make, and
even with HTML 4 there are differences, e.g. in when a "tbody"
element is inferred. This isn't a bad thing - often it's exactly
what you want.

I'd guess that patches to provide an option to use the HTML 5
parsing algorithm would be plausible.

Example: try the following input, and compare the resulting tree
with the DOM that a Web browser builds for it...

    <body>
    <table><th>a</th><td>b</td>
    </body>

Again, this isn't saying anything bad about libxml - I'm trying to
give examples so you can understand what it's doing.

I don't actually know of a good HTML 5 parser that can replace
libxml2; I don't follow these things, and in any case I'd rather
see it folded into libxml2 in some way, I think.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml