On Tue, Apr 02, 2013 at 09:06:32AM +0200, Amandine Piguel wrote:
> Hello,
>
> I would like to know if libxml2 is able to parse HTML5 files, and if
> not, if it will be supported in the future.

Hello,

Actually, libxml2 is able to parse HTML5, but it does so using a
predefined set of HTML4 markup declarations. As such, it will generate
elements and attributes in the tree for syntax it doesn't know, but it
cannot do any specific handling where that would be needed.

> In fact, I already tried to load a pure HTML5 document using the
> HTMLparser libxml is providing. I am getting errors such as: "Tag
> section invalid", "Tag header invalid", "Tag article invalid", "Tag
> output invalid", ... It seems to be related to all HTML5-specific
> tags, the ones that did not exist in HTML4 and appeared in
> HTML5.

You should still get a resulting tree; those are more like warnings than
fatal errors. But it is true that libxml2 should be extended to at least
not complain about the new syntactic constructs of HTML5.

> Do you intend to provide support for these tags in the HTML parser?

I'm not sure I will have time in the near future to make those
additions, but I definitely take patches! In the meantime you can catch
those specific errors and discard them. Since HTML 5 is now in Candidate
REC at W3C, I hope someone will have the time to help fix this in the
next few months.

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
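
P.S. The "catch those specific errors and discard them" workaround can be sketched roughly as follows. This is only an illustration, not something libxml2 ships: it assumes the Python lxml binding to libxml2 is installed, and the sample document and the parse_html5 helper are made up for the example. It parses in recovery mode, then drops the log entries whose type is HTML_UNKNOWN_TAG (the "Tag ... invalid" messages) and keeps everything else. Depending on the libxml2 version underneath, the log may contain no such entries at all.

```python
from lxml import etree

# Hypothetical sample input using HTML5-only elements.
HTML5_SAMPLE = b"""<!DOCTYPE html>
<html><body>
<header><h1>Title</h1></header>
<section><article><p>Body text</p></article></section>
</body></html>"""

# Error type name lxml reports for libxml2's "Tag xxx invalid" messages.
UNKNOWN_TAG = "HTML_UNKNOWN_TAG"


def parse_html5(data):
    """Parse with libxml2's HTML parser, discarding unknown-tag errors.

    Returns the root element and the list of remaining log entries.
    """
    parser = etree.HTMLParser(recover=True)
    root = etree.fromstring(data, parser)
    # Keep every logged problem except the unknown-tag complaints.
    remaining = [e for e in parser.error_log if e.type_name != UNKNOWN_TAG]
    return root, remaining


root, remaining = parse_html5(HTML5_SAMPLE)

# The HTML5 elements are present in the tree despite the complaints.
print([el.tag for el in root.iter("header", "section", "article")])

# Anything still listed here is a problem other than an unknown tag.
for entry in remaining:
    print(entry.line, entry.type_name, entry.message)
```

From C, a comparable effect should be reachable with a structured error handler that ignores errors whose code is XML_HTML_UNKNOWN_TAG, or more bluntly by passing HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING to htmlReadMemory(), which silences all reports rather than just these.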