On Thu, Sep 22, 2011 at 11:52:16AM -0700, Alan Hogan wrote:
> According to HTML 4†, HTML5‡, and all the browsers I have tested*
> (including Firefox, IE7/8/9, Chrome, Safari, Opera, Android, iOS):
>
> - No <table> should be without a <tbody>.
>
> - No <tr> should exist outside of a <thead>, <tfoot>, and <tbody>.
>
> - The first <tr> encountered in a <table>, if not within a <thead> or
>   <tfoot>, and if no <tbody> was manually defined, implies that a <tbody>
>   element was just created as well (as the parent of this and all
>   subsequent <tr>s).
>
> LibXML, however, seems happy to parse <tr> elements as if they were direct
> children of a table.
> This is simply wrong, nonstandard, and incompatible with user agents.
>
> It is creating a headache for me because CSS / XPath selections will not
> act as expected, and in an asymmetrical way with regards to actual users'
> browsers.
>
> Can we get this to be considered a bug?
>
> After all, it’s not that the document author declared there was no
> tbody. Wittingly or no, they implied its presence; LibXML is simply
> failing to make the correct inference.
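To make the rule under discussion concrete, here is a minimal sketch of the <tbody> inference applied to a tree that has already been parsed. It uses Python's standard xml.etree.ElementTree rather than libxml2, the function name add_implied_tbody is hypothetical, and it assumes well-formed input, which is the easy case and not what real-world HTML looks like.

```python
import xml.etree.ElementTree as ET


def add_implied_tbody(table):
    """Wrap each run of <tr> that is a direct child of <table> in a <tbody>.

    Hypothetical helper for illustration only: it rewrites the tree in
    place, so the result no longer matches what the author wrote.
    """
    new_children = []
    current_tbody = None
    for child in list(table):
        if child.tag == "tr":
            # First bare <tr> of a run: create the implied <tbody>.
            if current_tbody is None:
                current_tbody = ET.Element("tbody")
                new_children.append(current_tbody)
            current_tbody.append(child)
        else:
            # <thead>, <tfoot>, an explicit <tbody>, etc. end the run.
            current_tbody = None
            new_children.append(child)
    table[:] = new_children
    return table


table = ET.fromstring(
    "<table><thead><tr><th>h</th></tr></thead>"
    "<tr><td>a</td></tr><tr><td>b</td></tr></table>"
)
rows_before = len(table.findall("tbody/tr"))  # path misses the bare rows
add_implied_tbody(table)
rows_after = len(table.findall("tbody/tr"))   # same path now finds them
```

The same path expression gives different results before and after the rewrite, which is the asymmetry described above; it is also a change to the document, made by whoever calls the helper.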

The big problem is that when you start making inferences like that, you
change the document. In some basic cases it's rather hard to go wrong, but
real-world HTML is not about basic cases; it's about an ocean of HTML broken
in every possible way.

Even something as simple as implying <body> gets nasty really fast. Assume
a document starts with <p>. You would think that, per the rules, you can add
an implicit <html><body> ... well, until you hit

   <p>blah <title>foo

Yes, that's wrong; yes, it exists; engines will parse and render it
silently. And no, I won't try to fix it: maybe the <p> was added by some
broken customization layer, maybe the beginner who typed this thought title
was a good substitute for h1.

If libxml2 starts doing this, it will be setting policies on how to handle
brokenness, and since it's a library, it's the wrong place to put them.
Browsers are mostly end applications, so it's fine for them to implement
policies; libxml2 is a building block, so we can't.

Now, for tables that's even more complex. Sometimes the best thing to do at
the parser level is to just *parse* and leave the interpretation of the
result to the application, because if you try to interpret based on the
specification, well, with real HTML you're guaranteed to blow up one way or
another.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/