Hi,

As written at http://xmlsoft.org/html/libxml-HTMLparser.html, the libxml2 HTML parser should be able to parse "real world" HTML, even if severely broken from a specification point of view. The example below, based on http://www.voicenews.ca/, is a case where the recovery goes wrong.
Running "xmllint --html file" on this input:

    <table> <tr><td><font size=1><a class=menu href="1125.pdf"> 1125.<tr><td><font size=1><a class=menu href="1124.pdf"> 1124.</table>

gives this output:

    <table><tr><td><font size="1"><a class="menu" href="1125.pdf"> 1125.<tr><td><font size="1"><a class="menu" href="1124.pdf"> 1124. </a></font></td></tr></a></font></td></tr></table>

In other words, each new row is nested inside the previous row, which is still open:

    <tr><td><font>
    <tr><td><font>
    <tr><td><font>
    </font></td></tr>
    </font></td></tr>
    </font></td></tr>

This is the wrong fix for this HTML input. The correct fix is clearly to close each row before the next one starts:

    <tr><td><font> </font></td></tr>
    <tr><td><font> </font></td></tr>
    <tr><td><font> </font></td></tr>

which is how any browser renders this HTML.

> If you feel that this is a bug in libxml2, please file a bug report
> there or report it on their mailing list.

I followed your tip and tested with "xmllint --html file".

Thanks,
-- 
Sérgio M. B.
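P.S. To make the expected recovery concrete, here is a minimal sketch in Python using only the standard library's html.parser. It is not libxml2 code and not how any browser is actually implemented; the names RowClosingSerializer and repair are hypothetical, and the only rule it implements is the one described above: a new <tr> inside a <table> first closes everything still open in the previous row.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag (partial list, enough for a sketch).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class RowClosingSerializer(HTMLParser):
    """Hypothetical helper: re-serialize HTML, closing an open row
    before the next <tr> starts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # names of the currently open elements
        self.out = []    # serialized output fragments

    def _close_down_to(self, name):
        # Close open elements until `name` is the innermost open one.
        while self.stack and self.stack[-1] != name:
            self.out.append("</%s>" % self.stack.pop())

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and "table" in self.stack:
            # A new row implicitly ends the previous row and its contents.
            self._close_down_to("table")
        rendered = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
        )
        self.out.append("<%s%s>" % (tag, rendered))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise close everything opened inside.
        if tag in self.stack:
            self._close_down_to(tag)
            self.out.append("</%s>" % self.stack.pop())

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def repair(html_text):
    parser = RowClosingSerializer()
    parser.feed(html_text)
    parser.close()
    parser._close_down_to(None)  # close whatever is still open at EOF
    return "".join(parser.out)


if __name__ == "__main__":
    broken = ('<table> <tr><td><font size=1><a class=menu href="1125.pdf"> 1125.'
              '<tr><td><font size=1><a class=menu href="1124.pdf"> 1124.</table>')
    print(repair(broken))
```

For the input from this report, the sketch closes </a></font></td></tr> before the second <tr> instead of nesting the second row inside the first. It ignores many other recovery rules (for example implied <tbody>, or <td> closing a previous <td>), so it only illustrates the one behaviour I am asking about.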
