[xml] libxml-HTMLparser.html

Sergio Monteiro Basto Thu, 15 Apr 2010 19:47:49 -0700

Hi,
As stated at http://xmlsoft.org/html/libxml-HTMLparser.html, the parser
should be able to parse "real world" HTML, even if severely broken from a
specification point of view.
The example below is based on http://www.voicenews.ca/.


Running xmllint --html file

with this input:
<table>  <tr><td><font size=1><a 
class=menu  href="1125.pdf">  1125.<tr><td><font size=1><a class=menu
href="1124.pdf">  1124.</table>

the output is:
<table><tr><td><font size="1"><a class="menu" href="1125.pdf">
1125.<tr><td><font size="1"><a class="menu" href="1124.pdf">  1124.
</a></font></td></tr></a></font></td></tr></table>

Nesting the rows as "<tr><td><font> <tr><td><font> <tr><td><font>
</font></td></tr> </font></td></tr> </font></td></tr>" is the wrong repair
for this HTML input.

The correct repair is clearly
"<tr><td><font> </font></td></tr> <tr><td><font> </font></td></tr>
<tr><td><font> </font></td></tr>",
which is how any browser renders this HTML.
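To make the expected nesting concrete, here is a minimal Python sketch built on the standard html.parser module. It is my own simplification, not libxml2's algorithm and not the full HTML5 tree-construction rules: it applies a single assumed rule, namely that a <tr> start tag inside a table closes every element still open in the previous row (a, font, td, tr). The names RowClosingBuilder and fix_rows are hypothetical, and the input is the example above with its whitespace collapsed.

```python
from html import escape
from html.parser import HTMLParser

# Void elements are never pushed on the open-element stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class RowClosingBuilder(HTMLParser):
    """Hypothetical sketch: re-serialises HTML, applying one implied-end
    rule.  A <tr> start tag inside a table closes everything still open
    in the previous row.  This is NOT libxml2's algorithm and NOT the
    full HTML5 tree-construction algorithm."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of currently open elements, innermost last
        self.out = []    # serialised output fragments

    def _pop(self):
        # Close the innermost open element.
        self.out.append(f"</{self.stack.pop()}>")

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and "tr" in self.stack and "table" in self.stack:
            # Assumed rule: close a, font, td, tr ... back to the <table>.
            while self.stack[-1] != "table":
                self._pop()
        parts = [f"<{tag}"]
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        parts.append(">")
        self.out.append("".join(parts))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close elements up to and including the matching one;
        # stray end tags with nothing to match are ignored.
        if tag in self.stack:
            while self.stack[-1] != tag:
                self._pop()
            self._pop()

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        while self.stack:  # close anything left open at end of input
            self._pop()


def fix_rows(markup: str) -> str:
    builder = RowClosingBuilder()
    builder.feed(markup)
    builder.close()
    return "".join(builder.out)


if __name__ == "__main__":
    src = ('<table><tr><td><font size=1><a class=menu href="1125.pdf">1125.'
           '<tr><td><font size=1><a class=menu href="1124.pdf">1124.</table>')
    print(fix_rows(src))
```

With that input, fix_rows returns two sibling rows, each closing its own a, font, td and tr before the next <tr> opens, which is the nesting I would expect from xmllint --html as well.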

> If you feel that this is a 
> bug in libxml2, please file a bug report there or report it on their 
> mailing list.

I followed your tip and reproduced this with xmllint --html file.


Thanks,
-- 
Sérgio M. B.


_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
