Hi, I noticed a problem with how libxml2 2.6.29 and later now handle the HTML "embed" tag. The serialiser writes the start tag without a closing tag. When the output is parsed again, the information about where the element ends is lost, so the reparsed document no longer has the same structure. Here's an example:
$ cat embed.html
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"></embed>
<embed src="http://anothersite.com/v/another"></embed>
<script src="http://www.youtube.com/example.js"></script>
<script src="/something-else.js"></script>
</body></html>
$ xmllint --html embed.html > embed2.html
$ cat embed2.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"><embed src="http://anothersite.com/v/another"><script src="http://www.youtube.com/example.js"></script><script src="/something-else.js"></script>
</body></html>
$ xmllint --html embed2.html > embed3.html
$ cat embed3.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"><embed src="http://anothersite.com/v/another"><script src="http://www.youtube.com/example.js"></script><script src="/something-else.js"></script></embed></embed>
</body></html>

Note that the "script" tags have moved inside the "embed" elements, although they were originally siblings. The second "embed" has also ended up nested inside the first.

I think the place to fix this is the serialiser rather than the parser. It should always emit a closing tag for "embed".

Stefan

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
