Hi,

I noticed a problem with the new way libxml2 2.6.29+ handles the HTML "embed"
tag. The serialiser writes it without the closing tag, which causes later
attempts to parse the document to build the wrong tree, because the
information about where the tag is closed is lost. Here's an example:

$ cat embed.html
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"></embed>
<embed src="http://anothersite.com/v/another"></embed>
<script src="http://www.youtube.com/example.js"></script>
<script src="/something-else.js"></script>
</body></html>

$ xmllint --html embed.html > embed2.html

$ cat embed2.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"><embed
src="http://anothersite.com/v/another"><script
src="http://www.youtube.com/example.js"></script><script
src="/something-else.js"></script>
</body></html>

$ xmllint --html embed2.html > embed3.html

$ cat embed3.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"><embed
src="http://anothersite.com/v/another"><script
src="http://www.youtube.com/example.js"></script><script
src="/something-else.js"></script></embed></embed>
</body></html>

Note that the "script" elements are now nested inside the "embed" elements,
although originally they were siblings of them.
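
To make the effect easier to see without libxml2 at hand, here is a small
Python sketch that records the parent of each "script" tag. It is only a
stack model built on the standard library's html.parser, not libxml2's
actual algorithm, and it assumes that "embed" stays open until an explicit
closing tag is seen. The names ParentTracker and script_parents are made up
for this illustration, and the two strings are shortened versions of the
documents above.

```python
from html.parser import HTMLParser


class ParentTracker(HTMLParser):
    """Record the parent element of every <script> start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []           # currently open elements, outermost first
        self.script_parents = []  # parent tag name for each <script> seen

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_parents.append(self.stack[-1] if self.stack else None)
        # Assumption: every element, including "embed", stays open
        # until its closing tag (or an ancestor's closing tag) arrives.
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close everything up to and including the nearest matching element.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def script_parents(markup):
    tracker = ParentTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.script_parents


original = '<html><body><embed src="a"></embed><script src="b"></script></body></html>'
reserialised = '<html><body><embed src="a"><script src="b"></script></body></html>'

print(script_parents(original))      # ['body']
print(script_parents(reserialised))  # ['embed']
```

With the closing tag present the script is a child of "body"; once the
closing tag is dropped, the same model puts it inside "embed".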

I think the place to fix this is the serialiser rather than the parser. It
should always emit a closing tag here.
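
As a rough illustration of the behaviour I have in mind (not a patch against
libxml2, and not its real data structures), the following Python sketch
serialises a toy tree and writes an explicit end tag for every element that
is not in a small set of empty elements. The VOID set, the tuple-based node
layout and the serialise function are all assumptions made for this example;
the point is only that "embed" is left out of the set.

```python
from html import escape

# Elements written without an end tag in this sketch.
# "embed" is deliberately absent, so it always gets "</embed>".
VOID = {"br", "hr", "img", "input", "link", "meta"}


def serialise(node):
    """Serialise a node: either a text string or (tag, attrs, children)."""
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, attrs, children = node
    attr_text = "".join(f' {name}="{escape(value)}"' for name, value in attrs.items())
    if tag in VOID:
        return f"<{tag}{attr_text}>"
    inner = "".join(serialise(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


doc = ("body", {}, [
    ("embed", {"src": "http://anothersite.com/v/another"}, []),
    ("script", {"src": "/something-else.js"}, []),
])

print(serialise(doc))
```

The output keeps "embed" and "script" as siblings, so parsing it again
gives back the same structure.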

Stefan

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
