Daniel Veillard wrote:
> I didn't forget about the issue, and got a bit of time to test yesterday
> and look at it. First, the patch makes sense: it fixes a serious problem
> and there is no leak, which is fine, but the result is still problematic:
>
>
> </body>
> </html>
> <p>end text
> </body></html>
>
>
> Basically the error is correctly displayed, but the close of the embedded
> body and html tags generates a serious mess. We are able to detect the
> embedding, but the autoclose kind of misbehaves. Moreover, if using the
> push parser, the autoclose ends the document immediately:
>
Can I cheat? :) Given that nothing should appear between </body> and
</html>, and </html> is always the last tag, it's easiest to just ignore
both end tags and let the autoclose deal with them...
vz202:~/libxml2/trunk # svn diff HTMLparser.c
Index: HTMLparser.c
===================================================================
--- HTMLparser.c (revision 3739)
+++ HTMLparser.c (working copy)
@@ -3646,7 +3646,9 @@
SKIP(2);
name = htmlParseHTMLName(ctxt);
- if (name == NULL)
+ if (name == NULL
+ || xmlStrEqual(name, BAD_CAST "html")
+ || xmlStrEqual(name, BAD_CAST "body") )
return (0);
/*
With this patch, I get:
<html xml:lang="en" xmlns="foobar">
^
autoskip.html:4: HTML parser error : htmlParseStartTag: misplaced <body> tag
<body>
^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>some text
</p>
<p>embbeded text</p>
>
>
<p>end text
>>
</p>
</body></html>
Which looks good enough to me. It's probably at least enough to get it
properly through my html email sanitizer.
> I think the embedding error condition should be noted somewhere in the
> parser state and disable at least partially the closing tag processing so
> that the 'end text' paragraph shows up as a sibling of the 'embbeded text'
> paragraph.
>
It probably should generate an error, yes. My patch simply ignores the
situation.
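
For illustration only, the check the patch adds can be written as a
standalone helper. This is a rough sketch, not code from libxml2:
is_ignored_end_tag is a hypothetical name, it uses plain strcmp instead
of xmlStrEqual so it compiles without the library, and it assumes the tag
name has already been lowercased by the time it is compared.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml2): returns 1 when an end tag
 * should be skipped by the end-tag parser, 0 when it should be processed.
 * Assumption: `name` is already lowercased.
 */
static int is_ignored_end_tag(const char *name)
{
    if (name == NULL)
        return 1;   /* same bail-out as the original name == NULL test */

    /* </html> and </body> are left for the autoclose at end of document */
    return strcmp(name, "html") == 0 || strcmp(name, "body") == 0;
}
```

With this, "html", "body" and a NULL name are all skipped, while any
other name such as "p" still goes through the normal end-tag handling.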
--
Arnold Hendriks <[EMAIL PROTECTED]>
B-Lex Information Technologies <http://www.b-lex.com/>
Postbus 545, 7500 AM Enschede, The Netherlands
B-Lex: +31 (0)53 4836543
Mobile: +31 (0)6 51710159
MSN: [EMAIL PROTECTED]
ICQ: 86313731