On Tue, Apr 22, 2008 at 12:18:20PM -0400, Daniel Veillard wrote:
On Tue, Apr 22, 2008 at 03:56:33PM +0200, Arnold Hendriks wrote:
Daniel Veillard wrote:
I think the embedding error condition should be noted somewhere in the
parser state and disable at least partially the closing tag processing so
that the 'end text' paragraph shows up as a sibling of the 'embbeded text'
paragraph.
It probably should generate an error, yes. My patch simply ignores the
situtation.
but break the normal cases, which is not acceptable, nice try ;-)
Proper patch, reusing ctxt-depth which is not used in the HTML parser
yet to count the number of times an opening tag has been ignored, and
reused to drop the closing tags. Of course extra or missing ending tags
are still possible, but at this point one can only do heuristics. Works
properly for me, will commit soonish unless i hear a good reason against it
in the meantime:
wei:~/XML - ./xmllint --html autoskip.html
autoskip.html:3: HTML parser error : htmlParseStartTag: misplaced html tag
html xml:lang=en xmlns=foobar
^
autoskip.html:4: HTML parser error : htmlParseStartTag: misplaced body tag
body
^
!DOCTYPE html PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN
http://www.w3.org/TR/REC-html40/loose.dtd;
htmlbody
psome text
/p
pembbeded text/p
pend text
/p
/body/html
wei:~/XML -
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
Index: HTMLparser.c
===
--- HTMLparser.c(revision 3739)
+++ HTMLparser.c(working copy)
@@ -3482,6 +3482,7 @@ htmlParseStartTag(htmlParserCtxtPtr ctxt
htmlParseStartTag: misplaced html tag\n,
name, NULL);
discardtag = 1;
+ ctxt-depth++;
}
if ((ctxt-nameNr != 1)
(xmlStrEqual(name, BAD_CASThead))) {
@@ -3489,6 +3490,7 @@ htmlParseStartTag(htmlParserCtxtPtr ctxt
htmlParseStartTag: misplaced head tag\n,
name, NULL);
discardtag = 1;
+ ctxt-depth++;
}
if (xmlStrEqual(name, BAD_CASTbody)) {
int indx;
@@ -3498,6 +3500,7 @@ htmlParseStartTag(htmlParserCtxtPtr ctxt
htmlParseStartTag: misplaced body tag\n,
name, NULL);
discardtag = 1;
+ ctxt-depth++;
}
}
}
@@ -3648,7 +3651,6 @@ htmlParseEndTag(htmlParserCtxtPtr ctxt)
name = htmlParseHTMLName(ctxt);
if (name == NULL)
return (0);
-
/*
* We should definitely be at the ending S? '' part
*/
@@ -3669,6 +3671,18 @@ htmlParseEndTag(htmlParserCtxtPtr ctxt)
NEXT;
/*
+ * if we ignored misplaced tags in htmlParseStartTag don't pop them
+ * out now.
+ */
+if ((ctxt-depth 0)
+(xmlStrEqual(name, BAD_CAST html) ||
+ xmlStrEqual(name, BAD_CAST body) ||
+xmlStrEqual(name, BAD_CAST head))) {
+ ctxt-depth--;
+ return (0);
+}
+
+/*
* If the name read is not one of the element in the parsing stack
* then return, it's just an error.
*/
___
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml