On Tue, Nov 15, 2005 at 11:38:33AM +1100, Michael Day wrote:
>
> Hi Daniel,
>
> The invalid comment in wired.html is this:
>
> <!------TRADES--------->
>
> Because it has an odd number of "--" sequences the comment is actually not
> terminated according to the SGML rules.
>
> Web browsers will actually parse this comment differently depending on
> whether they are using standards-mode or quirks-mode to parse the
> document.
>
> I have attached an HTML document that demonstrates the issue. If you open
> it in Mozilla, it will be parsed in standards-mode because it has a
> DOCTYPE declaration. In this case the comment will not be terminated and
> some of the document text will be hidden. If you delete the DOCTYPE it
> will be parsed in quirks-mode, the comment will be terminated and the text
> will be shown.
>
> I cannot think of any way to detect comment termination that will handle
> both cases correctly without adding a quirks-mode feature to the libxml
> HTMLparser; there is no other way to parse old HTML and new HTML and get
> them both right.
>
> Would it be reasonable for me to add a quirks-mode flag to the HTML parser
> that would only toggle comment parsing behaviour for now?

  I would just use the existing HTML_PARSE_RECOVER mode flag for this,
though in a sense I would have preferred the default behaviour to be
maintained. I really think that a wrong number of '-' characters in a
comment is a frequent mistake, and even if SGML says the comment is not
terminated, we should not miss the start tag of the next element.
  The error is too benign and the effects of the new code are too strong.
This feels unbalanced, especially as it is a change from the current
behaviour. I don't know how best to handle this...

Daniel

--
Daniel Veillard      | Red Hat http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
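
As a rough illustration of the two behaviours discussed in this thread, the sketch below contrasts SGML-style comment termination with a lenient scan for the first "-->". It is not libxml code: the function names sgml_comment_end and lenient_comment_end are hypothetical, and the strict variant is a simplified reading of the SGML comment-declaration rule ("--" delimiters in pairs, optional whitespace after each comment, then ">").

```python
def sgml_comment_end(text, start=0):
    """Strict, SGML-style scan (simplified, hypothetical helper).

    Returns the index just past the '>' that closes the comment
    declaration beginning at `start`, or -1 if it is never closed.
    """
    if not text.startswith("<!--", start):
        return -1
    i = start + 2  # just past "<!"
    while True:
        if text.startswith("--", i):
            # "--" opens a comment; it runs to the next "--".
            close = text.find("--", i + 2)
            if close == -1:
                return -1  # opened but never closed
            i = close + 2
            # Whitespace is allowed after each comment.
            while i < len(text) and text[i].isspace():
                i += 1
        elif text.startswith(">", i):
            return i + 1  # declaration closed
        else:
            return -1  # anything else is invalid here


def lenient_comment_end(text, start=0):
    """Lenient scan (hypothetical helper): stop at the first '-->'."""
    if not text.startswith("<!--", start):
        return -1
    end = text.find("-->", start + 4)
    return -1 if end == -1 else end + 3


# The comment quoted from wired.html.
SAMPLE = "<!------TRADES--------->"
```

Under these assumptions, sgml_comment_end(SAMPLE) finds no terminator inside the comment itself, because the last "--" opens a comment that is never closed, while lenient_comment_end(SAMPLE) stops at the final "-->" so the start tag of the next element is not swallowed.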
