Yes, thanks! That sounds like the right approach to me. I would just
merge that with a new htmlParserOption, HTML_PARSE_STRICT, which could
either be passed by the user to maintain the current behaviour or be
activated by default when the DOCTYPE is read, if it happens to be a
Strict HTML one.
Yes, checking the DTD is indeed an option, though I'm not sure how it
would handle the case in which I link a DTD myself.
E.g.:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://very.silly/html401-like/but/not/exactly/strict.dtd">
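To make the concern concrete, here is a rough sketch in C of what a
DOCTYPE-based check would have to decide. Both helper names are
hypothetical and are not part of libxml2. The policy of also comparing
the system identifier is only an assumption about one way the ambiguity
could be resolved, and only HTML 4.01 Strict is covered.

```c
#include <string.h>

/* Hypothetical helper, not a libxml2 function: returns 1 if the public
 * identifier is the one the W3C assigns to HTML 4.01 Strict. */
int is_strict_public_id(const char *public_id)
{
    if (public_id == NULL)
        return 0;
    return strcmp(public_id, "-//W3C//DTD HTML 4.01//EN") == 0;
}

/* Hypothetical, more conservative policy (an assumption): when a system
 * identifier is given, it must also be the W3C strict.dtd URL. A custom
 * DTD behind a Strict public identifier is then not treated as Strict. */
int is_strict_doctype(const char *public_id, const char *system_id)
{
    if (!is_strict_public_id(public_id))
        return 0;
    if (system_id == NULL)
        return 1; /* no system identifier: trust the public one */
    return strcmp(system_id, "http://www.w3.org/TR/html4/strict.dtd") == 0;
}
```

With the DOCTYPE above, a check on the public identifier alone would
report Strict, while the two-identifier check would not. Which answer is
the right one is exactly the open question.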
Anyway, I do not see any reason why the parser should mess with the
document in the first place; it's supposed to parse it, not deliberately
alter it according to what it thinks may be the right solution. Could
someone please explain to me why the document should be altered?
And please, do not say "to be compliant with standards", because to the
best of my knowledge the standards do not require the parser to "fix"
the document by adding tags when it's not considered correct (I may be
wrong, but I doubt standards would require such a thing).
-- iSteve
PS: The <p> tag injection is not correct anyway. The <img> tag is inline,
yet it is not wrapped in <p>. Still want to keep it?
For details, see: http://www.w3.org/TR/REC-html40/sgml/dtd.html#inline
'<!ENTITY % special "A | IMG | OBJECT | BR | SCRIPT | MAP | Q | SUB |
SUP | SPAN | BDO">
<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; |
%formctrl;">
<!ENTITY % block "P | %heading; | %list; | %preformatted; | DL | DIV |
NOSCRIPT | BLOCKQUOTE | FORM | HR | TABLE | FIELDSET | ADDRESS">'
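As an illustration of the point about <img>, the entity lists quoted
above can be turned into a small lookup in C. This is only a sketch: the
function names are made up for the example, and the nested entities
(%fontstyle;, %phrase;, %formctrl;, %heading;, %list;, %preformatted;)
are not expanded, so only the names spelled out in the excerpt are
recognised.

```c
#include <ctype.h>
#include <stddef.h>

/* Element names listed literally in %special; (all inline). */
static const char *const special_elems[] = {
    "A", "IMG", "OBJECT", "BR", "SCRIPT", "MAP",
    "Q", "SUB", "SUP", "SPAN", "BDO"
};

/* Element names listed literally in %block; (nested entities omitted). */
static const char *const block_elems[] = {
    "P", "DL", "DIV", "NOSCRIPT", "BLOCKQUOTE", "FORM",
    "HR", "TABLE", "FIELDSET", "ADDRESS"
};

/* Case-insensitive comparison, since HTML element names ignore case. */
static int name_eq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

static int in_list(const char *name, const char *const *list, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (name_eq(name, list[i]))
            return 1;
    return 0;
}

/* Hypothetical helper: 1 if the name appears in the quoted %special; list. */
int is_special_inline(const char *name)
{
    return in_list(name, special_elems,
                   sizeof special_elems / sizeof special_elems[0]);
}

/* Hypothetical helper: 1 if the name appears in the quoted %block; list. */
int is_listed_block(const char *name)
{
    return in_list(name, block_elems,
                   sizeof block_elems / sizeof block_elems[0]);
}
```

Under these lists, "img" is reported as inline and not as block, so a
rule that wraps inline content in <p> would have to apply to it as well.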
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml