On Mon, Jan 09, 2006 at 02:44:34PM +0100, iSteve wrote:
> Greetings,
> for the past week, I've been fixing various bugs in gtkhtml2. Recently,
> I've found an issue that I -- hopefully correctly -- traced back to
> libxml2's HTML parser.
>
> When parsing a html such as:
> <html><body> xxx <div>aaa</div> yyy <div>bbb</div> zzz </body></html>
>
> I get the 'xxx', 'yyy' and 'zzz' wrapped into paragraphs ("p" element,
> eg. "[...]<body><p>xxx</p><div>[...]).
>
> The html:
> <html><body>some <img src="foo.bar"> text</body></html>
> turns into:
> <html><body><p>some <img src="foo.bar"> text</p></body></html>
>
> The reason is apparently that each text should be in its own block;
> unfortunately, wrapping them right into paragraph elements has quite a
> few drawbacks:
>
> a) During later processing, eg. a stylesheet may (and in fact does)
> get applied to the "p" element; imagine, for example, having a
> background-image set for all <p>, and you'll suddenly see it even where
> it shouldn't be at all... It may therefore also break rendering of eg.
> float (please find the two attached test HTMLs, one without "p"
> elements, one with them).
>
> b) It doesn't appear to be compliant with the standard either; at
> least I didn't find any such rule in the HTML 4.01 standard.
>
> c) I have no idea why the text goes into <p> in the second example,
> too...

The spec for body is at:
  http://www.w3.org/TR/REC-html40/struct/global.html#h-7.5.1

  <!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->

I'm not sure text nodes are to be accepted directly as children of a body
element. For div, it seems adding the <p> is superfluous:
  http://www.w3.org/TR/REC-html40/struct/global.html#edef-DIV

  <!ELEMENT DIV - - (%flow;)* -- generic language/style container -->

> I do not believe that wrapping the text into a paragraph (which, I
> believe, is performed by htmlCheckParagraph()) is the best way; perhaps
> setting the tag name to eg. NULL instead, or a zero-size string (as a
> special value) would be a better way to resolve points a) and b). If
> no styling and rendering would be applied to the reported block (by the
> aforementioned fix), it would imply that c) would no longer matter
> anyway, too.

An element with no name, or with an empty name, would break so much code
that assumes correctly named elements that nothing could justify such a
hack, sorry!

Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/