Re: [xml] htmlDocDumpMemory() vs xmlDocDumpMemory()

Julien Chaffraix Mon, 23 Feb 2009 02:25:32 -0800

Hi,

Sorry for not catching this in the first place.

On Mon, Feb 23, 2009 at 9:31 AM, Daniel Veillard <[email protected]> wrote:
> On Tue, Feb 17, 2009 at 12:29:02PM -0800, Rush Manbert wrote:
>> I am processing XHTML source files, rendering them to HTML strings, then
>> loading the HTML string into a browser control (Webkit).

Rendering XHTML files as HTML is asking for trouble. HTML has some
quirks that will make your XHTML page render strangely.

>> Originally I was generating the string by calling xmlDocDumpMemory(),
>> but I kept reading articles that suggested you render as HTML if the
>> result is being displayed by a browser. I changed to use
>> htmlDocDumpMemory(), and my application still worked with no problems.
>>
>> Recently, however, we were developing a new set of web pages, and I had
>> occasion to load the HTML string output into a real browser (Safari), by
>> first writing the HTML string to a file, then opening the file in the
>> browser. To my surprise, the JavaScript error console displayed quite a
>> few errors. Many of them were complaints that the HTML contained element
>> pairs such as "<br></br>", or "<p></p>". Someone had asked me why we had
>> extra blank lines in the browser display, and I finally realized it was
>> because Safari was treating <br></br> as <br><br> (which is what the
>> error message said it would do).

I had a look at our HTML parser and it seems that in quirks mode,
</br> is interpreted as <br> as you were reporting
(just check the comment at
http://trac.webkit.org/browser/trunk/WebCore/html/HTMLParser.cpp#L204).
So it is not a bug but a compatibility quirk (provided you are indeed
in quirks mode). I think the complaint about <p></p> is an overzealous
check for </p> with unmatched <p> (again in quirks mode) but I may be
wrong here.
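
For reference, here is a minimal sketch of the three spellings of an empty element that come up in this thread. It uses Python's standard xml.etree.ElementTree as a stand-in serializer, which is an assumption on my part: it is not libxml2 and not xmlDocDumpMemory()/htmlDocDumpMemory(), so it only illustrates the output forms, not what those functions do with a given document.

```python
import xml.etree.ElementTree as ET

# Parse a tiny XHTML-like fragment (no namespace, to keep the output short).
root = ET.fromstring("<p>one<br/>two</p>")

# XML serialization, empty element written in the short form.
xml_short = ET.tostring(root, encoding="unicode")

# XML serialization, empty element written as a start/end tag pair.
xml_long = ET.tostring(root, encoding="unicode", short_empty_elements=False)

# HTML serialization: void elements such as br get no end tag at all.
html_form = ET.tostring(root, encoding="unicode", method="html")

print(xml_short)   # <p>one<br />two</p>
print(xml_long)    # <p>one<br></br>two</p>
print(html_form)   # <p>one<br>two</p>
```

Only the second form contains the </br> end tag that an HTML parser in quirks mode may turn into an extra <br>.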

Rush, have you specified a doctype in your html file? Have you checked
how other browsers behave?

>  From an XML parser <br /> and <br></br> are strictly equivalent (well
> except for the Microsoft reader API which distinguishes the two but
> should not), so if your browser is loading the file with an XML parser
> then the two forms are equivalent (BTW Safari is using libxml2 for XML
> parsing so maybe someone can comment about this in more detail ;-)

Sure :-)

WebKit is using libxml2's SAX callbacks. Both forms should lead to the
same sequence of callbacks and thus result in the same element being
created. I have tried this and it is the case in Safari 3.2.1.
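
A quick way to check the "same callbacks" claim outside a browser is to record the SAX events for both spellings and compare them. This is a minimal sketch under a stated assumption: it uses Python's standard xml.sax module (backed by expat), not libxml2's SAX interface, and EventRecorder and sax_events are names made up for the illustration.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical helper: records element start/end callbacks in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(markup):
    """Parse markup with a SAX parser and return the recorded events."""
    recorder = EventRecorder()
    xml.sax.parseString(markup.encode("utf-8"), recorder)
    return recorder.events


short_form = sax_events("<p>one<br/>two</p>")
long_form = sax_events("<p>one<br></br>two</p>")

# Both spellings produce a start callback followed by an end callback for br.
print(short_form == long_form)
```

Any conforming XML parser should report the same element events here, since <br/> is defined as shorthand for <br></br>.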

>  Now an HTML parser should make no difference between <br /> and
> <br>, that's why it's suggested to serialize XHTML that way.
>
>  The behaviour you mention sounds like a bug in my opinion, <br />
> should be safe for both kinds of parsing, except if internally Safari
> loads as XML, reserializes as <br></br> and then hands this to the
> HTML parser, I don't see any other logical way to achieve what you got.

No, we avoid moving documents from one parser to another. We determine
the document type using different methods (content-type header,
extension ...) and then use either the XML parser that uses libxml2 or
our own HTML parser.
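
To make the idea concrete, here is a rough sketch of choosing a parser from the content type first and the file extension second. It is purely illustrative and is not WebKit's actual logic: pick_parser, the MIME type set and the extension set are all assumptions made for this example.

```python
import os

# Assumed sets for the illustration; a real browser's lists may differ.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}
XML_EXTENSIONS = {".xhtml", ".xht", ".xml"}


def pick_parser(content_type=None, filename=None):
    """Return "xml" or "html" for a document (hypothetical helper)."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before comparing.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in XML_TYPES or mime.endswith("+xml"):
            return "xml"
        if mime == "text/html":
            return "html"
    if filename:
        extension = os.path.splitext(filename)[1].lower()
        if extension in XML_EXTENSIONS:
            return "xml"
    # Anything unrecognized falls back to the HTML parser.
    return "html"


print(pick_parser("application/xhtml+xml; charset=utf-8"))  # xml
print(pick_parser(None, "page.html"))                       # html
```

Under a scheme like this, XHTML saved to disk with an .html extension would go to the HTML parser, where <br></br> is subject to the quirks described above.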

Regards,
Julien
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
