Hi, Sorry for not catching this in the first place.
On Mon, Feb 23, 2009 at 9:31 AM, Daniel Veillard <[email protected]> wrote:
> On Tue, Feb 17, 2009 at 12:29:02PM -0800, Rush Manbert wrote:
>> I am processing XHTML source files, rendering them to HTML strings, then
>> loading the HTML string into a browser control (Webkit).

Rendering XHTML files as HTML is asking for trouble. HTML has some
quirks that will make your XHTML page render strangely.

>> Originally I was generating the string by calling xmlDocDumpMemory(),
>> but I kept reading articles that suggested you render as HTML if the
>> result is being displayed by a browser. I changed to use
>> htmlDocDumpMemory(), and my application still worked with no problems.
>>
>> Recently, however, we were developing a new set of web pages, and I had
>> occasion to load the HTML string output into a real browser (Safari), by
>> first writing the HTML string to a file, then opening the file in the
>> browser. To my surprise, the JavaScript error console displayed quite a
>> few errors. Many of them were complaints that the HTML contained element
>> pairs such as "<br></br>", or "<p></p>". Someone had asked me why we had
>> extra blank lines in the browser display, and I finally realized it was
>> because Safari was treating <br></br> as <br><br> (which is what the
>> error message said it would do).

I had a look at our HTML parser, and it seems that in quirks mode </br>
is interpreted as <br>, as you reported (just check the comment at
http://trac.webkit.org/browser/trunk/WebCore/html/HTMLParser.cpp#L204).
So it is not a bug but a compatibility quirk (provided you are indeed in
quirks mode).

I think the complaint about <p></p> is an overzealous check for </p>
with an unmatched <p> (again in quirks mode), but I may be wrong here.

Rush, have you specified a doctype in your HTML file? Have you checked
how other browsers behave?
> From an XML parser <br /> and <br></br> are strictly equivalent (well
> except for the Microsoft reader API which distinguishes the two but
> should not), so if your browser is loading the file with an XML parser
> then the two forms are equivalent (BTW Safari is using libxml2 for XML
> parsing so maybe someone can comment about this in more detail ;-)

Sure :-)
WebKit is using libxml2's SAX callbacks. Both forms should lead to the
same sequence of callbacks and thus result in the same element being
created. I have tried this, and it is the case in Safari 3.2.1.

> Now an HTML parser should make no difference between <br /> and
> <br>, that's why it's suggested to serialize XHTML that way.
>
> The behaviour you mention sounds like a bug in my opinion, <br />
> should be safe for both kind of parsing, except if internally Safari
> loads as XML, reserialize as <br></br> and then hands this to the
> HTML parser, I don't see any other logical way to achieve what you got.

No, we avoid moving documents from one parser to another. We determine
the document type using different methods (content-type header,
extension ...) and then use either the XML parser, which uses libxml2,
or our own HTML parser.

Regards,
Julien
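
P.S. To illustrate the point that <br /> and <br></br> produce the same
sequence of SAX callbacks, here is a minimal sketch. It assumes Python's
standard xml.sax module (backed by expat, not libxml2), so it is only a
stand-in for what WebKit does through libxml2. EventRecorder and
sax_events are hypothetical names made up for this example.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Collects element start/end events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(markup):
    """Parse an XML string and return the list of recorded element events."""
    recorder = EventRecorder()
    xml.sax.parseString(markup.encode("utf-8"), recorder)
    return recorder.events


# Same document, serialized with the two forms discussed in this thread.
empty_tag = sax_events("<p>line<br /></p>")
tag_pair = sax_events("<p>line<br></br></p>")

# An XML parser cannot tell the two serializations apart.
print(empty_tag == tag_pair)
```

This only covers the XML side. An HTML parser in quirks mode may still
treat </br> differently, as described above.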
