On Tue, Feb 17, 2009 at 12:29:02PM -0800, Rush Manbert wrote:
> I am processing XHTML source files, rendering them to HTML strings, then
> loading the HTML string into a browser control (Webkit).
>
> Originally I was generating the string by calling xmlDocDumpMemory(),
> but I kept reading articles that suggested you render as HTML if the
> result is being displayed by a browser. I changed to use
> htmlDocDumpMemory(), and my application still worked with no problems.
>
> Recently, however, we were developing a new set of web pages, and I had
> occasion to load the HTML string output into a real browser (Safari), by
> first writing the HTML string to a file, then opening the file in the
> browser. To my surprise, the JavaScript error console displayed quite a
> few errors. Many of them were complaints that the HTML contained element
> pairs such as "<br></br>", or "<p></p>". Someone had asked me why we had
> extra blank lines in the browser display, and I finally realized it was
> because Safari was treating <br></br> as <br><br> (which is what the
> error message said it would do).
>
> The source code in these cases contains <br />, <p />, etc. and I just
> verified that if I call xmlDocDumpMemory() that is what ends up in the
> output string. How can I achieve the same result using
> htmlDocDumpMemory? Or is there some other way I should be doing this?
  From an XML parser's point of view, <br /> and <br></br> are strictly
equivalent (well, except for the Microsoft reader API, which distinguishes
the two but should not), so if your browser is loading the file with an
XML parser then the two forms are equivalent. (BTW Safari is using libxml2
for XML parsing, so maybe someone can comment on this in more detail ;-)

  Now an HTML parser should make no difference between <br /> and <br>;
that's why it's suggested to serialize XHTML that way. The behaviour you
mention sounds like a bug in my opinion: <br /> should be safe for both
kinds of parsing. Unless Safari internally loads the file as XML,
reserializes it as <br></br> and then hands this to the HTML parser, I
don't see any other logical way to achieve what you got.

  Also note that by serializing to a file you lose the mime-type
information, and the browser probably has to guess whether it should
process this as XML or HTML; this probably doesn't help.

  For serialization, use the new xmlSave* operations; you have far more
flexibility than with the old APIs you're using, see
  http://xmlsoft.org/html/libxml-xmlsave.html#xmlSaveOption

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
