Re: [xml] htmlDocDumpMemory() vs xmlDocDumpMemory()

Daniel Veillard Mon, 23 Feb 2009 00:31:36 -0800

On Tue, Feb 17, 2009 at 12:29:02PM -0800, Rush Manbert wrote:
> I am processing XHTML source files, rendering them to HTML strings, then 
> loading the HTML string into a browser control (Webkit).
>
> Originally I was generating the string by calling xmlDocDumpMemory(),  
> but I kept reading articles that suggested you render as HTML if the  
> result is being displayed by a browser. I changed to use  
> htmlDocDumpMemory(), and my application still worked with no problems.
>
> Recently, however, we were developing a new set of web pages, and I had 
> occasion to load the HTML string output into a real browser (Safari), by 
> first writing the HTML string to a file, then opening the file in the 
> browser. To my surprise, the JavaScript error console displayed quite a 
> few errors. Many of them were complaints that the HTML contained element 
> pairs such as "<br></br>", or "<p></p>". Someone had asked me why we had 
> extra blank lines in the browser display, and I finally realized it was 
> because Safari was treating <br></br> as <br><br> (which is what the 
> error message said it would do).
>
> The source code in these cases contains <br />, <p />, etc. and I just  
> verified that if I call xmlDocDumpMemory() that is what ends up in the  
> output string. How can I achieve the same result using  
> htmlDocDumpMemory? Or is there some other way I should be doing this?


  For an XML parser <br /> and <br></br> are strictly equivalent (well,
except for the Microsoft reader API, which distinguishes the two but
should not), so if your browser is loading the file with an XML parser
then the two forms are equivalent (BTW Safari is using libxml2 for XML
parsing, so maybe someone can comment on this in more detail ;-)

  Now an HTML parser should make no distinction between <br /> and
<br>, which is why it's suggested to serialize XHTML that way.

  The behaviour you mention sounds like a bug in my opinion: <br />
should be safe for both kinds of parsing. Unless Safari internally
loads the file as XML, reserializes it as <br></br> and then hands that
to the HTML parser, I don't see any other logical way to get what you got.

  Also note that by serializing to a file, you lose the mime-type
information, and the browser probably has to guess whether it should
process this as XML or HTML, which probably doesn't help.

  For serialization, use the new xmlSave* operations; they give you far
more flexibility than the old APIs you're using, see
  http://xmlsoft.org/html/libxml-xmlsave.html#xmlSaveOption

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
