Fw: Trouble exporting HTML from a DOM in memory

Brian Minchau Thu, 17 Apr 2008 08:58:22 -0700

Hi Jenny.

Yes, Henry is right.

I don't know how I missed what you wrote:
> which results in browser bombs, and starts with:
> <HTML xmlns="http://www.w3.org/1999/xhtml" lang="en">

That default namespace forces this HTML element to be treated as XML.
Likewise for any other element that is in a non-null namespace.

- Brian

----- Forwarded by Brian Minchau/Toronto/IBM on 04/17/2008 11:54 AM -----

From:    Henry Zongaro/Toronto/I [EMAIL PROTECTED]
Date:    04/17/2008 10:50 AM
To:      "Jenny Brown" <[EMAIL PROTECTED]>
cc:      xalan-j-users@xml.apache.org
Subject: Re: Trouble exporting HTML from a DOM in memory

Hi, Jenny.

"Jenny Brown" <[EMAIL PROTECTED]> wrote on 2008-04-16 09:27:44 PM:
> The main situation I'm having trouble with is empty tags.  For
> instance... my input file contains:
> <P>This is some <STRONG></STRONG> paragraph text.</P>
> <P>This is a textarea.  <TEXTAREA name="foo"></TEXTAREA>  It has text
> after it.</P>
>
> It gets into my in-memory dom tree okay.  But then when I try to use a
> transformer to output the html, instead I get this which Firefox
> chokes on:
> <P>This is some <STRONG/> paragraph text.</P>
> <P>This is a textarea.  <TEXTAREA name="foo"/> It has text after it.</P>
>
> [Snip]
>
> Transformer transformer = TransformerFactory.newInstance().newTransformer();
> transformer.setOutputProperty(OutputKeys.METHOD, "html");
> transformer.setOutputProperty(OutputKeys.MEDIA_TYPE, "text/html");
> transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
> transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
>
> [Snip]
>
> So, I'm trying to tell it to give me html, but what I get is a
> document that contains xml-like empty tags wherever the tag was empty,
> which results in browser bombs, and starts with:
> <HTML xmlns="http://www.w3.org/1999/xhtml" lang="en">

I think this is the key.  You have specified that you want to use the html
output method, but your output is really XHTML.  Because your elements are
in an XML namespace, the serializer is required to serialize them as XML,
even though you've selected the html output method.  However, to be
displayed correctly in a browser, XHTML has to follow certain lexical
conventions that ordinary XML does not.

XSLT 1.0 does not define an xhtml output method, but Xalan-J does allow you
to give it a clue that what you're serializing is really XHTML.  If you add
the following output property, the serializer will emit empty tags using a
space before the trailing /> - thus, <STRONG />

transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
    "-//W3C//DTD XHTML 1.0 Transitional//EN");

That will probably help with a tag like <br> which is always supposed to be
empty - it will be serialized as <br /> - but probably not with STRONG and
TEXTAREA which happen to have no content in your DOM tree, but ordinarily
would have content.  They really should be serialized as <STRONG></STRONG>
rather than <STRONG />.  This issue has previously been reported as Jira
issue XALANJ-1906.[1]

In the meantime, you probably have a couple of options for working around
this issue:  one would be to re-create the DOM tree using elements that are
in no namespace rather than in the XHTML namespace, so that the html output
method works properly; another would be to search the DOM tree for elements
that ordinarily have content but are actually empty, and either give them a
single whitespace text node child or remove them from the tree entirely.
You could also write XSLT stylesheets to implement any of those
work-arounds; let us know if you'd like an example.
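
For what it's worth, the first work-around (re-creating the tree with
elements in no namespace) might look roughly like the sketch below.  It
uses only standard JAXP and DOM calls; the class and method names
(NoNamespaceHtml, copyWithoutNamespace, toHtml, demo) are made up for
illustration, and dropping attribute prefixes is an assumption that may
not suit every document.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper class; none of these names come from Xalan-J.
class NoNamespaceHtml {

    /** Copies src into target, putting every element in no namespace. */
    static Node copyWithoutNamespace(Node src, Document target) {
        if (src.getNodeType() != Node.ELEMENT_NODE) {
            // Text, comments and the like carry no namespace: shallow import.
            return target.importNode(src, false);
        }
        Element copy = target.createElementNS(null, localNameOf(src));
        NamedNodeMap attrs = src.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            String qname = a.getNodeName();
            // Drop namespace declarations such as xmlns="..." or xmlns:x="...".
            if (qname.equals("xmlns") || qname.startsWith("xmlns:")) {
                continue;
            }
            // Assumption: attribute prefixes are dropped too (xml:lang -> lang).
            copy.setAttributeNS(null, localNameOf(a), a.getNodeValue());
        }
        for (Node c = src.getFirstChild(); c != null; c = c.getNextSibling()) {
            copy.appendChild(copyWithoutNamespace(c, target));
        }
        return copy;
    }

    /** Local name, falling back to the node name for DOM Level 1 nodes. */
    private static String localNameOf(Node n) {
        return n.getLocalName() != null ? n.getLocalName() : n.getNodeName();
    }

    /** Serializes doc with the html output method. */
    static String toHtml(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Builds a small XHTML-namespace tree, strips the namespace, serializes. */
    static String demo() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document src = f.newDocumentBuilder().newDocument();
            String xhtml = "http://www.w3.org/1999/xhtml";
            Element html = src.createElementNS(xhtml, "HTML");
            Element p = src.createElementNS(xhtml, "P");
            Element textarea = src.createElementNS(xhtml, "TEXTAREA");
            textarea.setAttributeNS(null, "name", "foo");
            src.appendChild(html);
            html.appendChild(p);
            p.appendChild(src.createTextNode("This is some "));
            p.appendChild(src.createElementNS(xhtml, "STRONG"));
            p.appendChild(src.createTextNode(" paragraph text. "));
            p.appendChild(textarea);

            // Re-create the tree in a fresh document, in no namespace.
            Document target = f.newDocumentBuilder().newDocument();
            target.appendChild(copyWithoutNamespace(html, target));
            return toHtml(target);
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling NoNamespaceHtml.demo() should give output in which the empty
elements keep their end tags, e.g. <STRONG></STRONG>, and in which no
xmlns declaration appears.  Elements whose names contain a prefix but
report no local name would need extra handling that this sketch omits.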

Thanks,

Henry
[1] http://issues.apache.org/jira/browse/XALANJ-1906
------------------------------------------------------------------
Henry Zongaro
XML Transformation & Query Development
IBM Toronto Lab   T/L 313-6044;  Phone +1 905 413-6044
mailto:[EMAIL PROTECTED]
