> >When using SAXSource, the transformation time depends greatly on the
> >size of the text in the transformed documents (i.e. the cumulative
> >size of the char arrays passed in the SAX characters callback). When
> >using DOMSource there is little difference in the transformation
> >timings regardless of the cumulative number of characters.
> 
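(For concreteness, this is roughly the pattern I've been timing; style.xsl
and input.xml below are placeholders for our actual files, and the DOM
parse is deliberately left outside the timed region:)

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SourceTiming {
    public static void main(String[] args) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("style.xsl"));

        // SAX path: character data streams through the parser's
        // callbacks and has to be copied into Xalan's internal model.
        long start = System.currentTimeMillis();
        t.transform(new SAXSource(new InputSource("input.xml")),
                    new StreamResult("out-sax.html"));
        System.out.println("SAXSource: "
                + (System.currentTimeMillis() - start) + " ms");

        // DOM path: parse once up front (not timed here), then let
        // the transformer read text from the DOM on demand.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse("input.xml");
        start = System.currentTimeMillis();
        t.transform(new DOMSource(doc), new StreamResult("out-dom.html"));
        System.out.println("DOMSource: "
                + (System.currentTimeMillis() - start) + " ms");
    }
}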
> Possible explanation:
> 
> SAX has no persistent memory buffer for character content of nodes,
> which means we have to copy the text into our own data structures.
> (See SAX2DTM.characters(), for example). We try to minimize the
> overhead of doing so, via our use of the FastStringBuffer class, but
> the block-copy cost is an unavoidable part of accepting SAX data into
> a persistent model.
> 
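(If I understand the constraint, it's the usual SAX contract: the parser
owns the char[] it passes to characters() and may reuse it after the call
returns, so anything that wants to keep the text has to copy it out. A
minimal illustration, with StringBuilder standing in for FastStringBuffer:)

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TextAccumulator extends DefaultHandler {
    private final StringBuilder buf = new StringBuilder();

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // The parser may overwrite ch after this callback returns,
        // so a persistent model must block-copy the region now.
        buf.append(ch, start, length);
    }

    public String getText() {
        return buf.toString();
    }
}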
> If you pass us a DOM, our DOM2DTM adapter attempts to leverage the DOM
> as much as possible. In particular, it leaves text data in the DOM and
> retrieves it when called for, rather than copying that data into our
> data structures.
> 
> (Even so, DOM2DTM replicates much more of the DOM structure than I'm
> really happy with. DOM2DTM2 sketched an alternative, though it has
> problems of its own; I think XDM running directly over DOM is going
> to be the way to go in the long run.)
> 

That being the case, is there something to be done in the short term, 
or should I just settle for using DOM? The scenario we have consists 
of taking an HTML document, forcing it into XML (by artificially 
making it well-formed) and then XSLing it. Since our "xmlizing" 
method requires keeping the whole structure in memory (so we'll 
know which tags to close), using DOM rather than streaming the 
document through SAX (into Xalan) is not that much of a compromise.
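
Roughly, the shape of our pipeline once the "xmlizing" step has produced
a well-formed string in memory (the content and the stylesheet name here
are placeholders, not our real data):

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlizedTransform {
    public static void main(String[] args) throws Exception {
        // Stand-in for the output of our "xmlizing" step: the HTML
        // has already been coerced into well-formed XML in memory.
        String xmlized =
            "<html><body><p>already balanced</p></body></html>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlized)));

        // The whole structure is in memory anyway, so handing the
        // DOM to Xalan lets it read text straight from the tree.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("style.xsl"));
        t.transform(new DOMSource(doc), new StreamResult(System.out));
    }
}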


Cheers,
Shmul

