> In some cases, we can also
> do a better job with incremental transforms when using Xerces
> specifically as a parser, since one of the classes takes advantage of
> Xerces-specific features.

The main advantage is that Xerces is specifically designed so it can function as an incremental parser. Using other parsers in incremental mode requires a multiple-thread handshaking solution which adds some overhead to the process.

> Note that if you feed us a DOMSource as the XML document, we don't
> bother to use incremental since it's all in memory already.

For what it's worth: DOM2DTM is always built incrementally... but of course relies on the DOM being entirely present (or appearing to be; it could be incremental itself as long as that's hidden behind the DOM APIs).

Note, however, that both are built in document order -- SAX2DTM because that's how SAX presents the info, DOM2DTM because that was the simplest way to overcome the difficulty of mapping an arbitrary DOM node to an integer. DOM2DTM might be improved for some implementations of the DOM, such as the Xerces DOM, which have additional information available; that's a pending project.

> In terms of only keeping the minimum you need in memory when using SAX,
> we're still working on that. One technique is called pruning, where
> you periodically delete the DTM nodes of the XML from memory after you
> know you don't need them anymore in the transformation process. The
> problem is knowing when you no longer need the nodes...

There's some discussion of the opportunities, and issues, deep in the archives of this mailing list.

We're actually doing some limited "tail-pruning" now, in our low-level handling of Result Tree Fragments. It might not be very hard to generalize that. The problem, as always, is finding time to develop and refine that code.

______________________________________
Joe Kesselman / IBM Research
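To illustrate the multiple-thread handshaking approach mentioned above for parsers that lack an incremental mode, here is a minimal sketch using only the JDK's standard SAX API. It is not Xalan's actual code; the class `HandshakeSaxSource`, its `next()` method, and the string-encoded events are hypothetical simplifications. The push-style parser runs on a second thread and hands each event across a `SynchronousQueue`, so it parks after every event until the consumer asks for the next one. That per-event thread switch is the overhead a natively incremental parser such as Xerces avoids.

```java
import java.io.StringReader;
import java.util.concurrent.SynchronousQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: makes an ordinary push-style SAX parser behave
// incrementally by running it on its own thread and handing events over
// one at a time.
public class HandshakeSaxSource {
    // Sentinel marking end of document (also sent if parsing fails).
    private static final String END = "<<end>>";

    // A SynchronousQueue has no capacity: each put() blocks until the
    // consumer calls take(), so the parser never runs ahead of the
    // consumer by more than one event.
    private final SynchronousQueue<String> handoff = new SynchronousQueue<>();
    private boolean finished = false;

    public HandshakeSaxSource(String xml) {
        Thread parserThread = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            hand("start:" + qName);
                        }

                        @Override
                        public void endElement(String uri, String localName,
                                               String qName) {
                            hand("end:" + qName);
                        }
                    });
            } catch (Exception e) {
                // A real implementation would forward the error to the consumer.
            } finally {
                hand(END);
            }
        });
        // Daemon thread, so an abandoned parse cannot keep the JVM alive.
        parserThread.setDaemon(true);
        parserThread.start();
    }

    // Runs on the parser thread; blocks until the consumer takes the event.
    private void hand(String event) {
        try {
            handoff.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs on the consumer thread: returns the next event, or null at end.
    public String next() throws InterruptedException {
        if (finished) {
            return null;
        }
        String event = handoff.take();
        if (event.equals(END)) {
            finished = true;
            return null;
        }
        return event;
    }

    public static void main(String[] args) throws InterruptedException {
        HandshakeSaxSource src = new HandshakeSaxSource("<a><b/></a>");
        String event;
        while ((event = src.next()) != null) {
            System.out.println(event);
        }
    }
}
```

For `<a><b/></a>` the consumer pulls `start:a`, `start:b`, `end:b`, `end:a`, then `null`. A consumer that stops pulling early simply leaves the parser thread parked in `put()`, which is what lets a transform build its tree only as far as it has actually needed.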
