Re: Streaming transformations for huge files?

Joseph Kesselman/Watson/IBM 21 May 2002 13:47:19 -0000

You may want to search the archives of this mailing list for use of the
word "pruning". We're very aware that there's an opportunity for
improvement in storage management. There are some challenges in finding a
clean way to implement that storage recovery given the characteristics of
our current data model (DTM, not DOM), and larger challenges in recognizing
when data really won't ever again be referenced (remember, XPath lets a
stylesheet navigate backward along the preceding and ancestor axes, so in the
general case we have to assume that everything MAY have to be retained). It's
an open area of research.


If you insist on trying to do this in today's Xalan: If you tell Xalan to
discard whitespace-only text nodes (xsl:strip-space), it doesn't build those nodes into the data model; that
may reduce your storage somewhat. Definitely use SAX input; we're more
efficient when processing SAX than when reading from a DOM.  If your
problem was an extraction rather than an insertion, I'd recommend turning
on incremental model construction, which would allow us to stop building
the model once we've found what you're looking for (not always a help,
depending on where the data is in your document, but it at least improves
the odds).  If all else fails, most JVMs will let you raise your maximum
heap size, though they may not let you raise it far enough to handle these
docs.
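A minimal sketch of the first two suggestions, using only the standard JAXP API rather than anything Xalan-specific: xsl:strip-space asks the processor to drop whitespace-only text nodes, and a SAXSource feeds the transformer parse events instead of a prebuilt DOM. The class name and the toy stylesheet (which just counts text nodes to make the stripping visible) are hypothetical. Note that the JDK's default TransformerFactory is its bundled XSLTC, not the Xalan interpretive processor discussed above, and Xalan's incremental-construction setting is processor-specific and not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class StripSpaceDemo {
    // Hypothetical stylesheet: strip whitespace-only text nodes from every
    // element, then report how many text nodes remain in the source tree.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<count><xsl:value-of select='count(//text())'/></count>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            // Hand the transformer a SAX parser rather than a DOM document.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
            StringWriter out = new StringWriter();
            t.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Only "x" and "y" survive; the indentation text nodes are stripped.
        System.out.println(transform("<root>\n  <a>x</a>\n  <b>y</b>\n</root>"));
    }
}
```

Without the xsl:strip-space line, the same input would also keep the three indentation-only text nodes, and each of them would occupy space in the source tree.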

But until we have pruning working, I would recommend coding simple
insertions such as the one you describe at the SAX level rather than in
XSLT. Different tools are optimized for different tasks, and this isn't one
which Xalan is currently set up to handle well... though we agree that we
want it to do so in the future.
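A minimal sketch of the SAX-level approach, using only the standard SAX and JAXP APIs. An XMLFilterImpl subclass forwards every parse event unchanged and emits a few extra events where the new content belongs, and an identity Transformer serializes the filtered stream. The element names (record, note) and class names are hypothetical stand-ins for whatever insertion the original question needs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical insertion: add <note>added</note> as the first child of every <record>.
class InsertNoteFilter extends XMLFilterImpl {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, localName, qName, atts); // pass the original event through
        if ("record".equals(localName)) {
            super.startElement("", "note", "note", new AttributesImpl());
            char[] text = "added".toCharArray();
            super.characters(text, 0, text.length);
            super.endElement("", "note", "note");
        }
    }
}

class SaxInsertDemo {
    static String insert(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            InsertNoteFilter filter = new InsertNoteFilter();
            filter.setParent(spf.newSAXParser().getXMLReader());
            // The identity Transformer writes out the filtered events as they arrive,
            // so the application itself never holds the whole document.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                               new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(insert("<data><record id=\"1\"/><record id=\"2\">x</record></data>"));
    }
}
```

Because the filter never needs to look back at earlier events, memory use should stay roughly bounded by nesting depth rather than document size. That is exactly the guarantee the general XSLT case can't assume. For a real multi-gigabyte file you would pass a FileInputStream-backed InputSource and a file-backed StreamResult instead of the strings used here.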

______________________________________
Joe Kesselman  / IBM Research
