For the time being, we're doing pretty much everything we can to minimize
memory usage. Because we construct objects, there is only so much we can
do. There is still some work to do on optimizing string allocations, but
that will not be done for a while.
We already do a great deal to cut down on memory usage. For example:
1. Strings are pooled. There's one pool for names of nodes, and one
pool for values (values of attributes, text nodes, etc.)
2. Nodes are allocated in blocks to increase locality of reference and
to minimize the burden on the runtime heap manager.
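To illustrate point 1, string pooling can be sketched like this. This is a minimal stand-in, not Xalan's actual implementation; the class and method names are made up for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Minimal sketch of a string pool: each distinct string is stored once,
// and repeated insertions hand back a reference to the same shared copy.
// Illustrative only -- Xalan's real pools (one for node names, one for
// values) differ in detail.
class StringPool
{
public:
    // Returns the pooled copy of s, inserting it first if it's new.
    // References into an unordered_set stay valid across rehashing,
    // so callers can hold on to the result.
    const std::string& intern(const std::string& s)
    {
        return *m_strings.insert(s).first;
    }

    std::size_t size() const { return m_strings.size(); }

private:
    std::unordered_set<std::string> m_strings;
};
```

With this scheme, every node named "chapter" in a 100 MB document shares a single copy of the name rather than carrying its own allocation.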
The one remaining improvement to the current implementation is to
allocate the memory for string data in blocks. Currently, the data for
each string (although not the string object itself) is allocated from the
global heap.
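The block-allocation idea might look roughly like this. The class name, interface, and block size are all illustrative assumptions, not anything from the Xalan code base:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Sketch of block-allocated string data: instead of one heap allocation
// per string, character data is carved out of large fixed-size blocks,
// improving locality and cutting per-allocation heap overhead.
class CharBlockAllocator
{
public:
    explicit CharBlockAllocator(std::size_t blockSize = 4096)
        : m_blockSize(blockSize), m_used(0) {}

    // Copies len characters (plus a terminator) into the current block,
    // starting a new block only when the current one is full.  Oversized
    // strings get a dedicated block of exactly the size they need.
    const char* allocate(const char* data, std::size_t len)
    {
        if (m_blocks.empty() || m_used + len + 1 > m_blocks.back().size())
        {
            m_blocks.push_back(
                std::vector<char>(std::max(m_blockSize, len + 1)));
            m_used = 0;
        }
        char* dest = m_blocks.back().data() + m_used;
        std::memcpy(dest, data, len);
        dest[len] = '\0';
        m_used += len + 1;
        return dest;
    }

    std::size_t blockCount() const { return m_blocks.size(); }

private:
    const std::size_t m_blockSize;
    std::size_t m_used;                       // bytes used in current block
    std::vector<std::vector<char>> m_blocks;  // one entry per block
};
```

Moving the inner vectors when m_blocks grows does not invalidate the returned pointers, since a vector move just transfers ownership of its buffer.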
The Java folks are working on a "handle" implementation, where nodes and
their data are represented by integers. If this turns out to work well,
we'll probably adopt a similar strategy in the C++ processor. The drawback
is that it will require significant reworking of the code. There are also
some limitations and tradeoffs inherent in this approach, so we'll have to
consider those carefully when we decide what to do. One big advantage,
besides potentially reduced storage requirements, is the ability to page
the document in and out of memory by keeping parts of it on disk. You
can't really do that when you have objects. You could also imagine that
you might have multiple implementations of the handle-based approach, using
different sizes of ints, allowing for varying capacities. For small
documents, you could choose an implementation based on 16-bit values; for
large ones, 32-bit values; and on 64-bit platforms, even 64-bit values.
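A handle-based store along those lines could be sketched as below. This is my own illustration of the general idea, not the Java implementation; all names are hypothetical, and it assumes C++17 for the inline static member:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a handle-based node store, templated on the integer type
// used for handles.  A std::uint16_t instantiation caps the document at
// roughly 65K nodes but stores 2 bytes per link instead of a pointer's
// 4 or 8.  Because nodes are just indices into flat arrays, the arrays
// could in principle be paged to disk -- something you can't easily do
// with heap-allocated objects holding raw pointers.
template <typename HandleT>
class NodeStore
{
public:
    static constexpr HandleT npos = static_cast<HandleT>(-1);

    // Appends a node as the last child of parent (or as a root when
    // parent is npos) and returns its handle.
    HandleT createNode(HandleT parent)
    {
        const HandleT h = static_cast<HandleT>(m_parent.size());
        m_parent.push_back(parent);
        m_firstChild.push_back(npos);
        m_nextSibling.push_back(npos);
        if (parent != npos)
        {
            if (m_firstChild[parent] == npos)
            {
                m_firstChild[parent] = h;
            }
            else
            {
                HandleT c = m_firstChild[parent];
                while (m_nextSibling[c] != npos)
                    c = m_nextSibling[c];
                m_nextSibling[c] = h;
            }
        }
        return h;
    }

    HandleT parent(HandleT h) const     { return m_parent[h]; }
    HandleT firstChild(HandleT h) const { return m_firstChild[h]; }
    std::size_t size() const            { return m_parent.size(); }

private:
    // Parallel arrays indexed by handle.
    std::vector<HandleT> m_parent;
    std::vector<HandleT> m_firstChild;
    std::vector<HandleT> m_nextSibling;
};
```

Swapping the capacity/footprint tradeoff is then just a matter of instantiating with std::uint16_t, std::uint32_t, or std::uint64_t.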
Finally, there's the Holy Grail of stream processing, where no tree is ever
built, or a model where we do some processing, then throw away a section of
the tree because we know we're done with it. It will certainly take a
while to get to that point.
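The "process, then throw away" model can be sketched in miniature as an event-driven pass that keeps only the data for the element currently in flight. The event names are illustrative, not a real SAX interface:

```cpp
#include <cassert>
#include <string>

// Toy streaming processor: it buffers only the current element's text,
// folds it into a running result when the element ends, and then
// discards the buffer.  Memory use stays proportional to one element,
// not to the whole document.  Illustrative names and logic throughout.
class StreamingSummer
{
public:
    void startElement()                      { m_buffer.clear(); }
    void characters(const std::string& text) { m_buffer += text; }

    void endElement()
    {
        m_total += std::stoi(m_buffer);  // process the finished element...
        m_buffer.clear();                // ...then throw its data away
    }

    long total() const { return m_total; }

private:
    std::string m_buffer;  // only the in-flight element's text
    long        m_total = 0;
};
```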
Dave
"Mark Northcott"
<mnorthcott@datab To: <[EMAIL PROTECTED]>
eacon.com> cc: (bcc: David N Bertoni/CAM/Lotus)
Subject: XalanSourceTree memory usage
07/26/2001 12:10
PM
Please respond to
xalan-dev
I am currently looking into the amount of memory being used when Xalan
creates a XalanSourceTree from the InputSource when processing a
transformation. The application I am using Xalan within has the
potential for very large XML files (100+ MB) needing to be transformed,
and based upon tests I have run, it appears that the tree built within Xalan
uses about as much memory as the actual size of the XML file. Is there
any way of getting around this and limiting the amount of memory
required?