> I would like to adapt a library that I am using that uses XPath from
> Xalan.  At the moment the library reads very large XML files entirely
> into memory.  The files that I am using contain either a BLOB or a very
> structured XML table as one of the elements.

Large files will always consume a great deal of memory, even with the 
default source tree implementation, which is about as efficient as it can 
be.

> I would like to prevent the data from part of the tree being loaded
> into memory.  Is this possible and how is it done?

Unfortunately, because XPath allows random access to the tree, this is a 
very difficult problem to solve.
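
To make the random-access point concrete, here is a toy sketch in plain C++. It does not use Xalan-C or Xerces-C at all; the Event struct and both functions are hypothetical names invented for illustration, and the "stream" is just a flat list of element names. A forward-only query needs nothing but a counter, while a backward-looking one has to hold on to everything it has seen, because it cannot yet know what will be selected.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one parser event: an element name and its text.
struct Event {
    std::string name;
    std::string text;
};

// Models a forward-only query such as count(//row).
// Only a counter is kept, so memory use does not grow with the document.
std::size_t countRows(const std::vector<Event>& stream) {
    std::size_t count = 0;
    for (const Event& e : stream) {
        if (e.name == "row") ++count;
    }
    return count;
}

// Models a backward-looking query such as //marker/preceding::row.
// Whether a row belongs to the result is unknown until a later marker
// arrives, so every row has to be retained in the meantime.
std::vector<std::string> rowsBeforeMarker(const std::vector<Event>& stream) {
    std::vector<std::string> pending;  // rows seen but not yet confirmed
    std::vector<std::string> result;   // rows known to precede a marker
    for (const Event& e : stream) {
        if (e.name == "row") {
            pending.push_back(e.text);
        } else if (e.name == "marker") {
            result.insert(result.end(), pending.begin(), pending.end());
            pending.clear();
        }
    }
    return result;
}
```

With a stream of row "a", row "b", marker, row "c", the first function needs one integer, while the second has to keep "a" and "b" around until the marker appears.  A general XPath engine cannot know in advance which kind of expression it will be asked to evaluate, which is why it wants the whole tree.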

> I would also like to be able to obtain a pointer to the place in the
> file where this tag starts, so I can use a non-DOM method to read this
> data.

I'm not sure what you're looking for here.  The parser does a lot of work
to turn an XML document into a DOM tree or SAX events, including
whitespace normalization, entity expansion, transcoding, etc.  You could
write your own implementation of the Xerces-C abstract DOM, or of
Xalan-C's abstract DOM, but that's a lot of work.  And, in the end, if
your XPath expressions refer to the entire tree, you will end up loading
the entire tree anyway.

> Is this possible?  Are there any suggestions about how I should go about
> doing this?

Yes, but plan on doing _lots_ of work.  Adding several gigabytes of memory 
to your machine will be far cheaper.

There are some undocumented options with Xalan-C's source tree that will 
pool the strings for the text nodes in a document.  If your documents have 
text nodes with lots of repeated values, this can yield _significant_ 
memory savings.  You might want to build a custom version of the library 
with that option enabled to see if that helps memory consumption.
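
For a feel of why pooling helps, here is a minimal sketch of the general technique in plain C++.  It is not Xalan-C's implementation and does not show the build option itself; StringPool, TextNode and makeTextNodes are hypothetical names made up for this example.  Each distinct string is stored once, and every text node with that value points at the shared copy.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical pool; it illustrates the technique, not Xalan-C's internals.
class StringPool {
public:
    // Returns the address of the single stored copy of `value`,
    // inserting it first if it has not been seen before.
    const std::string* intern(const std::string& value) {
        return &*pool_.insert(value).first;
    }

    std::size_t uniqueCount() const { return pool_.size(); }

private:
    // Pointers to elements of an unordered_set stay valid across rehashing.
    std::unordered_set<std::string> pool_;
};

// Hypothetical text node that refers to pooled text instead of owning a copy.
struct TextNode {
    const std::string* text;
};

// Builds one node per value, sharing storage between equal values.
std::vector<TextNode> makeTextNodes(const std::vector<std::string>& values,
                                    StringPool& pool) {
    std::vector<TextNode> nodes;
    nodes.reserve(values.size());
    for (const std::string& v : values) {
        nodes.push_back(TextNode{pool.intern(v)});
    }
    return nodes;
}
```

A table column holding only a handful of distinct values across millions of rows would then cost one pointer per node plus one copy of each distinct string.  If nearly every text node is unique, the pool only adds overhead, so it is worth measuring on your own documents.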

Dave
