> I would like to adapt a library that I am using that uses XPath from
> Xalan. At the moment the library reads very large XML files entirely in
> to memory. The files that I am using contains either a BLOB or a very
> structured xml table as one of the elements.

Large files will always consume a great deal of memory, even with the default source tree implementation, which is about as efficient as it can be.

> I would like to prevent the data from part of the tree being loaded in to
> memory. Is this possible and how is it done?

Unfortunately, because XPath allows random access to the tree, this is a very difficult problem to solve.

> I would also like to be able to obtain a pointer to the place in the file
> where this tag starts, so I can use a non-DOM method to read this data.

I'm not sure what you're looking for here. The parser does a lot of work to turn an XML document into a DOM or SAX events, including whitespace normalization, expanding entities, transcoding, etc., so a position in the parsed tree does not map directly to an offset in the file.

You could write your own implementation of the Xerces-C abstract DOM, or of Xalan-C's abstract DOM, but that's a lot of work. And, in the end, if your XPath expressions refer to the entire tree, you will end up loading the entire tree anyway.

> Is this possible? Are there any suggestions about how I should go about
> doing this?

Yes, but plan on doing _lots_ of work. Adding several gigabytes of memory to your machine will be far cheaper.

There are some undocumented options with Xalan-C's source tree that will pool the strings for the text nodes in a document. If your documents have text nodes with lots of repeated values, this can yield _significant_ memory savings. You might want to build a custom version of the library with that option enabled to see if that helps memory consumption.

Dave
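
P.S. In case the string pooling idea isn't clear, here is a minimal sketch of the general technique in plain C++. It is not Xalan-C's source tree code, and the names StringPool, intern and uniqueCount are made up for illustration. The point is only that each distinct text value is stored once and every node that carries the same text shares a pointer to that one copy.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical illustration of string pooling; not Xalan-C's implementation.
class StringPool {
public:
    // Returns a pointer to the single pooled copy of `text`.
    // If an equal string is already pooled, no new copy is stored.
    // Pointers to elements of std::unordered_set stay valid when the
    // set rehashes, so callers can safely hold on to the result.
    const std::string* intern(const std::string& text) {
        return &*pool_.insert(text).first;
    }

    // Number of distinct strings currently stored.
    std::size_t uniqueCount() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

A text node would then hold the pointer returned by intern() rather than its own copy of the string, so a table with millions of cells but only a few hundred distinct values stores only those few hundred strings. The savings depend entirely on how repetitive your data is.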
