I'm working on a prototype. It's nowhere near ready to check in for review;
I need to write some more code, debug, and run performance checks. The first
version will be invoked explicitly via an extension function; if everything
works out well, we can start looking at pruning automatically based on
stylesheet analysis.
Switching to DTM did make pruning significantly harder, especially since we
really want a form of pruning that might save us from DTM's current limit
on the maximum number of nodes. That means reusing the node numbers of
pruned nodes, which in turn means either compressing the tables as part of
pruning or treating them as a classic heap rather than as document-ordered
vectors.
I'm trying the compression approach, with compression occurring in
multiples of the SuballocatedIntVector chunk size to minimize the amount of
recopying involved. Pruning won't be a cheap operation, but I'm hoping the
savings from reduced swapping will more than make up for that.
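To illustrate why chunk-aligned compression limits recopying, here is a
minimal sketch. It is not the prototype code and does not use the real
SuballocatedIntVector API: ChunkedIntVector, pruneChunks, and the chunk size
of 4 are hypothetical names and values chosen for the example. The
assumption is that when the pruned range covers whole chunks, only the chunk
references need to shift, so no individual entries are copied.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a chunked int vector; NOT Xalan's SuballocatedIntVector. */
class ChunkedIntVector {
    private final int chunkSize;
    private final List<int[]> chunks = new ArrayList<>();
    private int size = 0;

    ChunkedIntVector(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    int size() {
        return size;
    }

    /** Appends a value, allocating a new chunk only when the last one is full. */
    void add(int value) {
        if (size / chunkSize == chunks.size()) {
            chunks.add(new int[chunkSize]);
        }
        chunks.get(size / chunkSize)[size % chunkSize] = value;
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return chunks.get(index / chunkSize)[index % chunkSize];
    }

    /**
     * Drops chunkCount completely filled chunks starting at firstChunk.
     * Only chunk references move; no entries are copied one by one.
     * Returns the number of entries removed, which is the amount by which
     * later indexes shift down. Stored values that are themselves node
     * numbers would still need adjusting; this sketch does not do that.
     */
    int pruneChunks(int firstChunk, int chunkCount) {
        int end = firstChunk + chunkCount;
        if (firstChunk < 0 || chunkCount < 0 || (long) end * chunkSize > size) {
            throw new IllegalArgumentException("range must cover only full chunks");
        }
        chunks.subList(firstChunk, end).clear();
        int removed = chunkCount * chunkSize;
        size -= removed;
        return removed;
    }

    public static void main(String[] args) {
        ChunkedIntVector v = new ChunkedIntVector(4);
        for (int i = 0; i < 12; i++) {
            v.add(i);
        }
        int removed = v.pruneChunks(1, 1); // drops the entries at indexes 4..7
        System.out.println("removed=" + removed + " size=" + v.size()
                + " get(4)=" + v.get(4));
    }
}
```

In this sketch a prune that is not chunk-aligned is simply rejected; a real
implementation would presumably have to recopy the partial chunks at either
end of the pruned range, which is the cost the chunk-multiple approach tries
to keep small.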
As soon as I've got something that's complete enough to try out, I'll check
it in -- possibly as a branch, so we can discard it more easily if you
folks decide you hate this solution.