There isn't a single simple answer to that question. If there were, we'd already have dealt with it. Work on this is continuing.
The first thing you should try is driving Xalan via SAX rather than reading a DOM, if you haven't already done so. Part of the tradeoff that we accepted in DTM was improving performance when building our own model, at the cost of losing some performance when we read from an existing DOM.

Note too that XSLTC is still an almost completely separate implementation. We're currently working on integrating it with the main Xalan codebase, but if you're reporting performance or size issues, please be sure to say whether you used interpretive Xalan or compiled XSLTC, since the issues are likely to be significantly different.
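
For anyone unsure what "driving Xalan via SAX" looks like in practice, here is a minimal sketch using the standard JAXP interfaces. The class name SaxDrivenTransform and the method name transform are made up for illustration, and the sketch assumes the input and stylesheet fit in strings. It hands the processor a SAXSource, so the input is parsed as a stream of events and no DOM tree is built first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class, for illustration only.
public class SaxDrivenTransform {

    // Transforms xml with the stylesheet xsl, feeding the input through SAX.
    public static String transform(String xml, String xsl) throws Exception {
        // XSLT needs a namespace-aware parser.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // newInstance() returns whichever processor JAXP finds first,
        // which is not necessarily interpretive Xalan.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer(new StreamSource(new StringReader(xsl)));

        // SAXSource instead of DOMSource: the processor builds its own
        // internal model directly from the parser's events.
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xsl =
            "<xsl:stylesheet version='1.0' "
            + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:value-of select='greeting'/></xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform("<greeting>hello</greeting>", xsl));
    }
}
```

A StreamSource would also avoid the DOM, and the existing-DOM case is the one where you pass a DOMSource. When reporting a problem, printing tf.getClass().getName() is one way to confirm which processor you actually got: org.apache.xalan.processor.TransformerFactoryImpl is interpretive Xalan, and org.apache.xalan.xsltc.trax.TransformerFactoryImpl is XSLTC.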
