Thanks Dave, that was helpful. Are there any other XSLT libraries that parse the XML as a stream and don't consume that much memory? I have read about Saxon-SA, and they claim it handles XML files of up to 20 GB; I will test it shortly. Too bad it's commercial. There is also another commercial implementation from Intel which is supposed to handle large XML files...
I tried the Joost STX library yesterday and it works pretty well, btw; a rough sketch of how I'm invoking it is below the quoted thread.

On Dec 18, 2007 9:09 AM, David Bertoni <[EMAIL PROTECTED]> wrote:
> Anton Khodakivskiy wrote:
> >
> > ---------- Forwarded message ----------
> > From: Anton Khodakivskiy <[EMAIL PROTECTED]>
> > Date: Dec 18, 2007 8:39 AM
> > Subject: xslt transforming large XML files 1gig+
> > To: [EMAIL PROTECTED]
> >
> > Hello,
> >
> > I'm looking for a generic way to transform large XML files - possibly
> > 1 gig and more. As you understand, my biggest concerns are memory
> > usage and performance.
> >
> > I have just tried the command-line tool Xalan.exe, and it looks like
> > it loads the whole XML - I'm not sure what for, but I expect that it
> > parses the XML into a DOM. Is it possible to use a SAX-based XML
> > parser for the XSLT transformation in Xalan, or something like that?
>
> Xalan-C doesn't use the DOM per se, although it does use a tree
> representation of the input XML. The differences are primarily related
> to reducing memory usage by implementing a read-only tree, which is all
> that's necessary for XSLT processing.
>
> Because the XPath language provides random access to the source tree,
> most XSLT processors use an in-memory representation, rather than
> trying to do streaming processing. If you can reduce your
> transformation to a streaming subset of XPath, you might try STX:
>
> http://www.xml.com/pub/a/2003/02/26/stx.html
>
> > Also, I have read that "it's not recommended to use XSLT on big XML
> > files" - I haven't found a meaningful explanation though. What do you
> > think about it? Are there any other alternative ways to do generic
> > XML transformations which satisfy my needs (big XML files)?
>
> I think you'll find that Xalan-C's memory footprint for a 1GB XML
> document will be much less than 1GB of memory, although it can vary
> widely depending on the document. In addition, for documents that have
> a lot of repeated text nodes, you can enable pooling of text nodes to
> further reduce the memory footprint of the source tree.
>
> Whether something's "recommended" or not depends on your requirements.
> A blanket statement like that doesn't reflect every possible set of
> requirements in the real world.
>
> Dave
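As mentioned above, here is roughly how I'm driving Joost - a minimal sketch through the standard JAXP/TrAX interface. The factory class name is what I remember from the Joost docs, so double-check it against your version, and the file names (extract.stx, big-input.xml, output.xml) are just placeholders:

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class StxTransform {
        public static void main(String[] args) throws Exception {
            // Point JAXP at Joost's TrAX implementation instead of the
            // default XSLT engine (class name as I remember it from the
            // Joost docs -- verify against your copy).
            System.setProperty("javax.xml.transform.TransformerFactory",
                               "net.sf.joost.trax.TransformerFactoryImpl");

            TransformerFactory factory = TransformerFactory.newInstance();

            // extract.stx is an STX stylesheet (placeholder name), not XSLT;
            // it sticks to the streamable subset, so no source tree is built.
            Transformer t = factory.newTransformer(new StreamSource("extract.stx"));

            // Input and output are plain streams, so memory use stays flat
            // even for multi-gigabyte documents.
            t.transform(new StreamSource("big-input.xml"),
                        new StreamResult("output.xml"));
        }
    }

The nice part is that this is the same TrAX code you would use with Xalan-J or Saxon, so switching between a streaming STX pass and a full XSLT pass is mostly a matter of which factory and stylesheet you hand it.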