Anton Khodakivskiy wrote:


---------- Forwarded message ----------
From: Anton Khodakivskiy <[EMAIL PROTECTED]>
Date: Dec 18, 2007 8:39 AM
Subject: slt transforming large XML files 1gig+
To: [EMAIL PROTECTED]


Hello

I'm looking for a generic way to transform large XML files, possibly 1 GB or more. As you can understand, my biggest concerns are memory usage and performance. I have just tried the command-line tool Xalan.exe, and it looks like it loads the whole XML document. I'm not sure why, but I expect that it parses the XML into a DOM. Is it possible to use a SAX-based XML parser for the XSLT transformation in Xalan, or something like that?

Xalan-C doesn't use the DOM per se, although it does use a tree representation of the input XML. The differences are primarily related to reducing memory usage by implementing a read-only tree, which is all that's necessary for XSLT processing.

Because the XPath language provides random access to the source tree, most XSLT processors use an in-memory representation, rather than trying to do streaming processing. If you can reduce your transformation to a streaming subset of XPath, you might try STX:

http://www.xml.com/pub/a/2003/02/26/stx.html
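
To illustrate what a streaming-style transformation looks like in general, here is a minimal sketch in Python using the standard library's incremental parser. It is neither STX nor Xalan-C. The element names ("record", "name", "row", "rows") and the function name stream_transform are made up for the example. The idea is simply that each record is handled and then discarded, so memory use does not grow with the size of the input. This only works when the transformation never needs to look backwards or forwards beyond the current record.

```python
import io
import xml.etree.ElementTree as ET


def stream_transform(source, out, record_tag="record"):
    """Write one <row> per <record>, holding only one record in memory.

    `source` is a file name or a binary file-like object; `out` is a
    text file-like object. The tag names are hypothetical.
    """
    out.write("<rows>")
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # "Transform" the record: keep only its <name> text as an attribute.
            name = elem.findtext("name", default="")
            row = ET.Element("row", {"name": name})
            out.write(ET.tostring(row, encoding="unicode"))
            count += 1
            # Discard everything parsed so far so the tree does not grow.
            root.clear()
    out.write("</rows>")
    return count


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<data><record><name>a</name></record>"
        b"<record><name>b</name></record></data>"
    )
    result = io.StringIO()
    stream_transform(sample, result)
    print(result.getvalue())
```

Anything that needs random access across records (sorting, grouping, cross-references) does not fit this pattern, which is why general XSLT processors build a tree.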


Also, I have read that "it's not recommended to use XSLT on big XML files", but I haven't found a meaningful explanation. What do you think about it? Are there any alternative ways of doing generic XML transformations that would satisfy my needs (big XML files)?

I think you'll find that Xalan-C's memory footprint for a 1GB XML document will be much less than 1GB of memory, although it can vary widely depending on the document. In addition, for documents that have a lot of repeated text nodes, you can enable pooling of text nodes to further reduce the memory footprint of the source tree.
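
To show why pooling helps, here is a small conceptual sketch in Python. It is not Xalan-C's implementation or API; the class name TextPool and its methods are invented for the illustration. The point is that when many text nodes carry the same value, the tree can keep one shared copy instead of one copy per node.

```python
class TextPool:
    """Keep a single shared copy of each distinct text value (illustrative only)."""

    def __init__(self):
        self._pool = {}

    def intern(self, text):
        # Return the copy already stored for this value, or store this one.
        return self._pool.setdefault(text, text)

    def __len__(self):
        # Number of distinct text values currently held.
        return len(self._pool)


if __name__ == "__main__":
    pool = TextPool()
    # Build equal strings at run time so they start out as separate objects.
    values = ["".join(["pend", "ing"]) for _ in range(1000)]
    nodes = [pool.intern(v) for v in values]
    print(len(nodes), "text nodes share", len(pool), "stored value(s)")
```

The saving depends entirely on how repetitive the text content is; a document whose text nodes are mostly unique gains nothing and pays for the lookup.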

Whether something's "recommended" or not depends on your requirements. A blanket statement like that doesn't reflect every possible set of requirements in the real world.

Dave
