Anton Khodakivskiy wrote:
---------- Forwarded message ----------
From: Anton Khodakivskiy <[EMAIL PROTECTED]>
Date: Dec 18, 2007 8:39 AM
Subject: xslt transforming large XML files 1gig+
To: [EMAIL PROTECTED]
Hello
I'm looking for a generic way to transform large XML files - possibly 1
gig and more. As you can understand, my biggest concerns are memory usage
and performance.
I have just tried the command-line tool Xalan.exe, and it looks like it
loads the whole XML - I'm not sure what for, but I expect it parses the
XML into a DOM. Is it possible to use a SAX-based XML parser for the XSLT
transformation in Xalan, or something like that?
Xalan-C doesn't use the DOM per se, although it does use a tree
representation of the input XML. The differences are primarily related to
reducing memory usage by implementing a read-only tree, which is all that's
necessary for XSLT processing.
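If you want to see what the footprint looks like for your own documents,
it's easy to drive the transformation programmatically. Here's a minimal
sketch along the lines of the SimpleTransform sample that ships with
Xalan-C; the file names are just placeholders:

#include <cstdio>

#include <xalanc/Include/PlatformDefinitions.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xalanc/XalanTransformer/XalanTransformer.hpp>

int main()
{
    XALAN_USING_XERCES(XMLPlatformUtils)
    XALAN_USING_XALAN(XalanTransformer)

    XMLPlatformUtils::Initialize();
    XalanTransformer::initialize();

    int result = 0;

    {
        // The transformer builds its read-only source tree while parsing,
        // so peak memory is roughly the source tree plus the result.
        XalanTransformer theXalanTransformer;

        result = theXalanTransformer.transform("big.xml",
                                               "transform.xsl",
                                               "output.xml");

        if (result != 0)
        {
            fprintf(stderr, "Error: %s\n", theXalanTransformer.getLastError());
        }
    }

    XalanTransformer::terminate();
    XMLPlatformUtils::Terminate();

    return result;
}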
Because the XPath language provides random access to the source tree, most
XSLT processors use an in-memory representation, rather than trying to do
streaming processing. If you can reduce your transformation to a streaming
subset of XPath, you might try STX:
http://www.xml.com/pub/a/2003/02/26/stx.html
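STX implementations such as Joost are Java-based; if you'd rather stay in
C++, the closest streaming equivalent is a hand-written SAX2 pass using
Xerces-C (which Xalan-C is built on). A minimal sketch - the element name
"record" and the file name are just placeholders:

#include <cstring>
#include <iostream>

#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/sax2/Attributes.hpp>
#include <xercesc/sax2/DefaultHandler.hpp>
#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/XMLReaderFactory.hpp>

using namespace xercesc;

// Counts <record> elements as they stream past; only a small window of
// the document is ever held in memory.
class RecordCounter : public DefaultHandler
{
public:
    RecordCounter() : count(0) {}

    virtual void startElement(const XMLCh* const /* uri */,
                              const XMLCh* const localname,
                              const XMLCh* const /* qname */,
                              const Attributes& /* attrs */)
    {
        char* name = XMLString::transcode(localname);
        if (std::strcmp(name, "record") == 0)
        {
            ++count;
        }
        XMLString::release(&name);
    }

    unsigned long count;
};

int main()
{
    XMLPlatformUtils::Initialize();

    {
        SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
        RecordCounter handler;

        parser->setContentHandler(&handler);
        parser->setErrorHandler(&handler);
        parser->parse("big.xml");

        std::cout << handler.count << " record elements" << std::endl;

        delete parser;
    }

    XMLPlatformUtils::Terminate();

    return 0;
}

Of course, this only works when the transformation never needs to look
backwards or sideways in the document; anything that requires arbitrary
XPath axes puts you back in tree-building territory.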
Also, I have read that "it's not recommended to use XSLT on big XML
files" - I haven't found a meaningful explanation, though. What do you
think about it? Are there any other alternative ways to do generic XML
transformations which satisfy my needs (large XML files)?
I think you'll find that Xalan-C's memory footprint for a 1GB XML document
will be much less than 1GB of memory, although it can vary widely depending
on the document. In addition, for documents that have a lot of repeated
text nodes, you can enable pooling of text nodes to further reduce the
memory footprint of the source tree.
Whether something's "recommended" or not depends on your requirements. A
blanket statement like that doesn't reflect every possible set of
requirements in the real world.
Dave