Hi Mikael,
 
First, make sure you really are using SAX: DOM is entirely in-memory.
I'm not sure about the Stream source, but I would imagine it constructs
a DOM by default (or can very easily). You already know that DOM-based
Documents have a memory footprint proportional to the document size.
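For what it's worth, here is roughly how I set up a transform so the
input arrives as SAX events rather than a prebuilt DOM. This is a
minimal sketch: the file names are placeholders, and the Xalan-specific
incremental attribute is optional (other factories will reject it):

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

public class SaxTransform {
    public static void main(String[] args) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();

        // Xalan-specific: begin transforming while parsing is still in
        // progress instead of waiting for the whole document. Other
        // factories throw IllegalArgumentException, so guard the call.
        try {
            factory.setAttribute(
                "http://xml.apache.org/xalan/features/incremental",
                Boolean.TRUE);
        } catch (IllegalArgumentException notXalan) {
            // Not Xalan's factory; carry on without it.
        }

        Transformer transformer =
            factory.newTransformer(new StreamSource("transform.xsl"));

        // A SAXSource hands the transformer a stream of parse events;
        // no application-level DOM tree is ever built.
        SAXSource source = new SAXSource(new InputSource("input.xml"));
        transformer.transform(source, new StreamResult("output.sql"));
    }
}
```

One caveat: as far as I know, Xalan still builds its internal DTM table
for the whole document even from a SAX source, which would explain why
incremental mode only got you to around 200MB. The SAXSource just avoids
paying for a full application-level DOM on top of that.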
 
Second, look for XSL instructions that force the transformer to hang
onto previously parsed XML events. Instructions of this kind include
<xsl:sort>, <xsl:for-each>, and aggregate functions such as sum() and
count(). They are fine as long as they apply only to small subtrees of
your XML document; for example, sum() over one record's line items is
harmless, but sum() over every record in a multi-gigabyte file forces
the whole node set to be buffered. Applied to many nodes, they will
make your process grow (too big, in the case of your large input
files). Keep your templates small and to the point.
 
In my case, a sum() call tried to add up values across my entire
document, and it was simply too big to handle in memory. I ended up
computing the total in Java and providing it in the XML stream itself.
I really would have preferred to let the stylesheet compute the total,
but I had the same issue with large files and had to find another way.
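If it helps, the pre-pass is cheap, since SAX holds only one event at a
time. A sketch of what I mean (<amount> is a made-up element name here;
substitute whatever you are summing):

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Streams the document once and totals the text content of <amount>
// elements ("amount" is a placeholder; use your own element name).
public class SumPass extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inAmount;
    double total;

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        inAmount = "amount".equals(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inAmount) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (inAmount) {
            total += Double.parseDouble(text.toString().trim());
            inAmount = false;
        }
    }

    public static void main(String[] args) throws Exception {
        SumPass handler = new SumPass();
        SAXParserFactory.newInstance().newSAXParser()
                        .parse(new File("input.xml"), handler);
        System.out.println("total = " + handler.total);
    }
}
```

Rather than splicing the number back into the XML, you could also hand
it to the stylesheet with transformer.setParameter("total",
handler.total) and declare a matching <xsl:param name="total"/> at the
top level.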
 
 
 
tlj
 
Timothy Jones
Syniverse Technologies
Tampa, Florida, USA
+1 (813) 637-5366

 

________________________________

From: Mikael Jansson [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 20, 2008 3:32 PM
To: xalan-j-users@xml.apache.org
Subject: Transforming huge XML-files - 3-4GB


Hi! I've already posted this, but it doesn't show up in the archive and
there was no response, so I'm posting it again. Sorry if it's a double
post...

I'm trying to transform a set of XML files into SQL code. It works just
fine with XSLT no matter what components I use: Stream, SAX, or
whatever.

But when the files get too big, I run out of memory: OutOfMemoryError,
Java heap space.

If I use the incremental feature I can transform documents of about
200 MB; without it, only a few MB.

Is there any way I can resolve this? I need to transform XML files of
up to 4 GB.

I have tried to expand the Java heap with java -XmxXXXM, but it's not
sufficient. There has to be a way for the parser to process only one
node at a time, discarding the old ones.

I'm using the latest version of xalan-java, 2.7.1.


-- 
//Mikael Jansson
