This is the DTM (Document Table Model) representation of the input document tree. It's a huge improvement over the DOM-style one-object-per-node approach, since a Java object costs something on the order of 32 bytes of bookkeeping overhead before you add any data fields. But, yes, maintaining that tree (plus the fact that Java uses UTF-16 internally, so each character of text content takes two bytes) does add up.
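To illustrate the table-based idea (this is a hypothetical sketch, not Xalan's actual DTM code): instead of one object per node, the tree is stored in parallel int arrays, so a node is just an index and each structural link costs four bytes rather than an object reference plus header.

```java
import java.util.Arrays;

// Sketch of table-based tree storage: parent/first-child/next-sibling
// links live in parallel int arrays, indexed by node id. -1 means "none".
final class IntTree {
    int size = 0;
    int[] parent = new int[16];
    int[] firstChild = new int[16];
    int[] nextSibling = new int[16];

    /** Appends a node under parentId (-1 for the root); returns its id. */
    int addNode(int parentId) {
        if (size == parent.length) grow();
        int id = size++;
        parent[id] = parentId;
        firstChild[id] = -1;
        nextSibling[id] = -1;
        if (parentId >= 0) {
            // Link as the last child of the parent.
            int c = firstChild[parentId];
            if (c == -1) {
                firstChild[parentId] = id;
            } else {
                while (nextSibling[c] != -1) c = nextSibling[c];
                nextSibling[c] = id;
            }
        }
        return id;
    }

    private void grow() {
        parent = Arrays.copyOf(parent, size * 2);
        firstChild = Arrays.copyOf(firstChild, size * 2);
        nextSibling = Arrays.copyOf(nextSibling, size * 2);
    }
}
```

With three int arrays this is 12 bytes of structure per node; the real DTM keeps a few more vectors (type, data, siblings in both directions), which is where the roughly 32 bytes per node comes from.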
Before Xalan -- back when it was IBM's LotusXSL processor -- we had an ultra-compact variant of DTM which reduced node size to just 16 bytes. However, that version imposed some serious performance penalties. The current DTM is a compromise between memory usage and access speed. In general, XSLT can access any portion of the input document at any time, and it has a concept of "node identity" which must be maintained across those accesses, so a full in-memory model of the document is required. There are some potential opportunities for reducing that, at least for a subset of XSLT; a few possible approaches were discussed in the archives of this mailing list.

______________________________________
"... Three things see no end: A loop with exit code done wrong, A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (http://www.ovff.org/pegasus/songs/threes-rev-11.html)

From: Toadie <toadie...@gmail.com>
To: xalan-j-users@xml.apache.org
Date: 04/03/2010 10:44 PM
Subject: Re: approximation of memory footprint used by Xalan

I did a bit more profiling and found that the majority of the memory allocation is in org.apache.xml.utils.SuballocatedIntVector, called by SAX2DTM in the startElement method. The majority of the memory allocation inside SuballocatedIntVector is in a pair of int[][] arrays, m_map and m_map0.

The profiler showed that:
- 178,135 instances of int[] were allocated and used 458 MB
- 56,009 instances of char[] were allocated and used 106 MB

It seems that for each element/node that is read and output by the SAX2DTM class, it adds one integer into at least 6-8 instances of SuballocatedIntVector: m_firstch, m_nextsib, m_parent, m_exptype, m_dataOrQName, m_prevsib, m_data, m_value.

Wow -- that was a surprise. My XML input file has a little over 12.8 million XML elements. A quick (rough) calculation shows 12,800,000 * 32 bytes / 1024 / 1024 ~= 390 MB.
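The pattern behind those int[] allocations can be sketched as follows. This is a simplified stand-in for org.apache.xml.utils.SuballocatedIntVector, not its real source: a vector of ints stored as an int[][] of fixed-size blocks, so appending never copies existing data, it just allocates another block. With six to eight such vectors per DTM, every element read from SAX appends one int to each of them.

```java
import java.util.Arrays;

// Simplified chunked int vector: data lives in fixed-size int[] blocks
// referenced from an int[][] map, so growth allocates a new block
// instead of reallocating and copying one huge array.
final class ChunkedIntVector {
    private static final int BLOCK = 2048;  // ints per block (assumed size)
    private int[][] map = new int[16][];
    private int size = 0;

    void add(int value) {
        int block = size / BLOCK, offset = size % BLOCK;
        if (block == map.length)
            map = Arrays.copyOf(map, map.length * 2);  // only the directory is copied
        if (map[block] == null)
            map[block] = new int[BLOCK];  // one new int[] per BLOCK appends
        map[block][offset] = value;
        size++;
    }

    int get(int index) {
        return map[index / BLOCK][index % BLOCK];
    }

    int size() {
        return size;
    }
}
```

This explains the profile: the int[] count grows with document size because each vector keeps allocating blocks, and none of them can be released while the DTM is alive.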
I wonder whether there is an opportunity to tune/tweak the memory management in that class, or whether the arrays have to be kept from start to end of the input file for traversal purposes.

Thanks in advance

On Sat, Apr 3, 2010 at 11:25 AM, Toadie <toadie...@gmail.com> wrote:
> Is there a way to approximate the memory footprint needed by Xalan to
> run an XSL?
>
> For example, I am seeing that with SAX-based transformation
> - using Xalan 2.7.0 and Java 1.6_u13 with a bootclasspath option to
>   force the JDK to load Xalan 2.7.0
> - an input file of size 180 MB
> - and a simple XSL that does an identity transformation (see below)
>
> the required memory footprint for heap size is approximately 950 MB.
> My questions are:
>
> 1. Is there a way to approximate the required memory footprint?
> 2. With SAX-based processing, why does the 180 MB input file require
>    such a high overhead of heap memory?
>
> _____ XSL ____
>
> <?xml version='1.0' encoding='UTF-8'?>
> <xsl:transform xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
>
> <xsl:template match="/">
>   <xsl:apply-templates select="*"/>
> </xsl:template>
>
> <xsl:template match="*">
>   <xsl:copy>
>     <xsl:apply-templates select="@* | node()"/>
>   </xsl:copy>
> </xsl:template>
>
> <xsl:template match="@*">
>   <xsl:copy>
>     <xsl:apply-templates/>
>   </xsl:copy>
> </xsl:template>
>
> </xsl:transform>
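As a rough answer to question 1, the thread's own numbers support a back-of-envelope estimate. This sketch only reproduces that arithmetic (eight int vectors per DTM, four bytes per entry, one entry per node); it is an approximation of the structural tables alone and ignores the char[] text buffers and other overhead, so the real heap requirement is higher.

```java
// Back-of-envelope estimate of the DTM's structural tables, using the
// figures from this thread. Not a precise footprint: text content,
// block-allocation slack, and JVM overhead are all excluded.
public class DtmEstimate {
    /** Bytes of int-table data for the given node count. */
    static long structuralBytes(long nodes) {
        int vectorsPerNode = 8;  // m_exptype, m_parent, m_firstch, m_nextsib, ...
        return nodes * vectorsPerNode * 4L;  // 4 bytes per int entry
    }

    public static void main(String[] args) {
        long bytes = structuralBytes(12_800_000L);
        System.out.println("approx MB of int data: " + bytes / (1024 * 1024));
    }
}
```

For 12.8 million elements this lands near the 390 MB figure computed above; adding the UTF-16 text buffers and per-block slack gets within shouting distance of the 950 MB heap the original poster observed.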