This is the DTM (Document Table Model) representation of the input document tree. It's a huge improvement over the DOM-style one-object-per-node approach, since a Java object costs something on the order of 32 bytes of bookkeeping overhead before you add any data fields. But, yes, maintaining that tree (plus the fact that Java uses UTF-16 internally, so each character of text content takes two bytes) does add up.
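To illustrate the table-based idea (this is a hypothetical sketch, not Xalan's actual DTM code): instead of one object per node, the tree is stored in parallel int arrays, so a node is just an index and each structural link costs four bytes rather than an object reference plus header.

```java
import java.util.Arrays;

// Sketch of table-based tree storage: parent/first-child/next-sibling
// links live in parallel int arrays, indexed by node id. -1 means "none".
final class IntTree {
    int size = 0;
    int[] parent = new int[16];
    int[] firstChild = new int[16];
    int[] nextSibling = new int[16];

    /** Appends a node under parentId (-1 for the root); returns its id. */
    int addNode(int parentId) {
        if (size == parent.length) grow();
        int id = size++;
        parent[id] = parentId;
        firstChild[id] = -1;
        nextSibling[id] = -1;
        if (parentId >= 0) {
            // Link as the last child of the parent.
            int c = firstChild[parentId];
            if (c == -1) {
                firstChild[parentId] = id;
            } else {
                while (nextSibling[c] != -1) c = nextSibling[c];
                nextSibling[c] = id;
            }
        }
        return id;
    }

    private void grow() {
        parent = Arrays.copyOf(parent, size * 2);
        firstChild = Arrays.copyOf(firstChild, size * 2);
        nextSibling = Arrays.copyOf(nextSibling, size * 2);
    }
}
```

With three int arrays this is 12 bytes of structure per node; the real DTM keeps a few more vectors (type, data, siblings in both directions), which is where the roughly 32 bytes per node comes from.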
Before Xalan -- back when it was IBM's LotusXSL processor -- we had an ultra-compact variant of DTM which reduced node size to just 16 bytes. However, that version imposed some serious performance penalties. The current DTM is a compromise between memory usage and access speed. In general, XSLT can access any portion of the input document at any time, and it has a concept of "node identity" which must be maintained across those accesses, so a full in-memory model of the document is required. There are some potential opportunities for reducing that, at least for a subset of XSLT; a few possible approaches were discussed in the archives of this mailing list.

______________________________________
"... Three things see no end: A loop with exit code done wrong, A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (http://www.ovff.org/pegasus/songs/threes-rev-11.html)

From: Toadie <toadie...@gmail.com>
To: xalan-j-users@xml.apache.org
Date: 04/03/2010 10:44 PM
Subject: Re: approximation of memory footprint used by Xalan

I did a bit more profiling and found that the majority of the memory allocation is in org.apache.xml.utils.SuballocatedIntVector, called by SAX2DTM in the startElement method. The majority of the memory allocation inside SuballocatedIntVector is in a pair of int[][] arrays, m_map and m_map0.

The profiler showed that:
- 178,135 instances of int[] were allocated and used 458 MB
- 56,009 instances of char[] were allocated and used 106 MB

It seems that for each element/node that is read and output by the SAX2DTM class, it adds one integer into at least 6-8 instances of SuballocatedIntVector: m_firstch, m_nextsib, m_parent, m_exptype, m_dataOrQName, m_prevsib, m_data, m_value.

Wow -- that was a surprise. My XML input file has a little over 12.8 million XML elements. A quick (rough) calculation shows 12,800,000 * 32 bytes / 1024 / 1024 ~= 390 MB.
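The pattern behind those int[] allocations can be sketched as follows. This is a simplified stand-in for org.apache.xml.utils.SuballocatedIntVector, not its real source: a vector of ints stored as an int[][] of fixed-size blocks, so appending never copies existing data, it just allocates another block. With six to eight such vectors per DTM, every element read from SAX appends one int to each of them.

```java
import java.util.Arrays;

// Simplified chunked int vector: data lives in fixed-size int[] blocks
// referenced from an int[][] map, so growth allocates a new block
// instead of reallocating and copying one huge array.
final class ChunkedIntVector {
    private static final int BLOCK = 2048;  // ints per block (assumed size)
    private int[][] map = new int[16][];
    private int size = 0;

    void add(int value) {
        int block = size / BLOCK, offset = size % BLOCK;
        if (block == map.length)
            map = Arrays.copyOf(map, map.length * 2);  // only the directory is copied
        if (map[block] == null)
            map[block] = new int[BLOCK];  // one new int[] per BLOCK appends
        map[block][offset] = value;
        size++;
    }

    int get(int index) {
        return map[index / BLOCK][index % BLOCK];
    }

    int size() {
        return size;
    }
}
```

This explains the profile: the int[] count grows with document size because each vector keeps allocating blocks, and none of them can be released while the DTM is alive.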
I wonder whether there is an opportunity to tune/tweak the memory management in that class, or whether the arrays have to be kept from start to end of the input file for traversal purposes.

Thanks in advance

On Sat, Apr 3, 2010 at 11:25 AM, Toadie <toadie...@gmail.com> wrote:
> Is there a way to approximate the memory footprint needed by Xalan to
> run an XSL?
>
> For example, I am seeing that with SAX-based transformation
> - using Xalan 2.7.0 and Java 1.6_u13 with a bootclasspath option to
>   force the JDK to load Xalan 2.7.0
> - an input file of size 180 MB
> - and a simple XSL that does an identity transformation (see below)
>
> the required memory footprint for heap size is approximately 950 MB.
> My questions are:
>
> 1. Is there a way to approximate the required memory footprint?
> 2. With SAX-based processing, why does the 180 MB input file require
>    such a high overhead of heap memory?
>
> _____ XSL ____
>
> <?xml version='1.0' encoding='UTF-8'?>
> <xsl:transform xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
>
> <xsl:template match="/">
>   <xsl:apply-templates select="*"/>
> </xsl:template>
>
> <xsl:template match="*">
>   <xsl:copy>
>     <xsl:apply-templates select="@* | node()"/>
>   </xsl:copy>
> </xsl:template>
>
> <xsl:template match="@*">
>   <xsl:copy>
>     <xsl:apply-templates/>
>   </xsl:copy>
> </xsl:template>
>
> </xsl:transform>
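As a rough answer to question 1, the thread's own numbers support a back-of-envelope estimate. This sketch only reproduces that arithmetic (eight int vectors per DTM, four bytes per entry, one entry per node); it is an approximation of the structural tables alone and ignores the char[] text buffers and other overhead, so the real heap requirement is higher.

```java
// Back-of-envelope estimate of the DTM's structural tables, using the
// figures from this thread. Not a precise footprint: text content,
// block-allocation slack, and JVM overhead are all excluded.
public class DtmEstimate {
    /** Bytes of int-table data for the given node count. */
    static long structuralBytes(long nodes) {
        int vectorsPerNode = 8;  // m_exptype, m_parent, m_firstch, m_nextsib, ...
        return nodes * vectorsPerNode * 4L;  // 4 bytes per int entry
    }

    public static void main(String[] args) {
        long bytes = structuralBytes(12_800_000L);
        System.out.println("approx MB of int data: " + bytes / (1024 * 1024));
    }
}
```

For 12.8 million elements this lands near the 390 MB figure computed above; adding the UTF-16 text buffers and per-block slack gets within shouting distance of the 950 MB heap the original poster observed.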