Hi Hernando,
Two things before we start here:
1. Cross-posting is generally a bad idea, although in this case it may
be justified.
2. Never, never, ever include a 3MB attachment to a posting, much less a
cross-posting. That is incredibly abusive of the list, since you've now
replicated a 3MB attachment to who knows how many people. Not everyone
has your bandwidth, not to mention your storage capacity. This is basic
netiquette. I'm lucky enough to be subscribed to both lists, so I had
to download 6MB of data. That did not make me happy.
Now, to the problem. You are probably seeing exponential increases in
transformation times with the Java processors because of memory usage
issues. Xalan-C is much more efficient at managing memory than most Java
processors: it uses less memory for the in-memory representation of the
document, so it scales much better with the large files you are
processing. Once your documents get large enough to exceed physical
memory, you will likely see the same sort of behavior with Xalan-C.
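One rough way to see how much heap an in-memory tree for documents shaped like yours takes is to build one and compare used heap before and after. This is only a sketch under stated assumptions: the class and helper names (HeapProbe, syntheticDocument, usedHeapBytes, parse) are hypothetical, the document is generated rather than taken from the attachment, and it builds a W3C DOM with the JDK's parser rather than Xalan-J's internal DTM, so the absolute numbers will differ from what the processor itself uses. System.gc() is only a hint to the JVM, so treat the figures as order-of-magnitude.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical probe: builds a synthetic customers/orders/items document,
// parses it into a DOM, and reports the approximate heap the tree occupies.
public class HeapProbe {

    // Generates a document with the same nesting as the one in the post.
    static String syntheticDocument(int customers, int ordersPerCustomer, int itemsPerOrder) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<customers>\n");
        int orderNo = 0;
        for (int c = 0; c < customers; c++) {
            sb.append("<customer id=\"").append(c).append("\"><orders>");
            for (int o = 0; o < ordersPerCustomer; o++) {
                sb.append("<order order_no=\"").append(orderNo++).append("\"><items>");
                for (int i = 0; i < itemsPerOrder; i++) {
                    sb.append("<item item_no=\"").append(i).append("\" quantity=\"1\"/>");
                }
                sb.append("</items></order>");
            }
            sb.append("</orders></customer>\n");
        }
        return sb.append("</customers>\n").toString();
    }

    // Used heap after a GC hint; the JVM is free to ignore System.gc().
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Parses with the JDK's default DOM parser (not Xalan's DTM).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse synthetic document", e);
        }
    }

    public static void main(String[] args) {
        String xml = syntheticDocument(50, 200, 3);
        long before = usedHeapBytes();
        Document doc = parse(xml);
        long after = usedHeapBytes();
        System.out.println("max heap (-Xmx): " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        System.out.println("input length:    " + xml.length() / 1024 + " K chars");
        System.out.println("approx. tree:    " + (after - before) / 1024 + " KB");
        // Using doc here keeps the tree reachable during the second measurement.
        System.out.println("item elements:   " + doc.getElementsByTagName("item").getLength());
    }
}
```

Running it with different -Xmx values and larger arguments to syntheticDocument should show roughly where the tree starts to crowd the heap limit.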
Dave
From: [EMAIL PROTECTED]oftware.com
Date: 09/08/2003 06:55 AM
To: [EMAIL PROTECTED], [email protected]
cc: (bcc: David N Bertoni/Cambridge/IBM)
Subject: Performance questions and possible bug
I've been doing some performance testing of Xalan-Java and Xalan-C++ for
processing files that range from a few hundred Kbytes to a few hundred
Mbytes. For the tests, I used Xalan-J 2.5.1 with JDK 1.4.2_01 and Xalan-C++
1.5 on a Dual Pentium III PC with 1 GByte of memory running Windows 2K
Professional.
I'm a bit surprised by the results: Xalan-C++ processing time grows
linearly with the size of the XML input, while Xalan-J processing time
grows exponentially.
To give a bit more context, the transformations we're mostly interested
in flatten XML into relational structures. The attached ZIP contains three
stylesheets that extract data from the input XML document at different
nesting levels, a few sample documents, and an Excel spreadsheet that
details the test results.
The structure of the input documents looks like:
<?xml version="1.0"?>
<customers>
  <customer id="0" name="Acme, Inc.">
    <orders>
      <order order_no="0">
        <items>
          <item item_no="12" quantity="260" />
          ...
        </items>
      </order>
      ...
    </orders>
    <addresses>
      <address street="645 Lake Blvd." city="Boston" state="MA" zip="01011" />
      ...
    </addresses>
  </customer>
</customers>
Some statistics:
- All documents contain 50 customer elements
- The count of order elements ranges from 1000 to 441439
- The count of item elements ranges from 2960 to 1323687
- The number of address elements is roughly constant, at about 100
and the three transformations extract:
- The addresses of a customer
- The orders of a customer
- The items of an order
In all three tests (Xalan-Java, XSLTC, and Xalan-C++), I send the output
to standard output and redirect it to a file.
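For readers unfamiliar with this kind of flattening, here is a minimal sketch of an "orders of a customer" extraction driven through the standard JAXP API (javax.xml.transform). The stylesheet is a hypothetical stand-in, not one of the stylesheets in the attachment: it emits one comma-separated customer_id,order_no row per order element. The class name FlattenOrders is also an assumption. It writes to a StringWriter so the result can be checked; passing new StreamResult(System.out) instead matches the stdout-and-redirect setup described above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: flattens customer/order nesting into CSV-like rows.
public class FlattenOrders {

    // Text-output stylesheet: one "customer_id,order_no" line per order.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"customers/customer/orders/order\">"
      + "<xsl:value-of select=\"../../@id\"/>"   // order -> orders -> customer
      + "<xsl:text>,</xsl:text>"
      + "<xsl:value-of select=\"@order_no\"/>"
      + "<xsl:text>&#10;</xsl:text>"            // newline after each row
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String flatten(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customers>"
            + "<customer id=\"0\" name=\"Acme, Inc.\"><orders>"
            + "<order order_no=\"0\"/><order order_no=\"1\"/>"
            + "</orders></customer>"
            + "<customer id=\"1\" name=\"Other\"><orders>"
            + "<order order_no=\"2\"/>"
            + "</orders></customer>"
            + "</customers>";
        System.out.print(flatten(xml));
    }
}
```

For the sample input in main, this should print the rows 0,0 then 0,1 then 1,2, one per line. The same stylesheet pattern extends to the address and item extractions by changing the select paths.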
I tested both the interpreted XSLT processor and XSLTC; the results are
very similar, although XSLTC performs slightly better as the input size
increases. On the Java side, I had to increase the maximum heap size to
1 GByte (-Xmx option). I also experimented with the initial heap size
(-Xms option) and saw some improvement, but as the input size approached
the upper end of the tests, performance degraded dramatically (the results
are included in the attached spreadsheet). One interesting detail from the
-Xprof profiling option is that the java.util.Vector.ensureCapacityHelper
method appears to account for most of the execution time (from 40% to 87%
as the file size increases).
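A possible explanation for the -Xprof result, offered as a hypothesis rather than a diagnosis: java.util.Vector grows its backing array in ensureCapacityHelper, either doubling the capacity (the documented default when capacityIncrement is 0 or less) or adding a fixed capacityIncrement. Each growth step copies every existing element, so doubling costs O(n) copies in total while a fixed increment costs O(n^2), which would look like runaway growth on large inputs. Whether the vectors inside Xalan-J actually use a fixed increment is not established here. The sketch below (the class name GrowthCost and the increment of 10 are illustrative assumptions) simply counts element copies under both policies for the item counts from the post.

```java
// Hypothetical cost model: counts element copies a Vector-style array makes
// while growing to n elements. Mirrors Vector's documented policy: double the
// capacity when capacityIncrement <= 0, otherwise add capacityIncrement.
public class GrowthCost {

    static long copiesFor(long n, long initialCapacity, long capacityIncrement) {
        long capacity = initialCapacity;
        long copies = 0;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {            // array is full: grow it
                copies += size;                // all existing elements are copied
                long grown = capacityIncrement > 0
                    ? capacity + capacityIncrement
                    : capacity * 2;
                capacity = Math.max(grown, size + 1); // guard for capacity 0
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        long[] itemCounts = {2960, 296380, 1323687}; // item counts from the post
        for (long n : itemCounts) {
            System.out.printf("%9d elements: doubling %,15d copies, +10 increment %,18d copies%n",
                n, copiesFor(n, 10, 0), copiesFor(n, 10, 10));
        }
    }
}
```

Under doubling the copy count stays below 2n, while the fixed-increment column grows roughly with n^2/20, so a tenfold larger input costs about a hundredfold more copying.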
I'm interested in getting comments from other people about their experience
with performance. Is this behavior typical of the kind of transformation
I'm doing?
Additionally, I had a problem using the translet that extracts all item
elements. Starting with the document that contains 296380 item elements,
the transformation aborted with a "Translet errors:No more DTM IDs are
available" error. I searched the FAQ and the mailing lists and found
nothing about this apart from an issue in previous versions of Xalan-J
that is no longer present in version 2.5.1.
My environment is as follows:
#---- BEGIN writeEnvironmentReport($Revision: 1.20 $): Useful stuff found: ----
version.DOM.draftlevel=2.0fd
java.class.path=d:/xalan-j_2_5_1/bin/xalan.jar;d:/xalan-j_2_5_1/bin/xml-apis.jar;d:/xalan-j_2_5_1/bin/xercesImpl.jar;.;d:/j2sdk1.4.2_01/lib;d:/j2sdk1.4.2_01/jre/lib
version.JAXP=1.1 or higher
java.ext.dirs=d:\j2sdk1.4.2_01\jre\lib\ext
#---- BEGIN Listing XML-related jars in: foundclasses.sun.boot.class.path ----
xalan.jar-path=d:\j2sdk1.4.2_01\jre\lib\endorsed\xalan.jar
xercesImpl.jar-apparent.version=xercesImpl.jar from xalan-j_2_5_0 from xerces-2_4
xercesImpl.jar-path=d:\j2sdk1.4.2_01\jre\lib\endorsed\xercesImpl.jar
xml-apis.jar-apparent.version=xml-apis.jar present-unknown-version
xml-apis.jar-path=d:\j2sdk1.4.2_01\jre\lib\endorsed\xml-apis.jar
#----- END Listing XML-related jars in: foundclasses.sun.boot.class.path -----
version.xerces2=Xerces-J 2.4.0
version.xerces1=not-present
version.xalan2_2=Xalan Java 2.5.1
version.xalan1=not-present
version.ant=not-present
java.version=1.4.2_01
version.DOM=2.0
version.crimson=present-unknown-version
sun.boot.class.path=d:\j2sdk1.4.2_01\jre\lib\endorsed\xalan.jar;d:\j2sdk1.4.2_01\jre\lib\endorsed\xercesImpl.jar;d:\j2sdk1.4.2_01\jre\lib\endorsed\xml-apis.jar;d:\j2sdk1.4.2_01\jre\lib\rt.jar;d:\j2sdk1.4.2_01\jre\lib\i18n.jar;d:\j2sdk1.4.2_01\jre\lib\sunrsasign.jar;d:\j2sdk1.4.2_01\jre\lib\jsse.jar;d:\j2sdk1.4.2_01\jre\lib\jce.jar;d:\j2sdk1.4.2_01\jre\lib\charsets.jar;d:\j2sdk1.4.2_01\jre\classes
#---- BEGIN Listing XML-related jars in: foundclasses.java.class.path ----
xalan.jar-path=d:\xalan-j_2_5_1\bin\xalan.jar
xml-apis.jar-apparent.version=xml-apis.jar present-unknown-version
xml-apis.jar-path=d:\xalan-j_2_5_1\bin\xml-apis.jar
xercesImpl.jar-apparent.version=xercesImpl.jar from xalan-j_2_5_0 from xerces-2_4
xercesImpl.jar-path=d:\xalan-j_2_5_1\bin\xercesImpl.jar
#----- END Listing XML-related jars in: foundclasses.java.class.path -----
version.SAX=2.0
version.xalan2x=Xalan Java 2.5.1
#----- END writeEnvironmentReport: Useful properties found: -----
# YAHOO! Your environment seems to be OK.
Thanks,
Hernando Borda
Software Developer
Ascential Software Corp.