I've been doing some performance testing of Xalan-Java and Xalan-C++ on files that range from a few hundred Kbytes to a few hundred Mbytes. For the tests, I used Xalan-J 2.5.1 with JDK 1.4.2_01 and Xalan-C++ 1.5 on a dual Pentium III PC with 1 GByte of memory running Windows 2000 Professional.
I'm a bit surprised by the results: Xalan-C++ processing time grows linearly with the XML input size, while Xalan-J processing time appears to grow exponentially.
To give a bit more context, the kind of transformations we're mostly interested in are those that flatten XML into relational structures. The attached ZIP contains three stylesheets that extract data out of the input XML document at different nesting levels, a few sample documents, and an Excel spreadsheet that details the test results.
The structure of the input documents looks like:
<?xml version="1.0"?>
<customers>
  <customer id="0" name="Acme, Inc.">
    <orders>
      <order order_no="0">
        <items>
          <item item_no="12" quantity="260" />
          ...
        </items>
      </order>
      ...
    </orders>
    <addresses>
      <address street="645 Lake Blvd." city="Boston" state="MA" zip="01011" />
      ...
    </addresses>
  </customer>
</customers>
Some statistics:
- All documents contain 50 customer elements
- The count of order elements ranges from 1000 to 441439
- The count of item elements ranges from 2960 to 1323687
- The number of address elements is almost constant around 100 instances
and the three transformations extract:
- The addresses of a customer
- The orders of a customer
- The items of an order
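To make the "flattening" concrete, here is a minimal sketch of what the address extraction could look like when driven through the standard JAXP API. The stylesheet text, the pipe-delimited row layout, and the class and method names (FlattenAddresses, flatten) are assumptions made for illustration only; they are not the stylesheets in the attached ZIP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FlattenAddresses {
    // Hypothetical stylesheet: emits one pipe-delimited row per address,
    // prefixed with the id of the enclosing customer.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='customers/customer/addresses/address'>"
      + "<xsl:value-of select='../../@id'/><xsl:text>|</xsl:text>"
      + "<xsl:value-of select='@street'/><xsl:text>|</xsl:text>"
      + "<xsl:value-of select='@city'/><xsl:text>|</xsl:text>"
      + "<xsl:value-of select='@state'/><xsl:text>|</xsl:text>"
      + "<xsl:value-of select='@zip'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an in-memory document and returns the flat rows.
    static String flatten(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml =
            "<customers><customer id='0' name='Acme, Inc.'><addresses>"
          + "<address street='645 Lake Blvd.' city='Boston' state='MA' zip='01011'/>"
          + "</addresses></customer></customers>";
        System.out.print(flatten(xml)); // prints: 0|645 Lake Blvd.|Boston|MA|01011
    }
}
```

The order and item extractions would follow the same pattern with a deeper select path, again assuming a row-per-element layout.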
In all three tests (Xalan-Java, XSLTC and Xalan-C++) I'm sending the output to stdout and redirecting the results to a file.
I tested both the interpreted version of the XSLT processor and XSLTC, and the results are very similar, although XSLTC performs a little better as the size of the input increases. On the Java side, I had to increase the maximum Java heap size to 1 GByte (-Xmx option). I also experimented with the initial heap size (-Xms option) and got some improvement, but as the size of the input file approached the upper end of the tests, performance degraded dramatically (the results are included in the attached spreadsheet). One interesting detail from Java's -Xprof profiling option is that the java.util.Vector.ensureCapacityHelper method seems to take most of the execution time (anywhere from 40% to 87%, increasing with the size of the file).
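As background on why time in a capacity-growing method can rise faster than the input size: java.util.Vector grows by a fixed capacityIncrement when one is set, and doubles its capacity otherwise. The sketch below only counts the element copies each policy would cause. It does not measure Xalan-J, and which policy (if either) applies to the vectors behind the profile above is not established here. The class name GrowthCost and the increment of 1000 are assumptions made for illustration.

```java
public class GrowthCost {
    // Element copies made while appending n items to an array-backed list
    // that grows by a fixed increment each time it fills up.
    static long copiesFixedIncrement(long n, long increment) {
        long capacity = increment;
        long copies = 0;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {   // full: reallocate and copy every element
                copies += size;
                capacity += increment;
            }
        }
        return copies;
    }

    // Same count, but the capacity doubles each time the list fills up.
    static long copiesDoubling(long n, long initialCapacity) {
        long capacity = initialCapacity;
        long copies = 0;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity *= 2;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        // Smallest and largest item counts quoted in the statistics above.
        long[] itemCounts = {2960L, 1323687L};
        for (long n : itemCounts) {
            System.out.println(n + " items: fixed(+1000) copies="
                + copiesFixedIncrement(n, 1000)
                + ", doubling copies=" + copiesDoubling(n, 1000));
        }
    }
}
```

With a fixed increment the copy count grows roughly with the square of the element count, while doubling keeps it proportional to the element count. That is quadratic rather than exponential growth, but over a limited range of input sizes the two can be hard to tell apart.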
I'm interested in getting comments from other people about their experience with performance. Is this behavior typical of the kind of transformation I'm doing?
Additionally, I had a problem using the translet that extracts all item elements. Starting with a document that contains 296380 item elements, the transformation aborted with a "Translet errors:No more DTM IDs are available" error. I looked through the FAQ and the mailing lists and found nothing about this apart from an issue that existed in previous versions of Xalan-J and is no longer present in version 2.5.1.
My environment is as follows:
#---- BEGIN writeEnvironmentReport($Revision: 1.20 $): Useful stuff found: ----
version.DOM.draftlevel=2.0fd
java.class.path=d:/xalan-j_2_5_1/bin/xalan.jar;d:/xalan-j_2_5_1/bin/xml-apis.jar;d:/xalan-j_2_5_1/bin/xercesImpl.jar;.;d:/j2sdk1.4.2_01/lib;d:/j2sdk1.4.2_01/jre/lib
version.JAXP=1.1 or higher
java.ext.dirs=d:\j2sdk1.4.2_01\jre\lib\ext
#---- BEGIN Listing XML-related jars in: foundclasses.sun.boot.class.path ----
xalan.jar-path=d:\j2sdk1.4.2_01\jre\lib\endorsed\xalan.jar
xercesImpl.jar-apparent.version=xercesImpl.jar from xalan-j_2_5_0 from xerces-2_4
xercesImpl.jar-path=d:\j2sdk1.4.2_01\jre\lib\endorsed\xercesImpl.jar
xml-apis.jar-apparent.version=xml-apis.jar present-unknown-version
xml-apis.jar-path=d:\j2sdk1.4.2_01\jre\lib\endorsed\xml-apis.jar
#----- END Listing XML-related jars in: foundclasses.sun.boot.class.path -----
version.xerces2=Xerces-J 2.4.0
version.xerces1=not-present
version.xalan2_2=Xalan Java 2.5.1
version.xalan1=not-present
version.ant=not-present
java.version=1.4.2_01
version.DOM=2.0
version.crimson=present-unknown-version
sun.boot.class.path=d:\j2sdk1.4.2_01\jre\lib\endorsed\xalan.jar;d:\j2sdk1.4.2_01\jre\lib\endorsed\xercesImpl.jar;d:\j2sdk1.4.2_01\jre\lib\endorsed\xml-apis.jar;d:\j2sdk1.4.2_01\jre\lib\rt.jar;d:\j2sdk1.4.2_01\jre\lib\i18n.jar;d:\j2sdk1.4.2_01\jre\lib\sunrsasign.jar;d:\j2sdk1.4.2_01\jre\lib\jsse.jar;d:\j2sdk1.4.2_01\jre\lib\jce.jar;d:\j2sdk1.4.2_01\jre\lib\charsets.jar;d:\j2sdk1.4.2_01\jre\classes
#---- BEGIN Listing XML-related jars in: foundclasses.java.class.path ----
xalan.jar-path=d:\xalan-j_2_5_1\bin\xalan.jar
xml-apis.jar-apparent.version=xml-apis.jar present-unknown-version
xml-apis.jar-path=d:\xalan-j_2_5_1\bin\xml-apis.jar
xercesImpl.jar-apparent.version=xercesImpl.jar from xalan-j_2_5_0 from xerces-2_4
xercesImpl.jar-path=d:\xalan-j_2_5_1\bin\xercesImpl.jar
#----- END Listing XML-related jars in: foundclasses.java.class.path -----
version.SAX=2.0
version.xalan2x=Xalan Java 2.5.1
#----- END writeEnvironmentReport: Useful properties found: -----
# YAHOO! Your environment seems to be OK.
Thanks,
Hernando Borda
Software Developer
Ascential Software Corp.
<<perf.ZIP>>
