Hi,

This is to propose merging two major pieces of development work
(XSLTC_DTM and the new serializer) into the head branch. Both have been
under development for a few months. We are confident that the work is now
ready, and merging it into the head branch will let the majority of Xalan
users benefit from these changes.

Both the XSLTC_DTM and the new serializer work were driven by the desire
for closer integration and better code reuse between Xalan-J Interpretive
and XSLTC. As many of you may know, Xalan-J Interpretive and XSLTC
currently share little code. They use different underlying models for the
input XML document: Xalan-J Interpretive uses the Document Table Model
(DTM), while XSLTC uses its own DOMImpl implementation. The serializers
used by Xalan-J Interpretive and XSLTC are also different. As the project
matures and heads toward the coming XSLT20 release, sharing more
components between Xalan-J Interpretive and XSLTC becomes increasingly
important. It will make the code more maintainable and eliminate some
subtle behavioral differences between Xalan-J Interpretive and XSLTC.

The XSLTC_DTM development work aims at using the same Document Table Model
(DTM) for both Xalan-J Interpretive and XSLTC. To that end, we replaced
XSLTC's underlying document model (DOMImpl) with DTM and extended the
original DTM so that it can be used more efficiently with XSLTC. Many of
the DOM-related classes in XSLTC were adapted to work with the new DTM
model. Most of the compiler-related classes needed relatively few changes.
Because of the changes in the core component, translets generated by the
new driver are not binary compatible with translets compiled by the
current XSLTC, so users need to recompile their stylesheets when they
switch to the new driver. The current development snapshot is maintained
in the XSLTC_DTM branch.
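
For users who compile stylesheets through the standard JAXP API, the
recompile step might look like the sketch below. It is only an
illustration: the class name RecompileSketch and the inline stylesheet are
hypothetical, and it assumes stylesheets are compiled from source via
TransformerFactory.newTemplates() rather than loaded from previously
generated translet classes. Which processor actually does the compiling
depends on how JAXP resolves the factory in your environment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RecompileSketch {
    // Hypothetical stylesheet, used only for illustration.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='/greeting'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet from source with whatever processor the
    // factory represents, instead of reusing old compiled classes.
    static Templates compile(TransformerFactory factory) {
        try {
            return factory.newTemplates(
                new StreamSource(new StringReader(XSL)));
        } catch (TransformerException e) {
            throw new IllegalStateException("compile failed", e);
        }
    }

    // Apply the compiled stylesheet to an XML string.
    static String run(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile(TransformerFactory.newInstance());
        System.out.println(run(templates, "<greeting>hello</greeting>"));
    }
}
```

Translets that were compiled ahead of time and stored as class files would
need the equivalent step with whatever compile procedure produced them.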

The new serializer work aims at providing a common serialization mechanism
for both Xalan-J Interpretive and XSLTC. The common serializer code resides
in the org.apache.xml.serializer package. The new serializer is designed to
combine the strengths of the old serialization mechanisms on both sides. By
using the common serializer, many of the output differences between Xalan-J
Interpretive and XSLTC will go away, and maintenance effort will be
reduced.

During the merge of the Xalan-J Interpretive and XSLTC serializers, an
effort was made to use the best performing features from each serializer.
The class hierarchy is similar to the one used by XSLTC. At the highest
level, the serializer classes split on whether the output goes to a
"Stream" or a "SAX" handler. Below that, both major branches split on the
output type (XML, HTML or TEXT). This allows each flavor of serializer to
apply optimizations specific to its output type and to its destination.
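
The two-level split described above can be pictured with the following
sketch. All class names here are hypothetical and are not the classes in
org.apache.xml.serializer. The HTML flavor is left out for brevity, and the
escaping shown is a deliberately simplified stand-in for the per-flavor
behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common base: every flavor accepts character data.
abstract class SerializerSketch {
    abstract void characters(String chars);
}

// First split: output goes to a character stream...
abstract class StreamSerializerSketch extends SerializerSketch {
    protected final StringBuilder out = new StringBuilder();
    String output() { return out.toString(); }
}

// ...or is forwarded as events to a SAX-style handler.
abstract class SaxSerializerSketch extends SerializerSketch {
    protected final List<String> events = new ArrayList<>();
    List<String> events() { return events; }
}

// Second split: output type. The XML stream flavor must escape markup.
class XmlStreamSketch extends StreamSerializerSketch {
    @Override void characters(String chars) {
        out.append(chars.replace("&", "&amp;").replace("<", "&lt;"));
    }
}

// The TEXT stream flavor can skip escaping entirely.
class TextStreamSketch extends StreamSerializerSketch {
    @Override void characters(String chars) { out.append(chars); }
}

// A SAX flavor passes raw characters on; no escaping is needed here.
class XmlSaxSketch extends SaxSerializerSketch {
    @Override void characters(String chars) {
        events.add("characters:" + chars);
    }
}

class SerializerHierarchyDemo {
    public static void main(String[] args) {
        XmlStreamSketch xml = new XmlStreamSketch();
        TextStreamSketch text = new TextStreamSketch();
        xml.characters("a<b & c");
        text.characters("a<b & c");
        System.out.println(xml.output());
        System.out.println(text.output());
    }
}
```

Because each leaf class knows both its destination and its output type, it
can hard-code the cheapest path for that combination instead of checking
flags on every call.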

From a functional point of view, the new serializer's output matches the
behavior of Xalan-J Interpretive's old serializer, such as the choice of
which HTML entities to write out, and how to escape attribute values in
documents written to a stream. The various configuration files used by the
old Xalan serializer can be used unchanged in the new common serializer.

In addition to these design benefits, the integration of XSLTC_DTM and the
new serializer also brings significant benefits in both conformance and
performance.

On the conformance side, the new driver has much better conformance results
than the current XSLTC because of the following:

1. The id function now works correctly with DOM input. Many of the idkey
testcases that failed with the current XSLTC now pass with the new driver.
2. Because of the new serialization mechanism, many testcases that failed
in trax.sax and trax.dom because of output differences now pass with the
new driver.
3. Numerous bug fixes were put in to fix conformance problems in other
areas.

Here is a comparison of the conformance results between the new driver and
XSLTC from the current head branch:

trax.stream:
New driver:     Pass: 21/1650, Fail:  9/16, Errr: 5/12
Current XSLTC:  Pass: 19/1648, Fail: 11/25, Errr: 5/9

trax.sax:
New driver:     Pass: 21/1650, Fail:  9/16, Errr: 5/12
Current XSLTC:  Pass: 18/1613, Fail: 12/60, Errr: 5/9

trax.dom:
New driver:     Pass: 21/1648, Fail:  9/17, Errr: 5/12
Current XSLTC:  Pass: 14/1589, Fail: 16/84, Errr: 5/9

As you can see, there are far fewer failures in the trax.sax and trax.dom
flavors with the new driver.

Since these are relatively big changes, many of you will want to know their
performance impact. Performance has been an important focal point
throughout development. We have been measuring the performance impact of
all changes using both the Sun and IBM JDKs. We also do performance runs
on dedicated machines with hundreds of testcases for better coverage, and
we run performance comparisons under different setup conditions. Over the
last couple of months, numerous performance improvements were put into the
DTM and serializer to make them work more efficiently. The performance
numbers vary depending on the JDK version, the warm-up time and the JVM
heap size. Depending on the test configuration, the average performance
numbers range from almost identical for the new and the old driver to
nearly 40% faster with the new driver. In a practical environment, the
performance improvement for most stylesheets in XSLTC is likely to be in
the range of 10-20%. Stylesheets that use a lot of small result tree
fragments can be a few times faster because of the new light-weight RTF
model.
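
For anyone who wants to reproduce this kind of comparison, the sketch below
shows one simple way to account for warm-up time. It is not the harness
used for the numbers above; the class and method names are hypothetical,
and "percent faster" is assumed here to mean the reduction in elapsed time
relative to the old driver.

```java
class WarmupTimingSketch {
    // Run the task untimed for warmupRuns iterations so the JIT can
    // settle, then return the mean elapsed nanoseconds over timedRuns.
    static long averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }

    // Assumed definition: reduction in elapsed time, as a percentage
    // of the old time.
    static double percentFaster(long oldNanos, long newNanos) {
        return 100.0 * (oldNanos - newNanos) / oldNanos;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would invoke a transformation.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        long avg = averageNanos(task, 1_000, 1_000);
        System.out.println("average ns per run: " + avg);
    }
}
```

Varying warmupRuns and the JVM heap size (for example with -Xmx) between
runs gives a feel for how much the reported averages can move.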

Although the performance improvement work focused mostly on the XSLTC side,
many of the improvements also benefit Xalan-J Interpretive. As a result,
Xalan-J Interpretive from the new driver is roughly 5-10% faster than
Xalan-J Interpretive from the current head branch.

Credits:

Henry Zongaro & Morris Kwan:  Lead developers in DTM integration and
performance improvement
Brian Minchau:  Lead developer in the common serializer design and
implementation
Igor Hersht:  XSLTC conformance bug fixes and performance runs
Myriam Midy:  Initial infrastructure work on XSLTC_DTM

Comments or concerns are welcome.

Regards,

Morris Kwan
XSLT Development
IBM Toronto Lab
Tel: (905)413-3729
Email: [EMAIL PROTECTED]
