Hi,

This is a proposal to merge two major pieces of development work, XSLTC_DTM and the new serializer, into the head branch. Both have been under development for a few months. We are confident that the work is now ready, and we would like to merge it so that the majority of Xalan users can benefit from these changes.
Both the XSLTC_DTM and the new serializer work were driven by the desire for closer integration and better code reuse between Xalan-J Interpretive and XSLTC. As many of you know, Xalan-J Interpretive and XSLTC currently share little code. They have different underlying models for the input XML document: Xalan-J Interpretive uses the Document Table Model (DTM), while XSLTC uses its own DOMImpl implementation. The serializers used by Xalan-J Interpretive and XSLTC are also different. As the project matures and heads toward the coming XSLT 2.0 release, sharing more components between Xalan-J Interpretive and XSLTC becomes increasingly important. Sharing makes the code more maintainable and eliminates some subtle behavioral differences between the two processors.

The XSLTC_DTM development work aims at using the same Document Table Model (DTM) for both Xalan-J Interpretive and XSLTC. To that end, we replaced XSLTC's underlying document model (DOMImpl) with DTM and extended the original DTM so that it can be used more efficiently with XSLTC. Many of the DOM-related classes in XSLTC were adapted to work with the new DTM model. There are relatively few changes in most of the compiler-related classes. Because of the changes in the core component, the translets generated by the new driver are not binary compatible with the translets compiled by the current XSLTC. Users need to recompile their stylesheets when they switch to the new driver. The current development snapshot is maintained in the XSLTC_DTM branch.

The new serializer work aims at providing a common serialization mechanism for both Xalan-J Interpretive and XSLTC. The common serializer code resides in the org.apache.xml.serializer package. The new serializer is designed to combine the strengths of the old serialization mechanisms from both sides.
By using the common serializer, many of the output differences between Xalan-J Interpretive and XSLTC will go away, and maintenance effort will be reduced. During the merge of the Xalan-J Interpretive and XSLTC serializers, an effort was made to use the best performing features from each serializer. The class hierarchy is similar to the one used by XSLTC. At the highest level the serializer classes split based on whether the output goes to a "Stream" or a "SAX" handler. Below that, both major branches split based on the output type (XML, HTML or TEXT). This allows each flavor of serializer to optimize for its output type and for whether that output is going to a "Stream" or a "SAX" handler.

From a functional point of view, the new serializer's output matches the behavior of Xalan-J Interpretive's old serializer, for example in the choice of which HTML entities to write out and in how attribute values are escaped in documents written to a stream. The various configuration files used by the old Xalan serializer can be used unchanged in the new common serializer.

In addition to these design benefits, the integration of XSLTC_DTM and the new serializer also provides significant benefits in both conformance and performance. On the conformance side, the new driver has much better conformance results than the current XSLTC because of the following:

1. The id function now works correctly with DOM input. Many of the idkey testcases that failed with the current XSLTC now pass with the new driver.
2. Because of the new serialization mechanism, many testcases that failed in trax.sax and trax.dom because of output differences now pass with the new driver.
3. Numerous bug fixes were put in to fix conformance problems in other areas.
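For readers less familiar with the trax.sax and trax.dom flavors mentioned in point 2, here is a minimal sketch of what is assumed to distinguish them from the stream case: the kind of JAXP Source and Result handed to the transformer. This is not the conformance harness. The class name FlavorSketch, the stylesheet and the input document are made up for illustration, and the code runs against whichever TransformerFactory JAXP locates on your system, which may or may not be Xalan or XSLTC.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Xalan or its test suite.
class FlavorSketch {
    // Made-up stylesheet and input, kept tiny on purpose.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/doc'><out><xsl:value-of select='item'/></out></xsl:template>"
        + "</xsl:stylesheet>";
    static final String XML = "<doc><item>hello</item></doc>";

    // Compile the stylesheet and run it with the given input and output kinds.
    static void run(Source in, Result out) throws Exception {
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSL)));
        templates.newTransformer().transform(in, out);
    }

    // Stream in, stream out: the result is serialized text.
    static String streamFlavor() throws Exception {
        StringWriter out = new StringWriter();
        run(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    // SAX in, SAX out: the result arrives as events on a ContentHandler.
    static String saxFlavor() throws Exception {
        final StringBuilder seen = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.append(qName).append(':');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        run(new SAXSource(new InputSource(new StringReader(XML))), new SAXResult(handler));
        return seen.toString();
    }

    // DOM in, DOM out: the result is a DOM tree, so no text serialization happens here.
    static String domFlavor() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document in = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        DOMResult result = new DOMResult();
        run(new DOMSource(in), result);
        Document outDoc = (Document) result.getNode();
        return outDoc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(streamFlavor());
        System.out.println(saxFlavor());
        System.out.println(domFlavor());
    }
}
```

Because each path delivers the result through a different mechanism, differences in how output is produced can show up in one flavor and not in another.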
Here is a comparison of the conformance results between the new driver and XSLTC from the current head branch:

trax.stream:
  New driver:     Pass: 21/1650, Fail:  9/16, Errr: 5/12
  Current XSLTC:  Pass: 19/1648, Fail: 11/25, Errr: 5/9

trax.sax:
  New driver:     Pass: 21/1650, Fail:  9/16, Errr: 5/12
  Current XSLTC:  Pass: 18/1613, Fail: 12/60, Errr: 5/9

trax.dom:
  New driver:     Pass: 21/1648, Fail:  9/17, Errr: 5/12
  Current XSLTC:  Pass: 14/1589, Fail: 16/84, Errr: 5/9

As you can see, there are far fewer failures in the trax.sax and trax.dom flavors with the new driver.

Since these are relatively big changes, many of you will want to know their performance impact. Performance has been an important focus throughout development. We have been measuring the performance impact of all changes using both the Sun and IBM JDKs. We also do performance runs on dedicated machines with hundreds of testcases for better coverage, and we run performance comparisons under different setup conditions. Over the last couple of months, numerous performance improvements were put into the DTM and the serializer to make them work more efficiently.

The performance numbers vary depending on the JDK version, the warm-up time and the JVM heap size. Depending on the test configuration, the average results range from almost identical between the new and the old driver to nearly 40% faster with the new driver. In a practical environment, the performance improvement for most stylesheets in XSLTC is likely to be in the range of 10-20%. Stylesheets that use a lot of small result tree fragments can be a few times faster because of the new light-weight RTF model.

Although the performance work focused mostly on the XSLTC side, many of the improvements also benefit Xalan-J Interpretive. As a result, Xalan-J Interpretive from the new driver is roughly 5-10% faster than Xalan-J Interpretive from the current head branch.
Credits:

Henry Zongaro & Morris Kwan: Leading developers in DTM integration and performance improvement
Brian Minchau: Leading developer in the common serializer design and implementation
Igor Hersht: XSLTC conformance bug fixes and performance runs
Myriam Midy: Initial infrastructure work on XSLTC_DTM

Comments or concerns are welcome.

Regards,

Morris Kwan
XSLT Development
IBM Toronto Lab
Tel: (905)413-3729
Email: [EMAIL PROTECTED]
