Hi all!

I had a look at the code in XSLTMediator and I think there are some areas where improvements can be made. I would like to have your opinion on that. The areas I have identified are the following:

1) The mediator has a mechanism to stream the result of the transformation to a temporary file when its size exceeds some threshold (defined by BYTE_ARRAY_SIZE, which is 8192). However, when this happens, the transformation is actually triggered twice. During the first run, the result is sent to a FixedByteArrayOutputStream which will raise a SynapseException after the first 8192 bytes have been written. The transformation is then restarted using a FileOutputStream for the result. This is very bad for two reasons:

* It introduces an overhead (the first, aborted attempt to execute the transformation) for every large XML document that is processed.
* It makes the choice of the threshold BYTE_ARRAY_SIZE difficult: a small value is bad for smaller XML documents (because temporary files are used where this is not necessary) but good for larger ones (because it reduces the overhead caused by the first transformation attempt); a large value is good for smaller XML documents (avoids temporary files) but bad for larger ones (increased overhead for the first transformation).

A better approach would be to have an OutputStream implementation that will first write to a byte array and once the threshold is exceeded transparently switches to a temporary file, so that the transformation is always run only once. I used this pattern in another project (where the problem was to package and explode large archive files on the fly) and it works quite well.
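
To make the idea concrete, here is a minimal sketch of such a stream. It is not the code from the other project and not existing Synapse code: the class name OverflowOutputStream, its accessors and the temporary file prefix are all hypothetical, and it uses only java.io.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical helper (not part of Synapse): buffers output in memory and
 * transparently switches to a temporary file once the threshold is exceeded.
 */
class OverflowOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private File tempFile;   // stays null while everything fits in memory
    private long count;      // total number of bytes written so far

    OverflowOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        switchIfNeeded(1);
        current.write(b);
        count++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        switchIfNeeded(len);
        current.write(buf, off, len);
        count += len;
    }

    // Moves the buffered bytes to a temp file the first time the threshold would be exceeded.
    private void switchIfNeeded(int len) throws IOException {
        if (tempFile == null && count + len > threshold) {
            tempFile = File.createTempFile("xslt_", ".xml"); // prefix/suffix are arbitrary
            FileOutputStream out = new FileOutputStream(tempFile);
            memory.writeTo(out);
            memory = null;
            current = out;
        }
    }

    @Override
    public void flush() throws IOException {
        current.flush();
    }

    @Override
    public void close() throws IOException {
        current.close();
    }

    boolean isInMemory() {
        return tempFile == null;
    }

    byte[] getBytes() {
        if (memory == null) {
            throw new IllegalStateException("Output was written to " + tempFile);
        }
        return memory.toByteArray();
    }

    File getFile() {
        return tempFile;
    }
}
```

The mediator would pass one instance to the StreamResult, run the transformation once, and afterwards check isInMemory() to decide whether to read the result from getBytes() or from getFile().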

2) I fear that we might have a problem on Windows platforms with temporary files not being deleted unless Synapse is shut down (I didn't test this yet, so correct me if my argument is incorrect). Indeed, after the transformation, the content of the temporary file is read back using the following instructions:

StAXOMBuilder builder = new StAXOMBuilder(new FileInputStream(tempTargetFile));
result = builder.getDocumentElement();

Since Axiom constructs the XML tree on demand, this will actually not read the entire file but only a small part of it (at least that's what I understood from how Axiom works). Immediately after this, XSLTMediator executes the following piece of code:

boolean deleted = tempTargetFile.delete();
if (!deleted) {
    tempTargetFile.deleteOnExit();
}

Since the file is still open at that moment, on Windows platforms, the delete operation will fail. Probably that's why the call to deleteOnExit has been added to the code. It follows that the file will not be deleted until Synapse is shut down or restarted.

Note that on Unix systems the situation is different: the delete operation will remove the directory entry, but not the inode (since the file is still open). The inode will then be deleted by the OS when the file is closed.

BTW: Who actually closes the FileInputStream?
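
A possible way out, sketched under the assumption that the code can hold on to the FileInputStream and knows when the Axiom tree is no longer needed (for example after the message has been sent): close the stream first and only then delete the file. The class and method names below are hypothetical, and the sketch uses plain java.io only.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of Synapse) for cleaning up a temporary file. */
class TempFileCleanup {
    /**
     * Closes the stream before deleting the file, so that the delete can also
     * succeed on platforms that refuse to delete open files.
     * Returns true if the file was deleted immediately.
     */
    static boolean closeAndDelete(InputStream in, File file) {
        try {
            in.close();
        } catch (IOException e) {
            // Ignored on purpose: we still want to attempt the delete.
        }
        boolean deleted = file.delete();
        if (!deleted) {
            // Same fallback as the existing code, now only as a last resort.
            file.deleteOnExit();
        }
        return deleted;
    }
}
```

The open question is where to call this: since the tree is built on demand, it has to happen after the last read from the file, not right after the builder is created.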

3) While writing the result of the transformation to a temporary file indeed eliminates the need to keep the entire output in memory, the situation is a bit different for the input document. Indeed, during the transformation the Axiom tree will be built and kept in memory anyway. On the other hand, the XSLT processor also requires access to the complete tree of the input document (except for XSLT processors that support streaming, which to my knowledge is not the case for Xalan). Xalan uses its own object model called DTM (Document Table Model) to store the input document in memory.

Since the input document must be kept in memory anyway, the only question is how to efficiently feed the original Axiom tree into the XSLT processor. Currently XSLTMediator uses two different strategies:

* If useDOMSourceAndResults is set to false, the Axiom tree will be serialized to a byte stream (in memory or to a temporary file) and then fed into the XSLT processor using a StreamSource object. Xalan will then parse the byte stream and create a DTM representation.
* If useDOMSourceAndResults is set to true, the code will call ElementHelper.importOMElement to get a DOM compliant version of the input tree. From the code of this method it can be seen that an entirely new copy of the input tree will be created. The resulting DOM tree is then passed to the XSLT processor. Xalan will create a DTM representation of this tree. This representation is not a complete copy of the DOM tree, but some sort of wrapper/adapter that is backed by the original DOM tree (at least that's my interpretation of the document here: http://xml.apache.org/xalan-j/dtm.html).

Both strategies are far from optimal. There are at least two strategies that should give better results (with at least one of them being actually simpler):

* Trick Axis2 into producing a DOM compatible tree from the outset, by using a StAXSOAPModelBuilder with a DOMSOAPFactory (my understanding is that this produces objects that implement both the Axiom and DOM interfaces). This however might be tricky and require some tweaking. The advantage is that there is no need to create a copy anymore.
* Make sure that a DTM representation is created directly from the Axiom tree without intermediate copy (byte stream or DOM tree). With Java 6/JAXP 1.4 this will be very easy because it has support for StAXSource, which integrates nicely with Axiom. In the meantime, the solution is to pull StAX events from Axiom and push them as SAX events to the XSLT processor. The Spring WS project has a utility class StaxSource (extending SAXSource) that does this in a completely transparent way (new StaxSource(omElement.getXMLStreamReader())). By using getXMLStreamReaderWithoutCaching instead of getXMLStreamReader, this could probably be further optimized to instruct Axiom not to create the tree for the part of the input message that is being transformed (unless it has already been constructed at that moment).
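
For the Java 6/JAXP 1.4 variant of the second strategy, a minimal sketch could look like the following. It uses only JDK classes; the class and method names are hypothetical, and the XMLStreamReader is created from a string here only to keep the example self-contained. In the mediator it would come from omElement.getXMLStreamReader() instead, which I have not tested against StAXSource.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: feed StAX events directly into the XSLT processor. */
class StaxTransformSketch {
    /** Runs the stylesheet against the events delivered by the given reader. */
    static String transform(XMLStreamReader input, String stylesheet) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        // StAXSource requires the reader to be positioned at START_DOCUMENT or START_ELEMENT.
        transformer.transform(new StAXSource(input), new StreamResult(out));
        return out.toString();
    }

    /** Stand-in for omElement.getXMLStreamReader(): a plain StAX reader over a string. */
    static XMLStreamReader readerFor(String xml) throws Exception {
        return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    }
}
```

With this approach there is no intermediate byte stream or DOM copy in the mediator itself; how the XSLT processor stores the input internally is still up to the processor.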


Regards,

Andreas


