Possible improvements in XSLTMediator

Andreas Veithen Sat, 22 Dec 2007 14:00:14 -0800

Hi all!

I had a look at the code in XSLTMediator and I think there are some areas where improvements can be made. I would like to have your opinion on that. The areas I have identified are the following:

1) The mediator has a mechanism to stream the result of the transformation to a temporary file when its size exceeds some threshold (defined by BYTE_ARRAY_SIZE, which is 8192). However, when this happens, the transformation is actually triggered twice. During the first run, the result is sent to a FixedByteArrayOutputStream which will raise a SynapseException after the first 8192 bytes have been written. The transformation is then restarted using a FileOutputStream for the result. This is very bad for two reasons:

* It introduces an overhead (the first attempt to execute the transformation) for every large XML document that is processed.
* It makes the choice of the threshold BYTE_ARRAY_SIZE difficult: a small value is bad for smaller XML documents (because temporary files are used where this is not necessary) but good for larger ones (because it reduces the overhead caused by the first transformation attempt); a large value is good for smaller XML documents (avoids temporary files) but bad for larger ones (increased overhead for the first transformation).

A better approach would be to have an OutputStream implementation that will first write to a byte array and, once the threshold is exceeded, transparently switches to a temporary file, so that the transformation is always run only once. I used this pattern in another project (where the problem was to package and explode large archive files on the fly) and it works quite well.
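
A minimal sketch of such a stream, to make the idea concrete. The class name SpillingOutputStream, its accessor methods and the temporary file prefix are hypothetical and not taken from the Synapse code base; this is an illustration under those assumptions, not a proposed final implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical class: buffers in memory until the threshold is exceeded,
// then copies the buffered bytes to a temporary file and continues there.
public class SpillingOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private File tempFile;   // stays null as long as the data is held in memory
    private long count;      // total number of bytes written so far

    public SpillingOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        spillIfNeeded(1);
        current.write(b);
        count++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        spillIfNeeded(len);
        current.write(buf, off, len);
        count += len;
    }

    // Switch to a temporary file before a write that would cross the threshold.
    private void spillIfNeeded(int incoming) throws IOException {
        if (tempFile == null && count + incoming > threshold) {
            File file = File.createTempFile("xslt_", ".xml"); // assumed prefix/suffix
            OutputStream out = new FileOutputStream(file);
            memory.writeTo(out);   // carry over what was buffered so far
            memory = null;
            current = out;
            tempFile = file;
        }
    }

    @Override
    public void flush() throws IOException {
        current.flush();
    }

    @Override
    public void close() throws IOException {
        current.close();
    }

    public boolean isInMemory() {
        return tempFile == null;
    }

    // Only valid while the data is still held in memory.
    public byte[] getBytes() {
        if (memory == null) {
            throw new IllegalStateException("Data has been spilled to " + tempFile);
        }
        return memory.toByteArray();
    }

    // Returns null while the data is still held in memory.
    public File getFile() {
        return tempFile;
    }
}
```

The mediator would pass one instance to the StreamResult, run the transformation once, and afterwards check isInMemory() to decide whether to read the result from the byte array or from the file.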

2) I fear that we might have a problem on Windows platforms with temporary files not being deleted unless Synapse is shut down (I didn't test this yet, so correct me if my argument is incorrect). Indeed, after the transformation, the content of the temporary file is read back using the following instructions:

StAXOMBuilder builder = new StAXOMBuilder(new FileInputStream(tempTargetFile));

result = builder.getDocumentElement();

Since Axiom constructs the XML tree on demand, this will actually not read the entire file but only a small part of it (at least that's what I understood from how Axiom works). Immediately after this, XSLTMediator executes the following piece of code:


boolean deleted = tempTargetFile.delete();
if (!deleted) {
    tempTargetFile.deleteOnExit();
}

Since the file is still open at that moment, on Windows platforms, the delete operation will fail. Probably that's why the call to deleteOnExit has been added to the code. It follows that the file will not be deleted until Synapse is shut down or restarted.

Note that on Unix systems the situation is different: the delete operation will remove the directory entry, but not the inode (since the file is still open). The inode will then be deleted by the OS when the file is closed.


BTW: Who actually closes the FileInputStream?
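
One possible way to tie the deletion to the closing of the stream, sketched here only as an illustration: a FileInputStream subclass that deletes its file once it has been closed. The class name DeleteOnCloseFileInputStream is hypothetical, and the sketch assumes that whoever consumes the Axiom tree eventually closes the underlying stream, which is exactly the open question above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper: deletes the backing file after the stream is closed,
// so the delete is attempted only once no handle is open anymore.
public class DeleteOnCloseFileInputStream extends FileInputStream {
    private final File file;

    public DeleteOnCloseFileInputStream(File file) throws IOException {
        super(file);
        this.file = file;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            // Fall back to deleteOnExit only if the delete still fails.
            if (file.exists() && !file.delete()) {
                file.deleteOnExit();
            }
        }
    }
}
```

With this, the builder would be created from new DeleteOnCloseFileInputStream(tempTargetFile) and the explicit delete() call after getDocumentElement() would no longer be needed.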

3) While writing the result of the transformation to a temporary file indeed eliminates the need to keep the entire output in memory, the situation is a bit different for the input document. Indeed, during the transformation the Axiom tree will be built and kept in memory anyway. On the other hand, the XSLT processor also requires access to the complete tree of the input document (except for XSLT processors that support streaming, which to my knowledge is not the case for Xalan). Xalan uses its own object model called DTM (Document Table Model) to store the input document in memory.

Since the input document must be kept in memory anyway, the only question is how to efficiently feed the original Axiom tree into the XSLT processor. Currently XSLTMediator uses two different strategies:

* If useDOMSourceAndResults is set to false, the Axiom tree will be serialized to a byte stream (in memory or to a temporary file) and then fed into the XSLT processor using a StreamSource object. Xalan will then parse the byte stream and create a DTM representation.
* If useDOMSourceAndResults is set to true, the code will call ElementHelper.importOMElement to get a DOM compliant version of the input tree. From the code of this method it can be seen that an entirely new copy of the input tree will be created. The resulting DOM tree is then passed to the XSLT processor. Xalan will create a DTM representation of this tree. This representation is not a complete copy of the DOM tree, but some sort of wrapper/adapter that is backed by the original DOM tree (at least that's my interpretation of the document here: http://xml.apache.org/xalan-j/dtm.html).

Both strategies are far from optimal. There are at least two strategies that should give better results (with at least one of them being actually simpler):

* Trick Axis2 into producing a DOM compatible tree from the outset, by using a StAXSOAPModelBuilder with a DOMSOAPFactory (my understanding is that this produces objects that implement both the Axiom and DOM interfaces). This however might be tricky and require some tweaking. The advantage is that there is no need to create a copy anymore.
* Make sure that a DTM representation is created directly from the Axiom tree without intermediate copy (byte stream or DOM tree). With Java 6/JAXP 1.4 this will be very easy because it has support for StAXSource, which integrates nicely with Axiom. In the meantime, the solution is to pull StAX events from Axiom and push them as SAX events to the XSLT processor. The Spring WS project has a utility class StaxSource (extending SAXSource) that does this in a completely transparent way (new StaxSource(omElement.getXMLStreamReader())). By using getXMLStreamReaderWithoutCaching instead of getXMLStreamReader, this could probably be further optimized to instruct Axiom not to create the tree for the part of the input message that is being transformed (unless it has already been constructed at that moment).
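
To illustrate the Java 6/JAXP 1.4 variant of the second strategy, here is a small sketch that feeds an XMLStreamReader to the XSLT processor through javax.xml.transform.stax.StAXSource. To keep it self-contained, the reader is created from a string with XMLInputFactory; in the mediator it would instead come from omElement.getXMLStreamReader(). The class name StaxSourceSketch, the sample document and the stylesheet are made up for this example, and it assumes a TransformerFactory that accepts StAXSource.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of Synapse.
public class StaxSourceSketch {

    // Runs the stylesheet against the events pulled from the given reader.
    // The reader must be positioned at START_DOCUMENT or START_ELEMENT.
    public static String transform(XMLStreamReader input, String stylesheet)
            throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        // No intermediate byte stream or DOM copy is created by this code.
        transformer.transform(new StAXSource(input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item>book</item></order>";
        String xslt =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select='order/item'/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        // Stand-in for omElement.getXMLStreamReader() in the mediator.
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        System.out.println(transform(reader, xslt));
    }
}
```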



Regards,

Andreas


