Hi all!
I had a look at the code in XSLTMediator and I think there are some
areas where improvements can be made. I would like to have your
opinion on that. The areas I have identified are the following:
1) The mediator has a mechanism to stream the result of the
transformation to a temporary file when its size exceeds some
threshold (defined by BYTE_ARRAY_SIZE, which is 8192). However, when
this happens, the transformation is actually triggered twice. During
the first run, the result is sent to a FixedByteArrayOutputStream
which will raise a SynapseException after the first 8192 bytes have
been written. The transformation is then restarted using a
FileOutputStream for the result. This is very bad for two reasons:
* It introduces an overhead (the first, aborted attempt to execute the
transformation) for every large XML document that is processed.
* It makes the choice of the threshold BYTE_ARRAY_SIZE difficult: a
small value is bad for smaller XML documents (because temporary files
are used where this is not necessary) but good for larger ones
(because it reduces the overhead caused by the first transformation
attempt); a large value is good for smaller XML documents (avoids
temporary files) but bad for larger ones (increased overhead for the
first transformation).
A better approach would be to have an OutputStream implementation that
will first write to a byte array and once the threshold is exceeded
transparently switches to a temporary file, so that the transformation
is always run only once. I used this pattern in another project (where
the problem was to package and explode large archive files on the fly)
and it works quite well.
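
To make the idea concrete, here is a minimal sketch of such a stream.
The class name OverflowToFileOutputStream, its methods and the temp
file prefix are my own assumptions for illustration; nothing like this
exists in Synapse today, and a real version would need more careful
error handling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class: buffers in memory up to a threshold, then
// transparently switches to a temporary file.
class OverflowToFileOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private File tempFile;

    OverflowToFileOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        switchToFileIfNeeded(1);
        current.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        switchToFileIfNeeded(len);
        current.write(buf, off, len);
    }

    // Once the buffered size would exceed the threshold, copy what has
    // been buffered so far to a temp file and continue writing there.
    private void switchToFileIfNeeded(int incoming) throws IOException {
        if (memory != null && memory.size() + incoming > threshold) {
            tempFile = File.createTempFile("xslt_", ".xml");
            OutputStream file = new FileOutputStream(tempFile);
            memory.writeTo(file);
            memory = null;
            current = file;
        }
    }

    @Override
    public void flush() throws IOException {
        current.flush();
    }

    @Override
    public void close() throws IOException {
        current.close();
    }

    boolean isInMemory() {
        return memory != null;
    }

    // Call after close(); the caller is responsible for closing the
    // returned stream.
    InputStream getInputStream() throws IOException {
        return memory != null
                ? new ByteArrayInputStream(memory.toByteArray())
                : new FileInputStream(tempFile);
    }

    // Deletes the temporary file, if one was created.
    void release() {
        if (tempFile != null) {
            tempFile.delete();
        }
    }
}
```

With something like this, the mediator would hand a single StreamResult
wrapping the stream to the transformer, and the choice of threshold
would only affect when a temporary file is used, not how often the
transformation runs.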
2) I fear that we might have a problem on Windows platforms with
temporary files not being deleted unless Synapse is shut down (I
didn't test this yet, so correct me if my argument is incorrect).
Indeed, after the transformation, the content of the temporary file is
read back using the following instructions:
    StAXOMBuilder builder = new StAXOMBuilder(
        new FileInputStream(tempTargetFile));
    result = builder.getDocumentElement();
Since Axiom constructs the XML tree on demand, this will actually not
read the entire file but only a small part of it (at least that's what
I understood from how Axiom works). Immediately after this,
XSLTMediator executes the following piece of code:
    boolean deleted = tempTargetFile.delete();
    if (!deleted) {
        tempTargetFile.deleteOnExit();
    }
Since the file is still open at that moment, on Windows platforms, the
delete operation will fail. Probably that's why the call to
deleteOnExit has been added to the code. It follows that the file will
not be deleted until Synapse is shut down or restarted.
Note that on Unix systems the situation is different: the delete
operation will remove the directory entry, but not the inode (since
the file is still open). The inode will then be deleted by the OS when
the file is closed.
BTW: Who actually closes the FileInputStream?
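
One way to avoid the problem would be to consume the file completely
and close the stream before attempting the delete. The sketch below
shows the ordering using only JDK classes; the DOM parser is a
stand-in for StAXOMBuilder, and the names TempFileCleanup and
readFullyAndDelete are hypothetical. With Axiom the equivalent would
presumably be to force the tree to be built completely before closing
the stream, but I have not verified that.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of Synapse.
class TempFileCleanup {
    static Document readFullyAndDelete(File tempFile) throws Exception {
        Document doc;
        // try-with-resources closes the stream before we reach delete()
        try (InputStream in = new FileInputStream(tempFile)) {
            // The parser reads the whole file here; nothing is deferred.
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
        }
        // The file is no longer open, so the delete should also succeed
        // on Windows; deleteOnExit() remains only as a last resort.
        if (!tempFile.delete()) {
            tempFile.deleteOnExit();
        }
        return doc;
    }
}
```

The obvious drawback is that this gives up on-demand building for the
result, so it trades memory for deterministic cleanup.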
3) While writing the result of the transformation to a temporary file
indeed eliminates the need to keep the entire output in memory, the
situation is a bit different for the input document. Indeed, during
the transformation the Axiom tree will be built and kept in memory
anyway. On the other hand, the XSLT processor also requires access to
the complete tree of the input document (except for XSLT processors
that support streaming, which to my knowledge is not the case for
Xalan). Xalan uses its own object model called DTM (Document Table
Model) to store the input document in memory.
Since the input document must be kept in memory anyway, the only
question is how to efficiently feed the original Axiom tree into the
XSLT processor. Currently XSLTMediator uses two different strategies:
* If useDOMSourceAndResults is set to false, the Axiom tree will be
serialized to a byte stream (in memory or to a temporary file) and
then fed into the XSLT processor using a StreamSource object. Xalan
will then parse the byte stream and create a DTM representation.
* If useDOMSourceAndResults is set to true, the code will call
ElementHelper.importOMElement to get a DOM compliant version of the
input tree. From the code of this method it can be seen that an
entirely new copy of the input tree will be created. The resulting DOM
tree is then passed to the XSLT processor. Xalan will create a DTM
representation of this tree. This representation is not a complete
copy of the DOM tree, but some sort of wrapper/adapter that is backed
by the original DOM tree (at least that's my interpretation of the
document here: http://xml.apache.org/xalan-j/dtm.html).
Both strategies are far from optimal. There are at least two
strategies that should give better results (with at least one of them
being actually simpler):
* Trick Axis2 into producing a DOM compatible tree from the outset, by
using a StAXSOAPModelBuilder with a DOMSOAPFactory (my understanding
is that this produces objects that implement both the Axiom and DOM
interfaces). This however might be tricky and require some tweaking.
The advantage is that there is no need to create a copy anymore.
* Make sure that a DTM representation is created directly from the
Axiom tree without intermediate copy (byte stream or DOM tree). With
Java 6/JAXP 1.4 this will be very easy because it has support for
StAXSource, which integrates nicely with Axiom. In the meantime, the
solution is to pull StAX events from Axiom and push them as SAX events
to the XSLT processor. The Spring WS project has a utility class
StaxSource (extending SAXSource) that does this in a completely
transparent way (new StaxSource(omElement.getXMLStreamReader())). By
using getXMLStreamReaderWithoutCaching instead of getXMLStreamReader,
this could probably be further optimized to instruct Axiom not to
create the tree for the part of the input message that is being
transformed (unless it has already been constructed at that moment).
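
For the Java 6/JAXP 1.4 variant, a minimal sketch could look as
follows. It uses only JDK classes; the class name StaxSourceTransform
and the method names are hypothetical. In the mediator the
XMLStreamReader would come from omElement.getXMLStreamReader() rather
than from XMLInputFactory, which is used here only to keep the example
self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of Synapse.
class StaxSourceTransform {

    // Feeds StAX events straight into the XSLT processor, without an
    // intermediate byte stream or DOM copy of the input.
    static String transform(XMLStreamReader reader, String xslt)
            throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StAXSource(reader), new StreamResult(out));
        return out.toString();
    }

    // Stand-in for omElement.getXMLStreamReader().
    static XMLStreamReader readerFor(String xml) throws Exception {
        return XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
    }
}
```

Whether the processor then builds its internal model directly from the
events depends on the XSLT implementation, so this would need to be
measured rather than assumed.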
Regards,
Andreas