Re: Possible improvements in XSLTMediator

Andreas Veithen Thu, 27 Dec 2007 14:40:45 -0800


On 24 Dec 2007, at 17:04, Paul Fremantle wrote:

One more improvement.... I think we should make it possible to change
the default size that triggers a file using a config file (e.g.
synapse.properties).


I agree. Please raise a JIRA :)

Anything else?
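For the configurable threshold suggested in the quoted message, here is a minimal sketch, assuming the value is read from synapse.properties with java.util.Properties. The key synapse.xslt.temp_file.threshold and the 8 KB fallback are hypothetical; they are not existing Synapse settings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class XsltThresholdConfig {

    // Hypothetical property key; Synapse does not define this name.
    static final String THRESHOLD_KEY = "synapse.xslt.temp_file.threshold";

    // Hypothetical fallback (in bytes) used when the key is absent or invalid.
    static final int DEFAULT_THRESHOLD = 8 * 1024;

    /** Returns the output size (in bytes) above which a temporary file is used. */
    static int resolveThreshold(Properties props) {
        String value = props.getProperty(THRESHOLD_KEY);
        if (value == null) {
            return DEFAULT_THRESHOLD;
        }
        try {
            int parsed = Integer.parseInt(value.trim());
            return parsed > 0 ? parsed : DEFAULT_THRESHOLD;
        } catch (NumberFormatException e) {
            return DEFAULT_THRESHOLD;
        }
    }

    /** Loads synapse.properties from the classpath; returns empty properties if missing. */
    static Properties loadSynapseProperties() throws IOException {
        Properties props = new Properties();
        try (InputStream in = XsltThresholdConfig.class.getClassLoader()
                .getResourceAsStream("synapse.properties")) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resolveThreshold(loadSynapseProperties()));
    }
}
```

Falling back to the default on an invalid value keeps a typo in the config file from breaking mediation; logging a warning in that branch would probably be a good idea.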

I had a look at the code that handles the case where the output of the transformation is text rather than XML. I think there are multiple issues:

1) There are multiple places where character streams are converted to byte streams and vice versa:

* Since the XSLT processor is configured with a StreamResult writing to an OutputStream (ByteArrayOutputStream or FileOutputStream), it will convert the output to a byte stream.
* The output is then converted back to a character stream using ByteArrayOutputStream#toString or using TextFileDataSource.
* In VFSTransportSender it is converted back to a byte stream using String#getBytes or OMNode#serializeAndConsume.
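As an illustration of how the first conversion could be avoided, here is a sketch using only the standard JAXP API (it is not Synapse code): when the StreamResult wraps a Writer, the processor hands over characters and no byte encoding takes place. For large outputs a file would still be needed, but it could then be written through an OutputStreamWriter with an explicit charset.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class CharacterResultSketch {

    // Example stylesheet with text output, used only for this illustration.
    static final String TEXT_XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='/a'/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet and captures its result as characters, never as bytes. */
    static String transformToString(String stylesheet, String input) throws TransformerException {
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        // With a Writer as target the serializer emits characters directly; the
        // xsl:output encoding then only affects the XML declaration, if one is written.
        templates.newTransformer().transform(
                new StreamSource(new StringReader(input)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transformToString(TEXT_XSLT, "<a>caf\u00e9 &amp; cr\u00e8me</a>"));
    }
}
```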

The problem is that nowhere does the code care about the character encoding used in these conversions. I opened SYNAPSE-215 to describe the issue with ByteArrayOutputStream#toString. In many cases the different issues probably compensate for each other, so that the end result is correct. For example, ByteArrayOutputStream#toString and String#getBytes both use the platform's default encoding, so the original byte stream is reconstructed. However, this will fail if the byte stream contains sequences that are not valid in the default encoding (this may happen e.g. with UTF-8). In any case, Synapse should be fixed to handle character encodings properly from end to end.
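The failure mode can be reproduced in isolation. The sketch below uses explicit charsets instead of the platform default so that the result does not depend on the machine. A lone 0xC3 byte is not valid UTF-8, so decoding replaces it with U+FFFD and re-encoding yields different bytes. ISO-8859-1, on the other hand, maps every byte value to a character and therefore always round-trips.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class RoundTripSketch {

    /** Decodes and re-encodes with one charset, as toString() followed by getBytes() does. */
    static byte[] roundTrip(byte[] original, Charset charset) {
        String decoded = new String(original, charset); // malformed input becomes U+FFFD
        return decoded.getBytes(charset);
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte UTF-8 sequence, but 'b' is not a continuation byte.
        byte[] bytes = { 'a', (byte) 0xC3, 'b' };

        byte[] viaUtf8 = roundTrip(bytes, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(bytes, viaUtf8)); // false
        System.out.println(Arrays.toString(viaUtf8));      // [97, -17, -65, -67, 98]

        byte[] viaLatin1 = roundTrip(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(bytes, viaLatin1)); // true
    }
}
```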


2) There are specific issues with TextFileDataSource:

* When a ByteArrayOutputStream is used, the result is treated as plain text (since an OMText object is created directly from the result of ByteArrayOutputStream#toString). On the other hand, when TextFileDataSource is used, the result is parsed as XML (more precisely, as an external parsed general entity). For example, the ampersand (&) is interpreted as the start of an entity reference. I opened SYNAPSE-216 for this issue. Note, however, that when the data is consumed by VFSTransportSender, this problem is circumvented by the fact that the serialize method bypasses the XML parsing...
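A small JAXP sketch (not Synapse code) of the difference: content that is parsed as XML is rejected as soon as it contains a bare ampersand, whereas escaping it first, or never parsing it at all as the ByteArrayOutputStream path does, preserves it. The <text> wrapper here merely stands in for the one TextFileDataSource produces.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class TextVersusEntitySketch {

    /** Treats the content as XML: wraps it in <text> and returns the parsed character data. */
    static String parseAsXmlContent(String content) throws SAXException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows fatal ones,
            // so nothing is printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            String wrapped = "<text>" + content + "</text>";
            return builder.parse(new InputSource(new StringReader(wrapped)))
                    .getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Escapes the characters that are significant in XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws SAXException {
        String plain = "fish & chips";
        try {
            parseAsXmlContent(plain);
        } catch (SAXException e) {
            System.out.println("raw text rejected: " + e.getMessage());
        }
        System.out.println(parseAsXmlContent(escape(plain))); // fish & chips
    }
}
```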

* TextFileDataSource implements OMDataSource but doesn't respect the contract (the Javadoc of OMDataSource is not very explicit, but this can be seen from various examples in the Axis 2 source code):
  - serialize(OutputStream, OMOutputFormat) doesn't output the <text> wrapper element (actually the code is commented out) and doesn't take into account the character encoding specified by the OMOutputFormat.
  - serialize(Writer, OMOutputFormat) only outputs an empty <text> element.
  While this is exactly what is expected by VFSTransportSender, this might lead to unexpected results in other situations.
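What the OutputStream variant would be expected to do can be sketched with plain StAX (javax.xml.stream) rather than the Axiom API. The method name is hypothetical, and the <text> element is written without any namespace Synapse may attach to it. The point is that the wrapper element is emitted and the requested encoding, which in the real method would come from the OMOutputFormat, is passed to the writer instead of being ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public final class WrapperSerializationSketch {

    /** Writes <text>...</text> to the stream using the given character encoding. */
    static void serializeTextElement(String text, OutputStream out, String encoding)
            throws XMLStreamException {
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out, encoding);
        writer.writeStartElement("text");
        writer.writeCharacters(text); // escapes markup characters such as & and <
        writer.writeEndElement();
        writer.flush();
        writer.close(); // per the StAX Javadoc, this does not close 'out'
    }

    public static void main(String[] args)
            throws XMLStreamException, UnsupportedEncodingException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializeTextElement("caf\u00e9 & cr\u00e8me", out, "ISO-8859-1");
        System.out.println(out.toString("ISO-8859-1"));
    }
}
```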

The purpose of TextFileDataSource is actually to implement a text node (plus the <text> wrapper element) that is backed by a temporary file rather than a String/char[] object, thereby avoiding loading the entire file into memory. Maybe we should consider another solution that avoids the problems described above. The idea would be to use a custom implementation of OMText (again backed by a temporary file). I think that if the custom implementation extends OMNodeImpl and implements OMText, an instance can be added to the Axiom tree without problems, given that the Axiom code never casts OMText to OMTextImpl. An alternative (but less clean) solution would be to extend OMTextImpl.
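To make the idea more concrete, here is a rough sketch of the storage part only, in plain Java and independent of Axiom. FileBackedText is a hypothetical class that keeps its character data in a temporary file with a fixed encoding (UTF-8, an arbitrary choice) and streams it on demand. A custom OMText implementation could delegate to something like this; the Axiom-specific methods are deliberately left out.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: character data kept in a temporary file instead of a String. */
public final class FileBackedText {

    // The file is always written and read as UTF-8, so the platform default never applies.
    private static final Charset CHARSET = StandardCharsets.UTF_8;

    private final File file;

    private FileBackedText(File file) {
        this.file = file;
    }

    /** Copies the source characters into a new temporary file. */
    static FileBackedText create(Reader source) throws IOException {
        File tmp = File.createTempFile("synapse-text", ".tmp");
        tmp.deleteOnExit();
        try (Writer out = new OutputStreamWriter(new FileOutputStream(tmp), CHARSET)) {
            copy(source, out);
        }
        return new FileBackedText(tmp);
    }

    /** Opens a fresh reader, so the content can be streamed more than once. */
    Reader openReader() throws IOException {
        return new BufferedReader(new InputStreamReader(new FileInputStream(file), CHARSET));
    }

    /** Streams the content to a Writer (e.g. during serialization) in fixed-size chunks. */
    void writeTo(Writer out) throws IOException {
        try (Reader in = openReader()) {
            copy(in, out);
        }
    }

    /** Materializes the whole content; only for callers that really need a String. */
    String getText() throws IOException {
        StringWriter sw = new StringWriter();
        writeTo(sw);
        return sw.toString();
    }

    /** Deletes the backing file once the node is no longer needed. */
    boolean dispose() {
        return file.delete();
    }

    private static void copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        FileBackedText text = create(new StringReader("fish & chips \u00e9"));
        System.out.println(text.getText().equals("fish & chips \u00e9")); // true
        text.dispose();
    }
}
```

Because the text is never parsed, characters such as & survive unchanged, which is the behaviour the ByteArrayOutputStream path already has.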

3) Before solving some of the issues described above, another question needs to be addressed. XSLTMediator currently uses the following strategy to handle text output: it first tries to parse the output as an XML document, and when this fails it considers the output as text. There are, however, two different cases where this happens:

* The stylesheet specifies "text" as output method. In this case the output is plain text and will be handled correctly when a ByteArrayOutputStream is used, but not when a TextFileDataSource is used (see above).
* The stylesheet specifies "xml" as output method (or doesn't specify an output method at all), but produces output that is not well formed (typically text only). In this case XSLTMediator will also consider the output as text. However, since the output method is XML, some characters are replaced by their corresponding XML entities (such as &amp;), which will be parsed correctly by TextFileDataSource, but not when a ByteArrayOutputStream is used.
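The two cases can be told apart before any output is produced, because JAXP exposes the output properties of a compiled stylesheet. Here is a sketch, assuming the mediator has access to the Templates object. Templates#getOutputProperties and OutputKeys.METHOD are standard JAXP; treating a missing value as "xml" mirrors the second case above. (Strictly speaking, XSLT 1.0 may choose "html" when the result's root element is html, which this ignores.)

```java
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public final class OutputMethodSketch {

    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";
    static final String BODY =
        "<xsl:template match='/'>hello</xsl:template></xsl:stylesheet>";

    static Templates compile(String stylesheet) throws TransformerConfigurationException {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
    }

    /** Returns the declared output method, treating a missing value as "xml". */
    static String outputMethod(Templates templates) {
        String method = templates.getOutputProperties().getProperty(OutputKeys.METHOD);
        return method == null ? "xml" : method;
    }

    public static void main(String[] args) throws TransformerConfigurationException {
        System.out.println(outputMethod(compile(HEAD + "<xsl:output method='text'/>" + BODY))); // text
        System.out.println(outputMethod(compile(HEAD + BODY)));                                 // xml
    }
}
```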

While it is probably possible to handle both cases correctly, this would introduce unnecessary complexity into the code. I think it is not necessary to support the second case. I would expect that when a stylesheet specifies XML as output method but fails to produce a well-formed XML document, the mediation fails with an error. However, some of the example stylesheets that come with the Synapse source code (see java/repository/conf/sample/resources/transform/transform_load.xml) don't specify "text" as output method but produce text only.
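The stricter behaviour proposed here could look roughly like the following. The declared output method alone decides how the result is interpreted, and XML output that is not well formed becomes an error instead of silently falling back to text. The class and its Result type are hypothetical, IllegalStateException stands in for whatever exception Synapse would actually throw, and output methods other than text and xml (e.g. html) are not considered.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class StrictOutputPolicySketch {

    /** Hypothetical result holder: exactly one of the two fields is non-null. */
    static final class Result {
        final String text;
        final Document xml;

        Result(String text, Document xml) {
            this.text = text;
            this.xml = xml;
        }
    }

    /** Interprets transformation output according to the declared output method only. */
    static Result interpret(String method, String output) {
        if ("text".equals(method)) {
            return new Result(output, null); // plain text: never parsed, never unescaped
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors are rethrown
            return new Result(null, builder.parse(new InputSource(new StringReader(output))));
        } catch (SAXException e) {
            // No fallback to text: the stylesheet promised XML and did not deliver it.
            throw new IllegalStateException(
                    "output method '" + method + "' but result is not well-formed XML", e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(interpret("text", "fish & chips").text);                             // fish & chips
        System.out.println(interpret("xml", "<r>1</r>").xml.getDocumentElement().getTagName()); // r
        try {
            interpret("xml", "just some text");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a policy like this, transform_load.xml and similar samples would presumably need an explicit <xsl:output method="text"/> to keep working.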



Regards,

Andreas

