On 24 Dec 2007, at 17:04, Paul Fremantle wrote:
One more improvement.... I think we should make it possible to change
the default size that triggers a file using a config file (e.g.
synapse.properties).
I agree. Please raise a JIRA :)
Anything else?
I had a look at the code that handles the case where the output of the
transformation is text rather than XML. I think there are multiple
issues:
1) There are multiple places where character streams are converted to
byte streams and vice versa:
* Since the XSLT processor is configured with a StreamResult writing
to an OutputStream (ByteArrayOutputStream or FileOutputStream), it
will convert the output to a byte stream.
* The output is then converted back to a character stream using
ByteArrayOutputStream#toString or using TextFileDataSource.
* In VFSTransportSender it is converted back to a byte stream using
String#getBytes or OMNode#serializeAndConsume.
The problem is that nowhere does the code take into account the character
encoding that is used in these conversions. I opened SYNAPSE-215 to
describe the issue with ByteArrayOutputStream#toString. Probably in
many cases the different issues tend to compensate each other so that
the end result is correct. For example, ByteArrayOutputStream#toString
and String#getBytes both use the platform's default encoding, so that
the original byte stream is reconstructed. However, this will fail if
the byte stream contains sequences that are not valid in the default
encoding (this may happen e.g. when the default encoding is UTF-8). In
any case, Synapse should be fixed to handle character encodings properly
from end to end.
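
To make the failure case concrete, here is a minimal sketch using only
the JDK. The class name EncodingRoundTrip and the sample byte are made
up for illustration and are not taken from the Synapse code; the
charset is passed explicitly to stand in for the platform default used
by ByteArrayOutputStream#toString and String#getBytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {
    /**
     * Decodes the bytes to a String and encodes them again with the same
     * charset. This mimics toString() followed by getBytes() when both
     * rely on the same default encoding.
     */
    static byte[] roundTrip(byte[] input, Charset charset) {
        String decoded = new String(input, charset);
        return decoded.getBytes(charset);
    }

    public static void main(String[] args) {
        // 0xE9 is U+00E9 in ISO-8859-1, but on its own it is not a valid
        // UTF-8 sequence.
        byte[] original = { (byte) 0xE9 };

        byte[] viaUtf8 = roundTrip(original, StandardCharsets.UTF_8);
        byte[] viaLatin1 = roundTrip(original, StandardCharsets.ISO_8859_1);

        // With UTF-8 the invalid byte is replaced by U+FFFD during decoding,
        // so the original data is lost.
        System.out.println("UTF-8 round trip preserved bytes: "
                + Arrays.equals(original, viaUtf8));
        // ISO-8859-1 maps every byte to a character, so nothing is lost.
        System.out.println("ISO-8859-1 round trip preserved bytes: "
                + Arrays.equals(original, viaLatin1));
    }
}
```

So the "compensation" only works as long as every byte sequence happens
to be valid in the default encoding.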
2) There are specific issues with TextFileDataSource:
* When a ByteArrayOutputStream is used, the result is parsed as plain
text (since an OMText object is created directly from the result of
ByteArrayOutputStream#toString). On the other hand, when
TextFileDataSource is used, the result is parsed as XML (more
precisely as an external parsed general entity). For example, the
ampersand (&) is treated as the start of an XML entity reference.
I opened SYNAPSE-216 for this issue. Note however that when the data
is consumed by VFSTransportSender, this problem is circumvented by the
fact that the serialize method bypasses the XML parsing...
* TextFileDataSource implements OMDataSource but doesn't respect the
contract (the Javadoc of OMDataSource is not very explicit but this
can be seen from various examples in the Axis2 source code):
- serialize(OutputStream, OMOutputFormat) doesn't output the <text>
wrapper element (actually the code is commented out) and doesn't take
into account the character encoding specified by the OMOutputFormat.
- serialize(Writer, OMOutputFormat) only outputs an empty <text>
element.
While this is exactly what is expected by VFSTransportSender, this
might lead to unexpected results in other situations.
The purpose of TextFileDataSource is actually to implement a text node
(+ <text> wrapper element) that is backed by a temporary file rather
than a String/char[] object, thereby avoiding loading the entire file
into memory. Maybe we should consider another solution that avoids the
problems described above. The idea would be to use a custom
implementation of OMText (that again is backed by a temporary file). I
think that if the custom implementation extends OMNodeImpl and
implements OMText, an instance can be added to the Axiom tree without
problem, given that the Axiom code never casts OMText to OMTextImpl.
An alternative (but less clean) solution would be to extend OMTextImpl.
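
As a rough illustration of what a file-backed text node would have to
do when it is serialized to a byte stream, here is a sketch that uses
only the JDK. It is not based on the Axiom API: the class
FileBackedTextSketch, its method signatures and the un-namespaced
<text> wrapper are assumptions made for the example. The point is that
the file is streamed rather than loaded into memory, that both
encodings are explicit, and that the content is escaped as text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileBackedTextSketch {
    /**
     * Streams the content of a text file into a <text> wrapper element.
     * fileCharset is the encoding the file was written in; outCharset is
     * the encoding requested for the serialized output.
     */
    static void serialize(Path file, Charset fileCharset,
                          OutputStream out, Charset outCharset) throws IOException {
        Writer writer = new OutputStreamWriter(out, outCharset);
        // Files.newBufferedReader reports malformed input with an exception
        // instead of silently replacing it.
        try (BufferedReader reader = Files.newBufferedReader(file, fileCharset)) {
            writer.write("<text>");
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    writeEscaped(writer, buffer[i]);
                }
            }
            writer.write("</text>");
        }
        writer.flush();
    }

    /** Escapes the characters that have a special meaning in XML text. */
    static void writeEscaped(Writer writer, char c) throws IOException {
        switch (c) {
            case '&': writer.write("&amp;"); break;
            case '<': writer.write("&lt;"); break;
            case '>': writer.write("&gt;"); break;
            default:  writer.write(c);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("xslt-output", ".txt");
        try {
            Files.write(tmp, "a & b".getBytes(StandardCharsets.ISO_8859_1));
            serialize(tmp, StandardCharsets.ISO_8859_1, System.out, StandardCharsets.UTF_8);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A real implementation would of course have to plug into the Axiom
object model and use the wrapper element that Synapse expects.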
3) Before solving some of the issues described above, another question
needs to be addressed. XSLTMediator currently uses the following
strategy to handle text output: it first tries to parse the output as
an XML document and when this fails it will consider the output as
text. There are however two different cases where this happens:
* The stylesheet specifies "text" as output method. In this case the
output is plain text and will be parsed correctly when a
ByteArrayOutputStream is used, but not when a TextFileDataSource is
used (see above).
* The stylesheet specifies "xml" as output method (or doesn't specify
an output method at all), but produces output that is not well formed
(typically text only). In this case XSLTMediator will also consider
the output as text. However, since the output method is XML, some
characters are replaced by their corresponding XML entities (such as
&amp;), which will be parsed correctly by TextFileDataSource, but not
when ByteArrayOutputStream is used.
While it is probably possible to handle both cases correctly, this
would introduce unnecessary complexity to the code. I think it is not
necessary to support the second case. I would expect that when a
stylesheet specifies XML as output method but fails to produce a
well-formed XML document, the mediation fails with an error. However, some
of the example stylesheets (see
java/repository/conf/sample/resources/transform/transform_load.xml)
that come with the Synapse source code don't specify "text" as output
method but produce text only.
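
The difference between the two cases can be reproduced with a small
sketch that uses only the JDK's javax.xml.transform API. The class name
OutputMethodDemo, the inline stylesheet and the input document are made
up for illustration; they are not taken from Synapse or from its
samples. The same text-only template is run once with "text" and once
with "xml" as output method.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputMethodDemo {
    /** Applies a text-only stylesheet with the given xsl:output method. */
    static String transform(String outputMethod, String inputXml)
            throws TransformerException {
        String stylesheet =
              "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='" + outputMethod + "'/>"
            + "<xsl:template match='/'><xsl:value-of select='/in'/></xsl:template>"
            + "</xsl:stylesheet>";
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        // Writing to a Writer keeps the result as characters, so no
        // byte/character conversion is involved here.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(inputXml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String input = "<in>a &amp; b</in>";
        // Output method "text": the ampersand is written as a plain character.
        System.out.println(transform("text", input));
        // Output method "xml": the ampersand is escaped, although the result
        // is not a well-formed document (there is no root element).
        System.out.println(transform("xml", input));
    }
}
```

With the "xml" output method the processor may also emit an XML
declaration in front of the text, which is one more reason not to
treat such output as plain text.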
Regards,
Andreas