I am sorry to bring this back from the dead. However, I was just trying out
the unmarshal().xstream("ISO-8859-1") method that was introduced because of
this thread.  Unfortunately, it still does not solve the problem (as of Camel
2.5.0).

From non-Camel routes, we have been publishing JMS messages and serializing
the message to XML as follows:

XStream xstream = new XStream(new DomDriver("ISO-8859-1"));
String messageXml = xstream.toXML(someObject);

We then use a ProducerTemplate to publish it to our messaging system.

When we used a route like this:

from(someIncomingEndpoint)
                .unmarshal().xstream("ISO-8859-1")
                .process(myUpdateProcessor);

our processor received a deserialized message, but the content was not
correct.  Strings that had been serialized as ISO-8859-1 were deserialized
as if they were UTF-8.
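
To illustrate the kind of corruption described above, here is a minimal sketch
using only the JDK. It is not the Camel or XStream code path; the class and
method names (EncodingMismatchDemo, encodeThenDecode) are made up for the
example. It only shows what happens in general when text is written with one
charset and read back with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Camel or XStream.
class EncodingMismatchDemo {

    // Encode the text to bytes with one charset, then decode those bytes with another.
    static String encodeThenDecode(String text, Charset writeAs, Charset readAs) {
        byte[] wire = text.getBytes(writeAs);
        return new String(wire, readAs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", with a non-ASCII character

        // Written as ISO-8859-1 but read as UTF-8: the accented character is lost.
        String garbled = encodeThenDecode(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // Written and read with the same charset: the text survives.
        String intact = encodeThenDecode(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        System.out.println("mismatched charsets preserved text: " + original.equals(garbled));
        System.out.println("matching charsets preserved text:   " + original.equals(intact));
    }
}
```

Plain ASCII content survives either way, which is why this kind of problem
tends to show up only once a message contains accented characters.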

I modified our route to introduce a new Processor (instead of the in-line
unmarshal) that did the following:
String messageBody = exchange.getIn().getBody(String.class);
XStream xstream = new XStream(new DomDriver("ISO-8859-1"));
Object myObject = xstream.fromXML(messageBody);
exchange.getIn().setBody(myObject);

This works fine: the text our processor receives is correct ISO-8859-1 and
nothing is garbled.

I set a breakpoint and stepped through the camel code with the in-line
unmarshal.  It does pass down the encoding specified (ISO-8859-1).  However
it constructs the XStream object using the default XppDriver (which you
can't specify an encoding on).  

According to the XStream documentation - the XppDriver (and others not
including DomDriver) rely on the underlying InputStream/OutputStream passed
to the XStream object to determine the encoding.
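
If a driver has no encoding setting of its own, one way to pin the encoding is
to decode the bytes yourself and hand over characters instead of a raw stream.
The sketch below uses only the JDK and shows that decoding step; the class and
method names (ExplicitCharsetReaderDemo, readAll) are invented for the
example, and I have not tried this inside the Camel data format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Camel or XStream.
class ExplicitCharsetReaderDemo {

    // Decode the whole stream with an explicitly chosen charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        // The Reader fixes the charset, so nothing downstream has to guess it.
        // A Reader like this could also be passed to XStream's fromXML(Reader).
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = "<name>Jos\u00e9</name>";
        byte[] wire = xml.getBytes(StandardCharsets.ISO_8859_1);

        String decoded = readAll(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);
        System.out.println("round trip preserved text: " + xml.equals(decoded));
    }
}
```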

I found this method in AbstractXStreamWrapper.java:

    public Object unmarshal(Exchange exchange, InputStream stream) throws Exception {
        HierarchicalStreamReader reader = createHierarchicalStreamReader(exchange, stream);
        try {
            return getXStream(exchange.getContext().getClassResolver()).unmarshal(reader);
        } finally {
            reader.close();
        }
    }

The "HierarchicalStreamReader" that is created is of type:
com.thoughtworks.xstream.io.xml.StaxReader

When I stepped into the "unmarshal" method of the XStream class, I saw that
the reader passed in (the same StaxReader) has a property called "in" that
was of type: com.ctc.wstx.sr.ValidatingStreamReader

This, in turn, had 2 properties:

mDocInputEncoding = {java.lang.String@4784}"ISO-8859-1"
mDocXmlEncoding = {java.lang.String@4785}"UTF-8"

I can't say for certain that this is why the text is coming out as UTF-8,
but it does seem suspicious that although the InputEncoding is set to
ISO-8859-1, the XmlEncoding is still "UTF-8".


In any event - for our own purposes we have created 2 Processor classes to
serialize/deserialize our XML.  We can't rely on the unmarshal/marshal
methods when it comes to encoding and our XML. 

Just wanted to pass along the news that the fix doesn't seem to have solved
the problem.

-- 
View this message in context: 
http://camel.465427.n5.nabble.com/XStream-and-forcing-ISO-8859-1-Encoding-tp478220p3355313.html
Sent from the Camel - Users mailing list archive at Nabble.com.
