On Apr 24, 2013, at 6:32 PM, Jose María Zaragoza <[email protected]> wrote:

> Hello:
> 
> I'm looking at this example and I'd like to understand some things:
> 
> 1) Does 'Encoding: ISO-8859-1' refer to the HTTP header that defines the
> content charset?

Yea.  If you use Wireshark or a similar tool to capture the raw TCP bytes, this 
should be the charset in the Content-Type header.  If the Content-Type header 
has no charset, the default (per the HTTP spec) is ISO-8859-1, which may be 
where this value is coming from.
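
For illustration, here is a minimal stdlib-only sketch of that lookup: take the 
charset parameter from a Content-Type value and fall back to ISO-8859-1 when 
there is none.  The class and method names are made up for this example; this 
is not the code CXF itself uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of CXF: resolves the charset of a Content-Type value.
final class ContentTypeCharset {

    // Returns the charset named in the Content-Type value, or ISO-8859-1
    // (the default described above) when no charset parameter is present.
    // Charset.forName throws an unchecked exception for unknown or illegal names.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            String[] parts = contentType.split(";");
            // parts[0] is the media type itself; parameters start at index 1.
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = param.substring("charset=".length()).trim();
                    // The parameter value may be quoted: charset="utf-8"
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // Same header value as in the log quoted below: no charset parameter.
        System.out.println(charsetOf("text/xml"));
        System.out.println(charsetOf("text/xml; charset=utf-8"));
    }
}
```

With the "Content-Type: text/xml" header from the log quoted below, this falls 
through to the ISO-8859-1 default, which matches the "Encoding: ISO-8859-1" 
line in that log.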

> How does Apache CXF choose which charset to put in the HTTP header it returns
> to a client?

I think it will always use UTF-8 unless the user goes out of the way (and it's 
not easy) to change it.  I'd need to dig through some code to verify, though.  
A long time ago we did try to use the same charset that the client sent the 
request in, but that became too complicated, and since pretty much everything 
nowadays supports UTF-8, we just decided to stick with UTF-8.


> 2) If the HTTP response charset is ISO-8859-1 but the XML encoding is
> different (as in this example), which one takes priority when decoding the
> message?

It would have to be the HTTP header.  We SHOULD be able to call the 
HttpServletRequest.getReader() method to get a reader that is set up with the 
appropriate charset for the input stream.  (We don't do this, but per spec we 
should be able to.)  The contents of the stream (which is where the XML 
declaration would be found) would be irrelevant for this.
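
To make that concrete, here is a small stdlib-only sketch (no CXF or servlet 
API involved; the class name and sample payload are made up).  It decodes the 
same bytes through an InputStreamReader twice, once with each charset.  The 
reader never looks at the XML declaration, so the charset it is given decides 
what comes out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that the Reader's charset, not the XML
// declaration inside the payload, determines how the bytes are decoded.
final class ReaderCharsetDemo {

    // Reads the whole body through a Reader configured with the given charset.
    static String decode(byte[] body, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(body), charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up payload: declares UTF-8 and really is UTF-8 on the wire.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Jos\u00e9</name>";
        byte[] wire = xml.getBytes(StandardCharsets.UTF_8);

        // Decoded with the charset implied by a header without a charset parameter:
        // each byte of the two-byte UTF-8 sequence becomes its own character.
        System.out.println(decode(wire, StandardCharsets.ISO_8859_1));

        // Decoded with the charset the bytes were actually written in.
        System.out.println(decode(wire, StandardCharsets.UTF_8));
    }
}
```

This is the same kind of damage reported in the original message quoted below: 
UTF-8 bytes read as ISO-8859-1 turn every accented character into two wrong 
ones.  Wrapping the stream in a reader with the right charset is also what 
suggestion 3 in that quoted reply relies on.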

Dan


> I guess the encoding declared in the document comes first, but I'm not sure.
> 
> 
> Thanks
> 
> 
> 
> 2013/3/13 Daniel Kulp <[email protected]>
> 
>> 
>> On Mar 13, 2013, at 7:39 AM, Angel L. Garcia <[email protected]> wrote:
>>> I have a problem with client encoding: when I read an element with
>> special characters in the response, I get bad characters like ��
>>> 
>>> The log is:
>>> 
>>> INFO: Inbound Message
>>> ----------------------------
>>> ID: 1
>>> Response-Code: 200
>>> Encoding: ISO-8859-1
>>> Content-Type: text/xml
>>> Headers: {connection=[Keep-Alive], Content-Language=[es-ES],
>> content-type=[text/xml], Date=[Wed, 13 Mar 2013 08:05:05 GMT],
>> transfer-encoding=[chunked], X-Backside-Transport=[OK OK]
>>> Messages:
>>> Message (saved to tmp file):
>>> Filename:
>> /tmp/tomcat6-tomcat6-tmp/cxf-tmp-966013/cos8205745368794988769tmp
>>> (message truncated to -1 bytes)
>>> 
>>> Payload: <?xml version="1.0" encoding="UTF-8"?>
>>> <soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> ......
>>> 
>>> I think the problem is that there are two different encodings "Encoding:
>> ISO-8859-1" and <?xml version="1.0" encoding="UTF-8"?>.
>>> Can I change the <?xml version="1.0" encoding="UTF-8"?> to <?xml
>> version="1.0" encoding="ISO-8859-1"?>?
>>> 
>>> Thanks and best regards.
>> 
>> Yea… that seems very wrong to me.  Seems like a bit of an invalid message,
>> as I'd expect the Content-Type to set a charset of utf-8.   I would attempt
>> three things:
>> 
>> 1) Stick an interceptor on the incoming chain that would set:
>> message.put(Message.ENCODING, "UTF-8")   so that CXF would treat it as
>> UTF-8.
>> 
>> 2) You can try changing the <?xml> header via an input stream filter or
>> similar.
>> 
>> 3) Remove the InputStream from the message contents, wrap it with an
>> InputStreamReader using whichever encoding works, and set that into the
>> message content as a Reader.class.   CXF will then delegate to that to
>> handle the charset stuff.
>> 
>> 
>> 
>> 
>> --
>> Daniel Kulp
>> [email protected] - http://dankulp.com/blog
>> Talend Community Coder - http://coders.talend.com
>> 
>> 

-- 
Daniel Kulp
[email protected] - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com
