Re: Problem with web service client encoding

Daniel Kulp Thu, 25 Apr 2013 11:31:52 -0700

On Apr 25, 2013, at 2:24 PM, Jose María Zaragoza <[email protected]> wrote:


> Thanks Daniel.
> 
> But there are some things that I don't understand
> 
> This is a log for a sending from a client by using CXF 2.7.3
> 
> Address: http://x.x.x.x:8080/services/WSHttpSoap11Endpoint/
> Encoding: UTF-8
> Http-Method: POST
> Content-Type: text/xml
> Headers: {Accept=[*/*], SOAPAction=["urn:process"]}
> Payload: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
> <soap:Body><ns3:process xmlns="http://bean.util.distribuidores.movistar/xsd"
> xmlns:ns2="http://bean/xsd" xmlns:ns3="http://ws">
> <ns3:in><ns2:data>YYYY</ns2:data></ns3:in></ns3:process></soap:Body></soap:Envelope>
> 
> 
> As you can see, Content-Type hasn't got a charset and Encoding is UTF-8.
> It should be ISO-8859-1, shouldn't it?

Well, no.  The CXF HTTP conduit combines the internal "Content-Type" and
"Encoding" attributes from the message into the format that HTTP needs.
(JMS would do something different, etc.)  The "Encoding" there is what
matters.  This is why I suggest grabbing Wireshark and seeing what is
actually on the raw wire transport.
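
A minimal sketch of the kind of combination described above.  This is not
CXF's actual conduit code: the class and the withCharset helper are
hypothetical names used only for illustration, and the real logic may
differ in detail.

```java
import java.util.Locale;

public class ContentTypeSketch {

    // Hypothetical helper (NOT part of CXF): merges a bare media type and a
    // separately tracked encoding into a single HTTP Content-Type value.
    static String withCharset(String contentType, String encoding) {
        boolean hasCharset =
            contentType.toLowerCase(Locale.ROOT).contains("charset=");
        if (encoding == null || encoding.isEmpty() || hasCharset) {
            // Nothing to add, or the header already names a charset.
            return contentType;
        }
        return contentType + "; charset=" + encoding;
    }

    public static void main(String[] args) {
        // Mirrors the logged values: Content-Type: text/xml, Encoding: UTF-8
        System.out.println(withCharset("text/xml", "UTF-8"));
    }
}
```

So a log entry showing "Content-Type: text/xml" next to "Encoding: UTF-8"
does not by itself mean the header on the wire lacks a charset.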


> So, I'm not sure that Encoding is the HTTP content encoding , right ?

On the client/conduit side, the HTTP transport would use that to set up the
appropriate headers.

> Furthermore, the XML payload (SOAP message) hasn't got a <?xml ... ?> header
> with encoding
> I don't know if it makes any sense to have an XML encoding, because the XML is
> built by the CXF runtime and it chooses the encoding that it prefers

Yea.  It's pretty much redundant and not needed with SOAP, as we know it's XML
and we also know the charset from the HTTP headers.  Thus, we don't bother
outputting it, as it's just redundant information that wastes bandwidth
(admittedly not much, but some).

Dan


> 
> 
> Regards
> 
> 
> 2013/4/25 Daniel Kulp <[email protected]>
> 
>> 
>> On Apr 24, 2013, at 6:32 PM, Jose María Zaragoza <[email protected]>
>> wrote:
>> 
>>> Hello:
>>> 
>>> I'm looking at this example and I'd like to understand some things:
>>> 
>>> 1) Does 'Encoding: ISO-8859-1' refer to the HTTP header for defining
>>> content charset ?
>> 
>> Yea.  If you do a wireshark or similar to get the raw TCP bytes, this
>> should be the charset in the Content-Type header.   If there is not a
>> charset on the Content-Type header, the default (per HTTP spec) is
>> ISO-8859-1 which may be where this value is coming from.
>> 
>>> How does Apache CXF choose what is the HTTP header charset to return to a
>>> client ?
>> 
>> I think it will always use UTF-8 unless the user goes out of their way (and
>> it's not easy) to change it.   I'd need to dig through some code to verify
>> though.   At one point a long time ago, we did try to use the same charset
>> that the client sent the request in, but that became too complicated and
>> since pretty much everything nowadays supports UTF-8, we just decided to
>> stick with UTF-8.
>> 
>> 
>>> 2) If HTTP response charset  is ISO-8859-1 but XML encoding is another (
>>> like this example ), What is the priority to decode the message ?
>> 
>> It would have to be the HTTP header.  We SHOULD be able to call the
>> HttpServletRequest.getReader() method to get a reader that is set up with
>> the appropriate charset for the input stream.  (We don't do this, but per
>> spec we should be able to.)  The contents of the stream (which is where
>> the xml decl would be found) would be irrelevant for this.
>> 
>> Dan
>> 
>> 
>>> I guess that encoding document is first one , but I'm not sure
>>> 
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> 2013/3/13 Daniel Kulp <[email protected]>
>>> 
>>>> 
>>>> On Mar 13, 2013, at 7:39 AM, Angel L. Garcia <[email protected]> wrote:
>>>>> I have a problem with client encoding: when I read some element with
>>>> special characters in the response I get bad characters like ��
>>>>> 
>>>>> The log in is:
>>>>> 
>>>>> INFO: Inbound Message
>>>>> ----------------------------
>>>>> ID: 1
>>>>> Response-Code: 200
>>>>> Encoding: ISO-8859-1
>>>>> Content-Type: text/xml
>>>>> Headers: {connection=[Keep-Alive], Content-Language=[es-ES],
>>>> content-type=[text/xml], Date=[Wed, 13 Mar 2013 08:05:05 GMT],
>>>> transfer-encoding=[chunked], X-Backside-Transport=[OK OK]
>>>>> Messages:
>>>>> Message (saved to tmp file):
>>>>> Filename:
>>>> /tmp/tomcat6-tomcat6-tmp/cxf-tmp-966013/cos8205745368794988769tmp
>>>>> (message truncated to -1 bytes)
>>>>> 
>>>>> Payload: <?xml version="1.0" encoding="UTF-8"?>
>>>>> <soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> ......
>>>>> 
>>>>> I think the problem is that there are two different encodings:
>>>>> "Encoding: ISO-8859-1" and <?xml version="1.0" encoding="UTF-8"?>.
>>>>> Can I change the <?xml version="1.0" encoding="UTF-8"?> to
>>>>> <?xml version="1.0" encoding="ISO-8859-1"?>?
>>>>> 
>>>>> Thanks and best regards.
>>>> 
>>>> Yea… that seems very wrong to me.  Seems like a bit of an invalid message,
>>>> as I'd expect the Content-Type to set a charset of utf-8.   I would attempt
>>>> a few things:
>>>> 
>>>> 1) Stick an interceptor on the incoming chain that would set:
>>>> message.put(Message.ENCODING, "UTF-8")   so that CXF would treat it as
>>>> UTF-8.
>>>> 
>>>> 2) You can try changing the <?xml> header via an input stream filter or
>>>> similar.
>>>> 
>>>> 3) Remove the InputStream from the message contents, wrap it with an
>>>> InputStreamReader using whichever encoding works, and set that into the
>>>> message content as a Reader.class.   CXF will then delegate to that to
>>>> handle the charset stuff.
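
As a rough illustration of why option 3 helps, here is a plain-JDK sketch
(no CXF involved) of the same bytes decoded through an InputStreamReader
with two different charsets.  The class name, the decode helper, and the
sample string are all assumptions made for this example; in an interceptor
the stream would come from the message contents instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchSketch {

    // Hypothetical helper: wraps the raw bytes in an InputStreamReader that
    // uses the given charset and reads everything into a String.
    static String decode(byte[] wire, Charset charset) {
        InputStream in = new ByteArrayInputStream(wire);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed payload: the server really sent UTF-8 bytes.
        byte[] wire = "Espa\u00f1a".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in: 6 characters.
        String good = decode(wire, StandardCharsets.UTF_8);
        // Decoding as ISO-8859-1 (what the log's "Encoding" suggested):
        // each byte of the two-byte sequence becomes its own character.
        String bad = decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println(good.length() + " chars vs " + bad.length() + " chars");
    }
}
```

The Reader built with the correct charset is what would be set into the
message content, so the XML parser never has to guess the encoding.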
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Daniel Kulp
>>>> [email protected] - http://dankulp.com/blog
>>>> Talend Community Coder - http://coders.talend.com
>>>> 
>>>> 
>> 
>> --
>> Daniel Kulp
>> [email protected] - http://dankulp.com/blog
>> Talend Community Coder - http://coders.talend.com
>> 
>> 

-- 
Daniel Kulp
[email protected] - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com
