Hi Jens,

Thanks for the quick reaction!

I totally agree with you (and with THRIFT-414, for that matter) that the wire format should always be UTF-8. But that is exactly what my Perl client is doing: I am passing UTF-8 characters, yet for some reason the writeString method in the BinaryProtocol package performs an encode_utf8 on the string,
which, according to the Encode manual page:
- quote -
.... The characters that comprise $string are encoded in Perl's internal format and the result is returned as a sequence of octets.
- unquote -

And it does this after it has done a check on the string using utf8::is_utf8() which, according to the utf8 manual page:
- quote -
Test whether STRING is in UTF-8.
- unquote -

So, why is an encode done when the string is already in proper UTF-8?

Just out of pure curiosity, I temporarily commented out the encode call in the writeString method, and then everything works fine! But that is not a proper solution, of course.

Kind regards,
Tom

On 01/21/2015 12:09 AM, Jens Geyer wrote:

Hi Tom,

I'm not exactly sure whether I understand the issue correctly, but at least I can say that the wire format of strings shall be UTF-8. Anything else is suspicious. See also https://issues.apache.org/jira/browse/THRIFT-414 for a discussion of the latter.

Does that help you any further?

Have fun,
JensG




-----Original Message-----
From: Tom Hesp
Sent: Tuesday, January 20, 2015 10:19 AM
To: [email protected]
Subject: Diacritics get garbled when sent from Perl client.

Hi,

This question may have been asked before on this list but I have not
been able to find anything about it.

I am using Thrift version 0.9.1 and have a C++ Thrift server maintaining
user records in a database.
When I send user information containing diacritics (like á, ö, è, etc.)
to it from a C++ or PHP client everything is fine.
However, when I do the same from a Perl client, the diacritics become
garbled. The example characters above are received by the server as
something like this: áöè

I am using the BinaryProtocol so I checked the BinaryProtocol.pm and saw
the following construct in writeString:
    if( utf8::is_utf8($value) ){
        $value = Encode::encode_utf8($value);
    }
Which means that the string is encoded to Perl's internal format.
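
For reference, here is a minimal sketch of what that construct does with two inputs. It rests on one point that is an addition to this thread: utf8::is_utf8() reports whether Perl's internal UTF8 flag is set on the scalar, not whether the octets happen to form valid UTF-8. The helper name wire_octets is hypothetical and only mirrors the quoted check; it is not the actual Thrift source, and how the client's strings acquire the flag is not shown in this thread.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper that mirrors the check quoted from writeString;
# it is not the actual Thrift source.
sub wire_octets {
    my ($value) = @_;
    if ( utf8::is_utf8($value) ) {
        $value = Encode::encode_utf8($value);
    }
    return $value;
}

# Six octets: the UTF-8 encoding of "áöè", held as a plain byte string.
my $octets = "\xc3\xa1\xc3\xb6\xc3\xa8";

# The same six values, but stored with Perl's internal UTF8 flag switched on.
# utf8::upgrade changes only the internal representation, not the content.
my $flagged = $octets;
utf8::upgrade($flagged);

my $plain_hex   = unpack( 'H*', wire_octets($octets) );
my $flagged_hex = unpack( 'H*', wire_octets($flagged) );

print "flag off: $plain_hex\n";      # c3a1c3b6c3a8 (passed through unchanged)
print "flag on:  $flagged_hex\n";    # c383c2a1c383c2b6c383c2a8 (encoded a second time)
```

Read back as UTF-8, the second result displays as "Ã¡Ã¶Ã¨". That would match the garbling described above if the client's strings hold already-encoded UTF-8 octets with the flag set, which is an assumption here.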

I also checked the C++ libraries at the receiving (server) end but I do
not see the string being decoded again!
I even tried this with a little Perl server but the results are the
same, the data gets encoded but is never decoded.

Am I missing something? Do I need to define something in the IDL so the
server knows it may have to decode the string?

Thanks for your time.

Kind regards,
Tom Hesp
--


*Tom Hesp *
SYSTEMS DEVELOPER

SaaSplaza


*Office:* +31 (0)20 547 8409  | *Mobile:* +31 (0)6 538 95236
Stroombaan 6-8, 1181 VX  Amstelveen, The Netherlands

_www.saasplaza.com <http://www.saasplaza.com/>_
