Hi Jens,
Thanks for the quick response!
I totally agree with you (and with THRIFT-414 for that matter) that the
wire format should always be UTF-8.
But that's exactly what my Perl client is doing: I'm passing UTF-8
characters, but for some reason the writeString method in the
BinaryProtocol package performs an encode_utf8 on the string,
which, according to the Encode manual page, does the following:
- quote -
... The characters that comprise $string are encoded in Perl's internal
format and the result is returned as a sequence of octets.
- unquote -
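As a minimal illustration of the quoted behaviour (plain Perl with the core Encode module, not Thrift code; the sample string is an assumption), encode_utf8 turns a three-character string into six UTF-8 octets:

```perl
use strict;
use warnings;
use Encode ();

# Assumed sample: three characters, U+00E1 (á), U+00F6 (ö), U+00E8 (è).
my $chars = "\x{E1}\x{F6}\x{E8}";

# encode_utf8 converts the characters into UTF-8 octets.
my $octets = Encode::encode_utf8($chars);

# prints: 3 characters -> 6 octets: C3 A1 C3 B6 C3 A8
printf "%d characters -> %d octets: %s\n",
    length($chars), length($octets),
    join(' ', map { sprintf('%02X', ord($_)) } split //, $octets);
```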
And it does this after checking the string with utf8::is_utf8(),
which, according to the utf8 manual page, does the following:
- quote -
Test whether STRING is in UTF-8.
- unquote -
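Note that utf8::is_utf8 looks at Perl's internal UTF8 flag rather than validating the contents; more recent versions of the utf8 documentation describe it as testing whether the string is "marked internally as encoded in UTF-8". The following sketch (core Perl only, sample strings assumed) shows that valid UTF-8 octets report false, while the flag can be switched on without changing a single character:

```perl
use strict;
use warnings;
use Encode ();

# The UTF-8 octets for "á", held as a plain byte string.
my $bytes = "\xC3\xA1";
# The same text decoded into a single character (U+00E1).
my $char = Encode::decode_utf8($bytes);

# utf8::is_utf8 reports the internal flag, not whether the contents
# are valid UTF-8: the perfectly valid octets report false.
my $bytes_flag = utf8::is_utf8($bytes) ? 1 : 0;   # 0
my $char_flag  = utf8::is_utf8($char)  ? 1 : 0;   # 1

# The flag can be turned on without changing the characters at all.
my $latin1 = "\xE1";          # one character, U+00E1, flag off
my $upgraded = $latin1;
utf8::upgrade($upgraded);     # same character, flag now on

print "bytes: $bytes_flag, char: $char_flag, ",
      "latin1: ", (utf8::is_utf8($latin1) ? 1 : 0), ", ",
      "upgraded: ", (utf8::is_utf8($upgraded) ? 1 : 0), ", ",
      "equal: ", ($latin1 eq $upgraded ? 1 : 0), "\n";
```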
So, why is an encode done when the string is already in proper UTF-8?
Out of pure curiosity, I temporarily commented out the encode call
in the writeString method, and then everything worked fine! But that
is, of course, not a proper solution.
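One plausible cause, offered as an assumption rather than a diagnosis: the string reaching writeString holds UTF-8 octets but carries the UTF8 flag, for example because it was concatenated with a flagged string, which upgrades each octet to a separate Latin-1 character. The sketch below uses a hypothetical wire_bytes helper that mirrors the is_utf8/encode_utf8 check quoted from BinaryProtocol.pm, and shows that decoding the input once at the boundary, instead of removing the encode call, produces correct bytes:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper mirroring the check quoted from writeString:
# flagged strings are encoded to UTF-8, unflagged ones are sent as-is.
sub wire_bytes {
    my ($value) = @_;
    $value = Encode::encode_utf8($value) if utf8::is_utf8($value);
    return $value;
}

# Hypothetical input: raw UTF-8 octets for "áöè" (flag off), e.g. read
# from a file or socket without a decoding layer.
my $raw = "\xC3\xA1\xC3\xB6\xC3\xA8";

# Hypothetical flagged string from elsewhere in the program;
# utf8::upgrade just forces the flag on to simulate that.
my $prefix = "Name: ";
utf8::upgrade($prefix);

# Concatenation upgrades $raw too: each octet becomes its own Latin-1
# character, so the encode in wire_bytes double-encodes them.
my $broken = wire_bytes($prefix . $raw);

# Decoding the input once keeps it as characters, so the single
# encode yields correct UTF-8.
my $fixed = wire_bytes($prefix . Encode::decode_utf8($raw));

printf "broken: %d bytes, fixed: %d bytes\n", length($broken), length($fixed);
```

The general idea is to keep text as decoded character strings inside the program and let the protocol encode exactly once on the way out.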
Kind regards,
Tom
On 01/21/2015 12:09 AM, Jens Geyer wrote:
Hi Tom,
I'm not exactly sure I understand the issue correctly, but at least
I can say that the wire format for strings shall be UTF-8. Anything
else is suspicious. See also
https://issues.apache.org/jira/browse/THRIFT-414 for a discussion of
the latter.
Does that help you any further?
Have fun,
JensG
-----Original Message-----
From: Tom Hesp
Sent: Tuesday, January 20, 2015 10:19 AM
To: [email protected]
Subject: Diacritics get garbled when sent from Perl client.
Hi,
This question may have been asked before on this list, but I have not
been able to find anything about it.
I am using Thrift version 0.9.1 and have a C++ Thrift server maintaining
user records in a database.
When I send user information containing diacritics (like á, ö, è, etc.)
to it from a C++ or PHP client, everything is fine.
However, when I do the same from a Perl client, the diacritics become
garbled. The example characters above are received by the server as
something like this: Ã¡Ã¶Ã¨
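The garbling pattern is consistent with double encoding. A minimal sketch (core Perl only; it assumes the server decodes the received bytes as UTF-8 exactly once) shows what the receiver ends up with, and one way to confirm the diagnosis:

```perl
use strict;
use warnings;
use Encode ();

# Correct UTF-8 octets for "áöè".
my $once = "\xC3\xA1\xC3\xB6\xC3\xA8";

# encode_utf8 treats each octet as a Latin-1 character and encodes it
# again; in writeString this happens when such octets carry the flag.
my $twice = Encode::encode_utf8($once);

# A receiver decoding the wire bytes once sees six characters.
my $seen = Encode::decode_utf8($twice);

# Mapping those characters back to Latin-1 bytes and decoding again
# recovers the original text, which confirms the double encoding.
my $recovered = Encode::decode_utf8(Encode::encode('ISO-8859-1', $seen));

binmode(STDOUT, ':encoding(UTF-8)');
print "$seen\n";    # prints: Ã¡Ã¶Ã¨
```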
I am using the BinaryProtocol so I checked the BinaryProtocol.pm and saw
the following construct in writeString:
if( utf8::is_utf8($value) ){
    $value = Encode::encode_utf8($value);
}
This means that the string is encoded to Perl's internal format.
I also checked the C++ library on the receiving (server) end, but I do
not see the string being decoded again anywhere!
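For context on the receiving side: the binary protocol sends a string as a 4-byte big-endian length followed by the raw bytes, so there is no decode step to find, and a C++ server simply gets those bytes in a std::string. The hypothetical binary_string_frame helper below sketches that framing in Perl; it is a simplification for illustration, not the actual Thrift implementation. It also suggests why the encode exists at all: on a flagged string, length() counts characters rather than bytes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: frame a string the way the binary protocol does,
# as a 4-byte big-endian length followed by the raw bytes.
sub binary_string_frame {
    my ($value) = @_;
    # Same check as writeString: turn a flagged character string into
    # UTF-8 octets, so length() below counts bytes, not characters.
    $value = Encode::encode_utf8($value) if utf8::is_utf8($value);
    # 'N' is an unsigned 32-bit big-endian integer; for any realistic
    # length it yields the same bytes as Thrift's signed i32.
    return pack('N', length($value)) . $value;
}

my $frame = binary_string_frame(Encode::decode_utf8("\xC3\xA1"));
print join(' ', map { sprintf('%02X', ord($_)) } split //, $frame), "\n";
```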
I even tried this with a small Perl server, but the results are the
same: the data gets encoded but is never decoded.
Am I missing something? Do I need to define something in the IDL so the
server knows it may have to decode the string?
Thanks for your time.
Kind regards,
Tom Hesp
--
*Tom Hesp*
SYSTEMS DEVELOPER
SaaSplaza
*Office:* +31 (0)20 547 8409 | *Mobile:* +31 (0)6 538 95236
Stroombaan 6-8, 1181 VX Amstelveen, The Netherlands
_www.saasplaza.com <http://www.saasplaza.com/>_