[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694665#action_12694665 ]

Chad Walters commented on THRIFT-395:
-------------------------------------

bq. In other words, you are sending binary data that happens to be an encoded 
string and calling that a string, which it is not. It is binary data. That's 
working around one bug with another in my book.

Partly true. 'string' was a bit overloaded and was also used for arbitrary binary 
data. Facebook was originally concerned mostly with C++ and PHP. The string 
encoding issue didn't really crop up until Java started getting some real usage 
(my understanding is that it had been implemented by FB but not heavily 
exercised), which came after the initial open-sourcing of the code. Whether you 
see this as a bug or not depends on whether you think calling something a 
'string' means that it is a C++ std::string (which can certainly hold arbitrary 
binary data) or a Java String (which has an encoding attached to it). My 
personal take is that the 'string' type is unfortunately a bit schizophrenic 
around this: in C++ it is std::string and in Java it is String. So if you want 
to talk to Java from C++ using 'string', you had better submit to the strictures 
of Java String; but if you only care about C++ and languages with other 
encoding-agnostic string primitives, then you don't.
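
To make the std::string vs. Java String distinction concrete, here is a minimal 
Python 3 sketch. It is not Thrift code: the helpers `as_byte_string` and 
`as_text_string` are hypothetical stand-ins for an encoding-agnostic string type 
and an encoding-aware one. The same bytes survive the first untouched but are 
either rejected (strict decoding) or silently altered (lenient decoding with 
U+FFFD replacement, which is what Java's String(byte[], Charset) constructor 
does) by the second.

```python
# Arbitrary binary payload; the bytes 0xff and 0xfe never appear in valid UTF-8.
payload = b"\x00\xff\xfe binary"

def as_byte_string(data: bytes) -> bytes:
    """Hypothetical encoding-agnostic view, loosely like C++ std::string."""
    return bytes(data)  # bytes pass through untouched

def as_text_string(data: bytes) -> str:
    """Hypothetical strict encoding-aware view: bytes must decode as UTF-8."""
    return data.decode("utf-8")  # strict mode raises UnicodeDecodeError

# The byte-oriented view round-trips exactly.
print(as_byte_string(payload) == payload)  # True

# The strict text view rejects the payload outright.
try:
    as_text_string(payload)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# A lenient decoder that substitutes U+FFFD accepts the payload,
# but re-encoding the text no longer yields the original bytes.
lossy = payload.decode("utf-8", errors="replace").encode("utf-8")
print(lossy == payload)  # False
```

Either way, binary data carried in a 'string' field does not round-trip through 
an encoding-aware string type, which is why such data belongs in a binary type.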

bq. In 2009 a language that doesn't support unicode is barely usable, and will 
almost certainly support unicode soon.

Perhaps. I don't think C++ is going to change the semantics of std::string any 
time in the near future, however. I guess opinions will vary about whether C++ 
qualifies as "barely usable", "highly usable", or "eye-bleedingly unusable".



> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>             Fix For: 0.1
>
>         Attachments: 
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, 
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the Python bindings are treated as binary strings; 
> no encoding to or decoding from UTF-8 is done. So if a unicode object is passed 
> for a (regular, non-binary) string field, an exception is raised.
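
A minimal sketch of the behaviour the issue asks for: encode text to UTF-8 on 
write and decode it on read. This is written in Python 3 (`str` for text, 
`bytes` for binary), whereas the 2009 bindings targeted Python 2's 
`unicode`/`str` split, and it is not the code from the attached patches. The 
helpers `write_string` and `read_string` are hypothetical, and the framing (a 
big-endian signed 32-bit length followed by the raw bytes) is assumed to match 
Thrift's binary protocol.

```python
import io
import struct

def write_string(buf: io.BytesIO, value) -> None:
    """Hypothetical writer: accept text or bytes, emit i32 length + UTF-8 bytes."""
    if isinstance(value, str):
        value = value.encode("utf-8")  # encode text instead of raising on non-ASCII
    buf.write(struct.pack("!i", len(value)))  # assumed big-endian signed 32-bit length
    buf.write(value)

def read_string(buf: io.BytesIO) -> str:
    """Hypothetical reader for a 'string' field: decode the payload back to text."""
    (length,) = struct.unpack("!i", buf.read(4))
    return buf.read(length).decode("utf-8")

buf = io.BytesIO()
write_string(buf, "naïve café")  # 10 characters, 12 bytes in UTF-8
buf.seek(0)
text = read_string(buf)
print(text)  # naïve café
```

Note the length prefix counts encoded bytes, not characters, so it must be 
computed after encoding.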

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
