[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694665#action_12694665 ]
Chad Walters commented on THRIFT-395:
-------------------------------------

bq. In other words, you are sending binary data that happens to be an encoded string and calling that a string, which it is not. It is binary data. That's working around one bug with another in my book.

Partly true. 'string' was a bit overloaded and was used for arbitrary binary data as well. Facebook originally was mostly concerned with C++ and PHP. The string encoding issue didn't really crop up until Java started getting some real usage (my understanding is that it had been implemented by FB but not heavily exercised), which came after the initial open-sourcing of the code.

Whether you see this as a bug or not depends on whether you think calling something a 'string' means that it is a C++ std::string (which can certainly be arbitrary binary data) or a Java String (which has an encoding attached to it).

My personal take on it is that the 'string' type is unfortunately a bit schizophrenic around this -- in C++ it is std::string and in Java it is String. So if you want to talk to Java from C++ using 'string', you had better submit to the strictures of Java String -- but if you only care about C++ and languages with other encoding-agnostic string primitives, then you don't.

bq. In 2009 a language that doesn't support unicode is barely usable, and will almost certainly support unicode soon.

Perhaps. I don't think C++ is going to change the semantics of std::string any time in the near future, however. I guess opinions will vary about whether C++ qualifies as "barely usable", "highly usable", or "eye-bleedingly unusable".
> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>          Key: THRIFT-395
>          URL: https://issues.apache.org/jira/browse/THRIFT-395
>      Project: Thrift
>   Issue Type: Bug
>   Components: Compiler (Python), Library (Python)
>     Reporter: Jonathan Ellis
>     Assignee: Jonathan Ellis
>     Priority: Blocker
>      Fix For: 0.1
>
>  Attachments:
>     0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch,
>     0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch,
>     0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch,
>     0004-python-Remove-ridiculous-semicolons-from-gen-code.patch,
>     python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings
> -- no encoding/decoding to UTF-8 is done. So if a unicode object is passed
> to a (regular, non-binary) string, an exception is raised.
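
To make the distinction in the issue description concrete, here is a minimal sketch of treating a 'string' field as UTF-8 text at the protocol boundary while leaving binary data untouched. It is not the code from the attached patches: the helper names (write_string, read_binary, read_string) are hypothetical, the 4-byte length prefix is assumed only for illustration, and it is written against Python 3, where text is str and raw data is bytes.

```python
import struct


def write_string(value):
    """Hypothetical helper: serialize a text or binary value.

    Text (str) is encoded to UTF-8 first; bytes pass through unchanged.
    The 4-byte big-endian length prefix is an assumption for this sketch.
    """
    if isinstance(value, str):
        data = value.encode("utf-8")
    elif isinstance(value, bytes):
        data = value
    else:
        raise TypeError("expected str or bytes, got %s" % type(value).__name__)
    return struct.pack("!i", len(data)) + data


def read_binary(buf):
    """Hypothetical helper: return the raw payload bytes, with no decoding."""
    (length,) = struct.unpack("!i", buf[:4])
    return buf[4:4 + length]


def read_string(buf):
    """Hypothetical helper: return the payload decoded from UTF-8 as text."""
    return read_binary(buf).decode("utf-8")


wire = write_string("h\u00e9llo")  # text containing a non-ASCII character
text = read_string(wire)           # decoded back to text
raw = read_binary(wire)            # same payload, left as undecoded bytes
```

The point of the sketch is only that encoding happens in one place on write and decoding in one place on read, so a caller can pass text to a 'string' field without an exception, while a 'binary' field would use read_binary and never attempt a decode.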