[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694247#action_12694247 ]
Jonathan Ellis commented on THRIFT-395:
---------------------------------------

> The consistency is that in every Thrift language, we use the native "string"
> type to represent the Thrift "string" type.

Then you should be honest and just use binary everywhere, because native string types are not at all cross-platform.

> We do not try to force Unicode semantics on languages where they are
> non-idiomatic.

I've explained what modern Python idiom is: strings may be ASCII `str` or any `unicode`. Binary data is also represented as `str`, but that does not make it a "string." So I'm very skeptical of this appeal to idiom when the current behavior has NOT been idiomatic for Python at any time since the unicode type was added (2.0, October 2000).

> For what it's worth, protocol buffers use a blob type for strings in C++.

See http://code.google.com/apis/protocolbuffers/docs/proto.html: "A string must always contain UTF-8 encoded or 7-bit ASCII text."

> It gives application writers the option of putting unicode objects in their
> Thrift structures to be read out as str?

Doing half of encode/decode is worse than not doing it at all.

> We do: str

You just admitted that when you write unicode it reads back as str.

---

"if you have code that is using the Thrift string type when it should be binary, s/string/binary/ in your IDL is a virtually painless change to make."

Assuming for the sake of argument that strings should be UTF-8 (which includes ASCII!), do you agree with the above statement?
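The "half of encode/decode" objection can be made concrete with a small sketch. It is written in Python 3, where `str` corresponds to Python 2's `unicode` and `bytes` to Python 2's `str`; the function names are hypothetical and are not part of the Thrift library or the attached patches. The "half" pair encodes text on write but returns raw bytes on read, so a value never comes back as the type that went in; the "full" pair treats a Thrift `string` as UTF-8 text in both directions.

```python
from typing import Union

# Hypothetical helpers. Python 3 `str` stands in for Python 2 `unicode`,
# and Python 3 `bytes` stands in for Python 2 `str`.

def write_string_half(value: Union[str, bytes]) -> bytes:
    """Accepts text and encodes it to UTF-8 on the way out..."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return value

def read_string_half(data: bytes) -> bytes:
    """...but hands raw bytes back, so text never round-trips."""
    return data

def write_string_full(value: str) -> bytes:
    """Treat a Thrift `string` as text: always encode to UTF-8 on write."""
    return value.encode("utf-8")

def read_string_full(data: bytes) -> str:
    """...and always decode from UTF-8 on read, restoring the original text."""
    return data.decode("utf-8")
```

With these helpers, `read_string_half(write_string_half("héllo"))` yields `b'h\xc3\xa9llo'`, while the full pair returns `"héllo"`. Because ASCII is a subset of UTF-8, pure-ASCII text produces identical bytes under either pair; data that is not text belongs in a `binary` field, which would pass bytes through untouched.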
> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>             Fix For: 0.1
>
>         Attachments: 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch,
>                      0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch,
>                      0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch,
>                      0004-python-Remove-ridiculous-semicolons-from-gen-code.patch,
>                      python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings
> -- no encoding/decoding to UTF-8 is done. So if a unicode object is passed
> to a (regular, non-binary) string, an exception is raised.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
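To illustrate the behavior described in the issue quoted above, here is a hedged sketch of length-prefixed string framing in the style of Thrift's binary protocol (a 4-byte big-endian signed length followed by the payload). The helper names are hypothetical, and this is a Python 3 illustration rather than the actual `TBinaryProtocol` code or any of the attached patches. `write_binary` models treating every string as raw bytes: handing it text fails with a `TypeError`, analogous to the exception the report describes. `write_string` encodes to UTF-8 first, so the length prefix counts bytes rather than characters.

```python
import struct
from typing import Tuple

def write_binary(value: bytes) -> bytes:
    """Frame raw bytes: 4-byte big-endian signed length, then the payload.

    Passing a text `str` here raises TypeError (bytes + str cannot be
    concatenated), mirroring a binding that treats every string as binary.
    """
    return struct.pack("!i", len(value)) + value

def write_string(value: str) -> bytes:
    """Encode text to UTF-8 first, so the length prefix counts bytes."""
    return write_binary(value.encode("utf-8"))

def read_binary(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return (payload, offset just past the payload)."""
    (length,) = struct.unpack_from("!i", buf, offset)
    start = offset + 4
    data = buf[start:start + length]
    if len(data) != length:
        raise ValueError("truncated field")
    return data, start + length

def read_string(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Read a framed payload and decode it from UTF-8 back into text."""
    data, end = read_binary(buf, offset)
    return data.decode("utf-8"), end
```

For example, `write_string("é")` produces `b'\x00\x00\x00\x02\xc3\xa9'`: one character, but a length of 2, because U+00E9 takes two bytes in UTF-8.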