[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688414#action_12688414 ]
Bryan Duxbury commented on THRIFT-395:
--------------------------------------

Not being a pythonista myself, I can't speak to the implementation particulars, but in terms of the correct behavior, it sounds like Jonathan is on the right track.

If the Thrift IDL says a field is of type "string", then you *must* UTF-8 encode/decode it for it to be wire compatible with other language libraries. If the IDL says "binary", then you should write it through with no encoding. Doing anything else is a break with the Thrift specification for the binary protocol.

If, prior to this patch, you were UTF-8 encoding and decoding strings yourself and storing them in string fields, then yes, you will have to change your code if you want the field to remain a string. You can of course change your IDL to a "binary" field and leave your code the way it is, if that would somehow make your life easier. However, as any client code doing this is actually working around a library bug, it seems to me like it's something worth fixing, and in any case, the fix should be a simplification.

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>         Attachments: python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings
> -- no encoding/decoding to UTF-8 is done. So if a unicode object is passed
> to a (regular, non-binary) string field, an exception is raised.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
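
To make the "string" versus "binary" distinction in the comment above concrete, here is a rough Python 3 sketch. It is not the Thrift Python library's code or the attached patch; the function names are hypothetical, and the framing (a 4-byte big-endian length followed by the raw bytes) is my understanding of how the binary protocol writes both field types. The only difference between the two is whether the library does the UTF-8 step.

```python
import struct


def write_string_field(value: str) -> bytes:
    """Hypothetical helper: an IDL "string" is UTF-8 encoded before framing."""
    payload = value.encode("utf-8")
    # Assumed framing: signed 32-bit big-endian byte length, then the bytes.
    return struct.pack("!i", len(payload)) + payload


def write_binary_field(value: bytes) -> bytes:
    """Hypothetical helper: an IDL "binary" is written through untouched."""
    return struct.pack("!i", len(value)) + value


def read_string_field(buf: bytes) -> str:
    """Inverse of write_string_field: strip the length, then UTF-8 decode."""
    (length,) = struct.unpack("!i", buf[:4])
    return buf[4:4 + length].decode("utf-8")


def read_binary_field(buf: bytes) -> bytes:
    """Inverse of write_binary_field: strip the length, return raw bytes."""
    (length,) = struct.unpack("!i", buf[:4])
    return buf[4:4 + length]


if __name__ == "__main__":
    wire = write_string_field("h\u00e9llo")
    print(wire)                     # length prefix counts bytes, not characters
    print(read_string_field(wire))  # round-trips to the original text
    print(read_binary_field(write_binary_field(b"\xff\xfe")))
```

Under this sketch, client code that used to encode by hand and hand the resulting bytes to a "string" field would either pass the text directly or switch the IDL field to "binary", which matches the two options described in the comment.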