[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694238#action_12694238 ]
David Reiss commented on THRIFT-395:
------------------------------------

bq. But "there is string, and binary, and sometimes the former is utf8-encoded, but not always" is not.

The consistency is that in every Thrift language, we use the native "string" type to represent the Thrift "string" type. We do not try to force Unicode semantics on languages where they are non-idiomatic.

bq. (For what it's worth, protocol buffers defines `string` and `bytes` types, corresponding to the behavior of `string` and `binary` in what we are calling the "java and C# way" here.)

For what it's worth, protocol buffers use a blob type for strings in C++.

bq. your patch encodes on write but does not decode on read

Yeah, that was the point. It gives application writers the option of putting unicode objects in their Thrift structures, but doesn't break compatibility with programs that use str objects and/or use alternate encodings for their strings.

bq. So even for python to python communication it is broken.

Works fine for me.

bq. (Surely we at least agree that the server read should return the same kind of object that the client wrote, and vice versa.)

We do: str

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
> Key: THRIFT-395
> URL: https://issues.apache.org/jira/browse/THRIFT-395
> Project: Thrift
> Issue Type: Bug
> Components: Compiler (Python), Library (Python)
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Priority: Blocker
> Fix For: 0.1
>
> Attachments:
>   0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch,
>   0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch,
>   0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch,
>   0004-python-Remove-ridiculous-semicolons-from-gen-code.patch,
>   python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings
> -- no encoding/decoding to UTF-8 is done.
> So if a unicode object is passed
> to a (regular, non-binary) string, an exception is raised.
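
As a rough illustration of the "encode on write, do not decode on read" behavior discussed in the comment above, here is a minimal sketch. It is not the attached patch and not Thrift's API: `write_string` and `read_string` are hypothetical helpers, the 4-byte length prefix is an assumption made for the example, and the code uses Python 3 names, where `str` and `bytes` play the roles of the `unicode` and `str` types discussed in this thread.

```python
import io
import struct


def write_string(out, value):
    """Write a length-prefixed string (hypothetical helper).

    Text objects are encoded to UTF-8; byte strings pass through untouched,
    so callers that use another encoding keep working.
    """
    if isinstance(value, str):  # Python 3 text, i.e. Python 2 `unicode`
        value = value.encode("utf-8")
    out.write(struct.pack("!i", len(value)))  # assumed 4-byte big-endian length
    out.write(value)


def read_string(inp):
    """Read a length-prefixed string and return the raw bytes, undecoded."""
    (length,) = struct.unpack("!i", inp.read(4))
    return inp.read(length)


buf = io.BytesIO()
write_string(buf, "caf\u00e9")   # text: encoded to UTF-8 on write
write_string(buf, b"\xff\xfe")   # bytes that are not valid UTF-8: written as-is
buf.seek(0)
first = read_string(buf)
second = read_string(buf)
```

In this sketch the reader always gets a byte string back, so a peer that sends non-UTF-8 data is unaffected, and a caller that wants text has to decode explicitly (for example `first.decode("utf-8")`).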