[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694510#action_12694510 ]
Esteve Fernandez commented on THRIFT-395:
-----------------------------------------

I'd rather use unicode strings everywhere, but if we have to maintain backwards compatibility with legacy code, what do you think of adding a new annotation (e.g. string.encoding) for specifying the actual string encoding? Something like this:

    typedef string (string.encoding = "utf8") ustring

and then use ustring instead of string.

In any case, I'd deprecate str strings and, at some point in the future, support unicode strings only. In the case of Python, that would make it easier to support Python 3.0 (as it only supports unicode).

I'm not sure if I'll be able to write a patch for this in the next couple of days, but will try if there's consensus on using annotations for this.

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>             Fix For: 0.1
>
>         Attachments: 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings -- no encoding/decoding to UTF-8 is done. So if a unicode object is passed to a (regular, non-binary) string, an exception is raised.
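
For illustration only, here is a rough sketch of how generated Python code might act on the proposed string.encoding annotation. Nothing here is part of the Thrift library: encode_string and decode_string are hypothetical helpers, the sketch is written for Python 3 (where str is unicode and bytes is binary), and the TypeError for unannotated fields is an assumption standing in for the exception mentioned in the issue description.

```python
# Hypothetical helpers; these names do not exist in the Thrift Python library.

def encode_string(value, encoding=None):
    """Return the bytes to write on the wire for a Thrift string field.

    encoding is the value of the proposed string.encoding annotation,
    or None for a plain, unannotated string field.
    """
    if encoding is None:
        # Unannotated field: treated as a binary string, no encoding is done.
        if not isinstance(value, bytes):
            raise TypeError("unannotated string fields accept binary strings only")
        return value
    if isinstance(value, str):
        # Annotated field: encode unicode text with the declared encoding.
        return value.encode(encoding)
    # Already binary: pass through unchanged so legacy callers keep working.
    return value


def decode_string(raw, encoding=None):
    """Return the value handed to user code after reading a string field."""
    if encoding is None:
        # Unannotated field: hand back the raw bytes as before.
        return raw
    return raw.decode(encoding)


# A field declared as ustring (string.encoding = "utf8") round-trips unicode text.
wire = encode_string("caf\u00e9", encoding="utf8")
text = decode_string(wire, encoding="utf8")
```

With this shape, existing IDL files that use plain string would behave as they do today, and only fields typed as ustring would be encoded and decoded.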