[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695037#action_12695037 ]
Bryan Duxbury commented on THRIFT-395:
--------------------------------------

I think the point is that we want to strengthen the meaning of "string" wherever possible. Clearly it used to be used like arbitrary bytes, but since we have binary now, it seems to make sense that the key use case is for actual text.

In some ways, I see specifying the encoding of strings as a necessary part of the protocol. After all, the protocol specifies the encoding of ints, doubles, maps, etc., right?

Jonathan has consistently argued for us to have a standard. Right now, we have a de facto standard of "UTF8 if it's convenient, whatever else otherwise". This can obviously lead to problems in some situations. Yes, you can make the application be concerned with the encoding, but that seems like a workaround, and it will quickly become inconvenient if you have more than two languages involved.

In general, I'm sort of against allowing "alternate encodings" (a la THRIFT-414), because it seems like overkill for the problem. Either you are dealing with strings that could contain special characters, in which case you're probably looking for Unicode support, or you basically don't care about encoding, in which case the base subset of ASCII is probably more than enough for you. I think it's tricky to add annotations for string encodings because the wire won't contain that information, which could leave you able to read a string sent to you but unable to decode it.
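To make the "able to read but unable to decode" case concrete, here is a minimal Python sketch. It assumes a string goes on the wire as a 4-byte big-endian length followed by the raw bytes; the helper names (write_string, read_string, write_raw) are made up for illustration and are not the Thrift library's API. The reader can always pull the bytes off the wire, but it can only turn them back into text if both sides agreed on the encoding.

```python
import struct


def write_string(text):
    """Encode text as UTF-8 and prefix it with a 4-byte big-endian length."""
    payload = text.encode("utf-8")
    return struct.pack("!i", len(payload)) + payload


def write_raw(payload):
    """Hypothetical sender that length-prefixes bytes in any encoding it likes."""
    return struct.pack("!i", len(payload)) + payload


def read_string(buf):
    """Read one length-prefixed string and decode it as UTF-8."""
    (length,) = struct.unpack("!i", buf[:4])
    return buf[4:4 + length].decode("utf-8")


# Round trip works when both sides agree on UTF-8.
frame = write_string(u"caf\u00e9")
assert read_string(frame) == u"caf\u00e9"

# A Latin-1 sender produces a frame that is readable but not decodable:
# nothing on the wire says which encoding was used.
bad_frame = write_raw(u"caf\u00e9".encode("latin-1"))
try:
    read_string(bad_frame)
    decoded = True
except UnicodeDecodeError:
    decoded = False
```

With a single mandated encoding, the second case becomes a sender bug rather than something every receiving application has to guess at.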
> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.1
>
>         Attachments: 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch,
>                      0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch,
>                      0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch,
>                      0004-python-Remove-ridiculous-semicolons-from-gen-code.patch,
>                      python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings
> -- no encoding/decoding to UTF-8 is done. So if a unicode object is passed
> to a (regular, non-binary) string, an exception is raised.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.