[ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694647#action_12694647 ]
Chad Walters commented on THRIFT-395:
-------------------------------------

Jonathan, when I said there was not complete consistency, I was referring to consistently choosing between maximum interoperability across languages vs more flexibility within given languages to follow their idioms. The choice in various cases has been driven somewhat by pragmatics and somewhat by historical factors.

WRT the treatment of strings, there is consistency of a kind here: in this instance, the choice was made for more flexibility within given languages (although I agree with your characterization that concerns about backwards-compatibility breakage also played a role in the choice). The current situation is that 'string' is free to contain arbitrary data in languages where that is supported, but clearly not in languages that enforce an encoding on strings (as in Java and C#). If you want to interoperate with those languages, then make sure your application only passes UTF-8 encoded data in 'string'.

I think the issue is mostly that you don't like the answer you are getting, and partly the differences between Python2 and Python3 with regard to enforcing encoding. (If I am understanding correctly, Python3 is now in the same camp as Java and C# -- is that correct? If so, maybe we want to treat Python3 as a different target language from Python2, which might sidestep some of the issues here, since I detect a little bit of pro-Python2 on David's part vs pro-Python3 on your part.)

bq. Isn't that mostly a non-issue, though? If you are using the current code and sending binary data as a "string" then you are probably using Python on both client and server or things would already be broken.

I am not sure it's a non-issue. You are mistaken about using only Python on both sides (or the same language on both sides, if that is what you meant). You can currently send binary data via 'string' comfortably across C++, Ruby, PHP, etc., as well.
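As a rough illustration of the advice above (only pass UTF-8 data in 'string' if you need to interoperate with languages such as Java and C#), an application could check a payload before deciding which field type to put it in. This is only a sketch in Python 3; `is_utf8` and `choose_field_type` are hypothetical helper names and are not part of the Thrift library or its generated code.

```python
def is_utf8(payload: bytes) -> bool:
    """Return True if the payload decodes cleanly as UTF-8."""
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def choose_field_type(payload: bytes) -> str:
    """Hypothetical helper: pick the Thrift field type that is safe for this payload.

    Valid UTF-8 can travel in a 'string' field and still be read by peers
    that enforce an encoding; anything else is treated as 'binary'.
    """
    return "string" if is_utf8(payload) else "binary"


if __name__ == "__main__":
    print(choose_field_type("héllo".encode("utf-8")))
    print(choose_field_type(b"\xff\xfe\x00"))
```

A check like this only matters for shops that need the encoding-enforcing languages on one side of the wire; between C++, Ruby and PHP peers the arbitrary bytes would pass through 'string' either way.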
I may be wrong, but I think readBinary() and readString() can return different types in different languages (although I do see that in C++ they both return std::string, which at least knocks out part of my concern -- are there other languages where the types might matter?).

>Right now most thrift implementations cannot talk to my Java server and that
>is broken.

Why is this? We interoperate via Thrift across C++, Ruby, Java, Python2 and Erlang here and everything works just fine. We just make limited use of the 'string' type -- and make sure that applications only send UTF-8 data via 'string'.

Consider that not all Thrift shops use all languages. From their perspective, they don't want to "dumb down" their type system and flexibility because of some language that they don't care about interoperating with.

That said (and as I said before), I am totally sympathetic to your concerns -- my preference would be that we more consistently choose in favor of maximal interoperability. I am just pointing out that the state of things is not as bad as you seem to believe it is and that these choices have not been completely arbitrary or without merit.

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
> Key: THRIFT-395
> URL: https://issues.apache.org/jira/browse/THRIFT-395
> Project: Thrift
> Issue Type: Bug
> Components: Compiler (Python), Library (Python)
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Priority: Blocker
> Fix For: 0.1
>
> Attachments:
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch,
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch,
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch,
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch,
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings
> -- no encoding/decoding to UTF-8 is done.
> So if a unicode object is passed
> to a (regular, non-binary) string, an exception is raised.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.