[ 
https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694647#action_12694647
 ] 

Chad Walters commented on THRIFT-395:
-------------------------------------

Jonathan, when I said there was not complete consistency, I was referring to 
consistently choosing between maximum interoperability across languages vs more 
flexibility within given languages to follow their idioms. The choice in 
various cases has been somewhat driven by pragmatics and somewhat by historical 
factors. 

WRT the treatment of strings, there is consistency of a kind here: in this 
instance, the choice was made for more flexibility within given languages 
(although I agree with your characterization that concerns about 
backwards-compatibility breakage also played a role in the choice). The current 
situation is that 'string' is free to contain arbitrary data in languages where 
that is supported, but clearly not in languages that enforce an encoding on 
strings (as Java and C# do). If you want to interoperate with those languages, 
make sure your application only passes UTF-8 encoded data in 'string'.
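
To make the receiving side of this concrete, here is a minimal sketch in plain 
Python 3 (no Thrift imports). The helper name decode_like_strict_runtime and 
the sample payloads are hypothetical, chosen only for illustration; the sketch 
assumes that an encoding-enforcing language has to decode the bytes of a 
'string' field as strict UTF-8 before it can hand them to the application.

```python
# Illustrative sketch only: plain Python 3, not the Thrift library.
# decode_like_strict_runtime is a hypothetical helper, not a Thrift API.

def decode_like_strict_runtime(wire_bytes: bytes) -> str:
    """Decode bytes as strict UTF-8, as an encoding-enforcing language would."""
    return wire_bytes.decode("utf-8")  # raises UnicodeDecodeError on bad input


# Text that was encoded as UTF-8 by the sender decodes cleanly.
text_payload = "naïve".encode("utf-8")
print(decode_like_strict_runtime(text_payload))

# Arbitrary binary data placed in a 'string' field generally does not.
binary_payload = b"\xff\xfe\x00\x01"
try:
    decode_like_strict_runtime(binary_payload)
except UnicodeDecodeError as exc:
    print("payload is not valid UTF-8:", exc.reason)
```

A language that treats 'string' as raw bytes never performs the decode step, 
which is why the same binary payload can pass between such languages unnoticed.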

I think the issue is mostly that you don't like the answer you are getting and 
partly that Python2 and Python3 differ with regard to enforcing encoding. If I 
am understanding correctly, Python3 is now in the same camp as Java and C# -- 
is that correct? If so, maybe we want to treat Python3 as a different target 
language from Python2, which might sidestep some of the issues here, since I 
detect a little bit of pro-Python2 on David's part vs pro-Python3 on your part.

bq. Isn't that mostly a non-issue, though? If you are using the current code 
and sending binary data as a "string" then you are probably using Python on 
both client and server or things would already be broken.

I am not sure it's a non-issue. You are mistaken about using only Python on 
both sides (or the same language on both sides, if that is what you meant). You 
can currently send binary data via 'string' comfortably across C++, Ruby, PHP, 
etc., as well. I may be wrong, but I think readBinary() and readString() can 
return different types in different languages (although I do see that in C++ 
they both return std::string, which at least knocks out part of my concern -- 
are there other languages where the types might matter?).

>Right now most thrift implementations cannot talk to my Java server and that 
>is broken.

Why is this? We interoperate via Thrift across C++, Ruby, Java, Python2 and 
Erlang here and everything works just fine. We just make limited use of the 
'string' type -- and make sure that applications only send UTF-8 data via 
'string'.
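
One way an application could enforce that convention on the sending side is a 
small guard applied before a value is placed in a 'string' field. This is only 
a sketch under stated assumptions: to_thrift_string is a hypothetical 
application-level helper, not something the Thrift library provides, and it is 
written for Python 3.

```python
# Illustrative sketch only: a hypothetical application-level guard,
# not part of the Thrift library or its generated code.

def to_thrift_string(value) -> bytes:
    """Return UTF-8 bytes that are safe to send in a 'string' field."""
    if isinstance(value, str):
        # Text is encoded explicitly so every receiver sees valid UTF-8.
        return value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        data = bytes(value)
        # Validate only; raises UnicodeDecodeError for non-UTF-8 bytes.
        data.decode("utf-8")
        return data
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_thrift_string("é"))
print(to_thrift_string(b"plain ascii"))
```

Binary data that fails the check would be rejected here rather than by a 
receiver written in a language that enforces an encoding on strings.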

Consider that not all Thrift shops use all languages. From their perspective, 
they don't want to "dumb down" their type system and flexibility because of 
some language that they don't care about interoperating with.

That said (and as I said before) I am totally sympathetic to your concerns -- 
my preference would be that we more consistently choose in favor of maximal 
interoperability. I am just pointing out that the state of things is not as bad 
as you seem to believe it is and that these choices have not been completely 
arbitrary or without merit.

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>             Fix For: 0.1
>
>         Attachments: 
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, 
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings 
> -- no encoding/decoding to UTF-8 is done.  So if a unicode object is passed 
> to a (regular, non-binary) string, an exception is raised.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
