[ 
https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694238#action_12694238
 ] 

David Reiss commented on THRIFT-395:
------------------------------------

bq. But "there is string, and binary, and sometimes the former is utf8-encoded, 
but not always" is not.
The consistency is that in every Thrift language, we use the native "string" 
type to represent the Thrift "string" type.  We do not try to force Unicode 
semantics on languages where they are non-idiomatic.

bq. (For what it's worth, protocol buffers defines `string` and `bytes` types, 
corresponding to the behavior of `string` and `binary` in what we are calling 
the "java and C# way" here.)
For what it's worth, protocol buffers use a blob type for strings in C++.

bq. your patch encodes on write but does not decode on read
Yeah, that was the point.  It gives application writers the option of putting 
unicode objects in their Thrift structures, but doesn't break compatibility 
with programs that use str objects and/or use alternate encodings for their 
strings.
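
A minimal sketch of the encode-on-write, no-decode-on-read behavior described above. It is written for Python 3, where `str` stands in for the Python 2 `unicode` type and `bytes` for the Python 2 `str` type. The names `write_string` and `read_string` and the 4-byte length prefix are illustrative assumptions, not the actual patch or the Thrift library API.

```python
import io
import struct


def write_string(buf, value):
    """Write a length-prefixed string; text is UTF-8 encoded first."""
    # Python 3 `str` plays the role of Python 2 `unicode` here (assumption).
    if isinstance(value, str):
        value = value.encode("utf-8")
    # Byte strings fall through unchanged, whatever their encoding.
    buf.write(struct.pack("!i", len(value)))
    buf.write(value)


def read_string(buf):
    """Read a length-prefixed string and return the raw bytes, undecoded."""
    (length,) = struct.unpack("!i", buf.read(4))
    return buf.read(length)


buf = io.BytesIO()
write_string(buf, "caf\u00e9")   # text in: encoded to UTF-8 on write
write_string(buf, b"caf\xe9")    # Latin-1 bytes in: passed through untouched
buf.seek(0)
first = read_string(buf)
second = read_string(buf)
```

Both reads return byte strings, so a caller that writes byte strings in some other encoding gets back exactly what it wrote, and a caller that writes text gets back its UTF-8 encoding.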

bq. So even for python to python communication it is broken.
Works fine for me.

bq. (Surely we at least agree that the server read should return the same kind 
of object that the client wrote, and vice versa.)
We do: str

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>             Fix For: 0.1
>
>         Attachments: 
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, 
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings 
> -- no encoding/decoding to UTF-8 is done.  So if a unicode object is passed 
> to a (regular, non-binary) string, an exception is raised.
