[ 
https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694247#action_12694247
 ] 

Jonathan Ellis commented on THRIFT-395:
---------------------------------------

> The consistency is that in every Thrift language, we use the native "string" 
> type to represent the Thrift "string" type.

Then you should be honest and just use binary everywhere, because native string 
types are not at all cross-platform.

> We do not try to force Unicode semantics on languages where they are 
> non-idiomatic.

I've explained what modern Python idiom is: strings may be ascii `str` or any 
`unicode`.  Binary data is also represented as `str` but that does not make it 
a "string."  

So I'm very skeptical of this appeal to idiom when the current behavior has NOT 
been idiomatic for Python at any time since the unicode type was added (2.0, 
October 2000).

> For what it's worth, protocol buffers use a blob type for strings in C++.

See http://code.google.com/apis/protocolbuffers/docs/proto.html.  "A string 
must always contain UTF-8 encoded or 7-bit ASCII text."

> It gives application writers the option of putting unicode objects in their 
> Thrift structures

to be read out as str?  Doing half of encode/decode is worse than not doing it 
at all.

> We do: str

You just admitted that when you write unicode it reads back as str.
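
To make the asymmetry concrete, here is a minimal sketch. It is written in Python 3 syntax, where `bytes` plays the role of the old `str` and `str` plays the role of `unicode`. The helper names and the length-prefixed layout are assumptions for illustration only, not the actual Thrift library code.

```python
import struct

def write_string(buf, value):
    # Hypothetical writer: text is encoded to UTF-8 on the way out,
    # then stored behind a 4-byte big-endian length prefix.
    if isinstance(value, str):
        value = value.encode("utf-8")
    buf.extend(struct.pack("!i", len(value)))
    buf.extend(value)

def read_string_raw(data):
    # Half of the round trip: hands back the raw bytes, never decoded.
    (length,) = struct.unpack("!i", data[:4])
    return bytes(data[4:4 + length])

def read_string_text(data):
    # Full round trip: decodes the UTF-8 payload back into text.
    return read_string_raw(data).decode("utf-8")

buf = bytearray()
write_string(buf, "na\u00efve")        # the caller writes text

raw = read_string_raw(buf)             # the reader gets bytes back
text = read_string_text(buf)           # the reader gets text back
```

With the raw reader, the value that comes back is not equal to the value that was written, so every caller has to remember to decode by hand. With the decoding reader, the round trip is symmetric.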

---

"if you have code that is using the Thrift string type when it should be 
binary, s/string/binary/ in your IDL is a virtually painless change to make."

Assuming for the sake of argument that strings should be utf8 (which includes 
ascii!), do you agree with the above statement?

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>             Fix For: 0.1
>
>         Attachments: 
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, 
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings 
> -- no encoding/decoding to UTF-8 is done.  So if a unicode object is passed 
> to a (regular, non-binary) string, an exception is raised.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
