[jira] Commented: (THRIFT-395) Python library + compiler does not support unicode strings

Bryan Duxbury (JIRA) Thu, 02 Apr 2009 08:24:50 -0700

    [ https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695037#action_12695037 ]


Bryan Duxbury commented on THRIFT-395:
--------------------------------------

I think the point is that we want to strengthen the meaning of "string" 
wherever possible. Clearly it used to be treated as arbitrary bytes, but since 
we have binary now, it makes sense that the key use case is actual text. In 
some ways, I see specifying the encoding of strings as a necessary part of the 
protocol. After all, the protocol specifies the encoding of ints, doubles, 
maps, etc., right? Jonathan has consistently argued for us to have a standard.

Right now, we have a de facto standard of "UTF-8 if it's convenient, whatever 
else otherwise". This can obviously lead to problems, for example when one 
side writes bytes in an encoding the other side does not expect. Yes, you can 
make the application responsible for the encoding, but that seems like a 
workaround, and it will quickly become inconvenient if you have more than two 
languages involved.
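
To make the failure mode concrete, here is a minimal sketch in modern Python 
(where str is Unicode text and bytes is binary). The two "peers" and the 
sample text are hypothetical and do not come from the Thrift code base; the 
sketch only shows what can happen when each side picks whatever encoding is 
convenient.

```python
# Hypothetical illustration: two peers that disagree on the string encoding.
text = "café"

# Peer A finds Latin-1 convenient and puts those bytes on the wire.
wire_bytes = text.encode("latin-1")

# Peer B assumes UTF-8, as the de facto convention suggests.
try:
    received = wire_bytes.decode("utf-8")
except UnicodeDecodeError:
    # The bytes arrived intact, but they are not valid UTF-8.
    received = None

# If both peers agree on UTF-8, the round trip is lossless.
round_trip = text.encode("utf-8").decode("utf-8")
```

With an agreed encoding the receiver never has to guess; without one, every 
pair of languages needs its own application-level convention.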

In general, I'm sort of against allowing "alternate encodings" (a la 
THRIFT-414), because it seems like overkill for the problem. Either you are 
dealing with strings that could contain special characters, in which case 
you're probably looking for Unicode support, or you basically don't care about 
encoding, in which case the basic ASCII subset is probably more than enough 
for you. I think it's tricky to add annotations for string encodings because 
the wire won't contain that information, which could leave you able to read a 
string sent to you but unable to decode it.
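
The "able to read but unable to decode" case can be sketched as follows. This 
assumes a simple length-prefixed framing (a 4-byte big-endian length followed 
by the raw bytes); write_string, read_string and try_decode are hypothetical 
helpers written for this example, not the library's API.

```python
import struct


def write_string(payload: bytes) -> bytes:
    # Assumed framing: 4-byte big-endian length, then the raw bytes.
    # Nothing in the frame records which encoding produced the bytes.
    return struct.pack("!i", len(payload)) + payload


def read_string(buf: bytes) -> bytes:
    # Reading needs only the length prefix, so it always succeeds.
    (length,) = struct.unpack("!i", buf[:4])
    return buf[4:4 + length]


def try_decode(raw: bytes, encoding: str = "utf-8"):
    # Decoding needs the encoding, which the receiver has to guess.
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# The sender was annotated (hypothetically) to use UTF-16; the receiver
# has no such annotation and falls back to UTF-8.
frame = write_string("naïve".encode("utf-16-le"))
raw = read_string(frame)    # the bytes come back intact
decoded = try_decode(raw)   # but the UTF-8 guess fails, giving None
```

An encoding annotation that lives only in one side's IDL or generated code 
does nothing for a reader that was built without it.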

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.1
>
>         Attachments: 
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, 
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings 
> -- no encoding/decoding to UTF-8 is done.  So if a unicode object is passed 
> to a (regular, non-binary) string, an exception is raised.
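
A rough sketch of the behaviour being asked for, written in modern Python 
rather than the Python 2 of the attached patches. The helper names are 
hypothetical and this is not the code from the patches; it only illustrates 
encoding to UTF-8 on write and decoding on read for (regular, non-binary) 
string fields.

```python
def encode_string_field(value) -> bytes:
    """Hypothetical helper: convert a 'string' field value to wire bytes."""
    if isinstance(value, str):
        # Text is always written as UTF-8.
        return value.encode("utf-8")
    if isinstance(value, bytes):
        # Bytes passed as text must already be valid UTF-8;
        # decode() raises UnicodeDecodeError otherwise.
        value.decode("utf-8")
        return value
    raise TypeError("string field expects str or bytes")


def decode_string_field(raw: bytes) -> str:
    """Hypothetical helper: convert wire bytes back to text."""
    return raw.decode("utf-8")


wire = encode_string_field("日本語")
text = decode_string_field(wire)
```

Binary fields would bypass both helpers and carry their bytes unchanged.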

