[ 
https://issues.apache.org/jira/browse/THRIFT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694521#action_12694521
 ] 

Jonathan Ellis commented on THRIFT-395:
---------------------------------------

@Chad:

So there are two questions here:

1. are utf8 strings the right design decision, absent backwards-compatibility 
concerns?

2. is it worth breaking back-compat for?

I think some people are reluctant to admit 1. because they are afraid of 2.

I think the case for 2. can be made as follows:

a. as I have said several times (without contradiction), changing `string` to 
`binary` in the IDL and recompiling is a virtually painless way to get back the 
old behavior of treating all data as binary no matter what it was declared as

b. nobody is forcing you to upgrade.  if svn 700000-whatever of thrift works 
for you, keep using it.  if necessary, backporting fixes to a branch is not an 
unheard-of strategy either.

c. you have to weigh the pain to current users from changing behavior against 
the pain to future users from NOT changing.  hopefully the current user base is 
a fraction of what it will be in a few years.  unicode is not going to get less 
popular in that time.

d. if you can't change broken behavior before there is an official release, 
when CAN you change it?

Now, perhaps it's worth the time to briefly explain my use case.

I work on the Cassandra distributed database, where we use thrift to let 
clients in any supported language talk to the Java server.  Keys are `string`s 
so they absolutely have to be compatible cross-platform.  Currently they are 
not.  Telling users that "thrift doesn't really support unicode, so in Python 
you have to set this flag, and in other languages it doesn't work at all and it 
will be an uphill battle to get a patch accepted" is a non-starter.  Cassandra 
has a high enough barrier to entry as it is; adding to it unnecessarily is 
foolish.

Without real unicode support we'll have to switch to binary to get behavior 
that is at least consistent across platforms.
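
To make the string-versus-binary distinction concrete, here is a rough sketch 
in Python 3. It is not Thrift's actual API: the function names are hypothetical, 
and the 4-byte length prefix is only an assumed wire layout for illustration. 
The point is that a `string` field which always encodes to UTF-8 gives every 
language the same bytes, while a `binary` field leaves the encoding to the 
caller.

```python
import struct


def write_string(value: str) -> bytes:
    """Hypothetical helper: 4-byte big-endian length, then UTF-8 bytes."""
    payload = value.encode("utf-8")
    return struct.pack("!i", len(payload)) + payload


def read_string(data: bytes) -> str:
    """Hypothetical helper: decode a length-prefixed UTF-8 payload to text."""
    (length,) = struct.unpack("!i", data[:4])
    return data[4:4 + length].decode("utf-8")


def write_binary(value: bytes) -> bytes:
    """Hypothetical helper: raw bytes pass through; no encoding is applied."""
    return struct.pack("!i", len(value)) + value


key = "caf\u00e9"  # a key containing a non-ASCII character
wire = write_string(key)

# The text survives a round trip unchanged.
assert read_string(wire) == key

# With a binary field, the caller must pick the encoding explicitly.
assert wire == write_binary(key.encode("utf-8"))
```

With the binary variant, every client has to agree on an encoding out of band, 
which is the extra burden on users described above.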

> Python library + compiler does not support unicode strings
> ----------------------------------------------------------
>
>                 Key: THRIFT-395
>                 URL: https://issues.apache.org/jira/browse/THRIFT-395
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (Python), Library (Python)
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Blocker
>             Fix For: 0.1
>
>         Attachments: 
> 0001-python-Minor-cleanup-of-protocols-don-t-use-str.patch, 
> 0002-THRIFT-395.-python-Phase-One-of-support-for-unicode.patch, 
> 0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch, 
> 0004-python-Remove-ridiculous-semicolons-from-gen-code.patch, 
> python-utf8-v2.patch, python-utf8.patch
>
>
> Effectively, all strings in the python bindings are treated as binary strings 
> -- no encoding/decoding to UTF-8 is done.  So if a unicode object is passed 
> to a (regular, non-binary) string, an exception is raised.

