Hmm, if cStringIO does choke on certain Unicode inputs, then we may want to 
change this for complete correctness. I believe the choice of cStringIO was 
made for performance reasons. If we do make a change we should take care to 
ensure it doesn't introduce a significant regression in serialization 
performance.
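
For illustration, here is a minimal sketch of the usual workaround: encode text 
to UTF-8 before it reaches the byte buffer, so the buffer only ever sees bytes. 
This is not the Thrift library's code. The helpers write_string and read_string 
are hypothetical names, the length-prefixed layout is only an assumption made 
for the example, and io.BytesIO stands in for cStringIO so that the snippet 
runs on Python 3.

```python
import io


def write_string(buf, value):
    """Write a length-prefixed string to a byte buffer (hypothetical helper)."""
    # Encode text to UTF-8 up front so the buffer never has to guess an encoding.
    if isinstance(value, str):
        value = value.encode("utf-8")
    # The prefix is the length in bytes, not in characters.
    buf.write(len(value).to_bytes(4, "big"))
    buf.write(value)


def read_string(buf):
    """Read back a string written by write_string (hypothetical helper)."""
    size = int.from_bytes(buf.read(4), "big")
    return buf.read(size).decode("utf-8")


buf = io.BytesIO()
write_string(buf, "Здравей, свят")  # non-ASCII text survives the round trip
buf.seek(0)
assert read_string(buf) == "Здравей, свят"
```

Encoding at the boundary keeps the fast byte buffer in place, so a change along 
these lines would still need to be timed against the current serialization path.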

In my experience, Python string handling causes a lot of headaches. The 
distinction between the built-in unicode and str types is subtle, and mixing 
them can cause odd failures like this one.

Does anyone know if there's a JIRA issue open on this yet? It might be worth 
bumping this over to the -dev list to have some Python experts weigh in.

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Wednesday, January 21, 2009 11:00 AM
To: [email protected]
Subject: Re: UTF-8 with thrift python

I have moved a fair bit of Unicode data across Thrift without problems.

Python has historical problems with the character/byte distinction that it 
inherited from C.  From what I hear, this is being aggressively addressed in 
Python 3, but moving to 3 will be a sllooow process, as all major changes are.

On Wed, Jan 21, 2009 at 2:32 AM, Emil Kirichev <[email protected]> wrote:

> So my question is, does Thrift really support UTF-8 (like the wiki 
> says), meaning all characters that can be represented, not just the 
> ASCII subset, or am I missing something? Has any other user had this kind 
> of problem? I did not find anything on the subject on the internet. Maybe 
> other languages (Java, PHP) do not have this problem?
>



--
Ted Dunning, CTO
DeepDyve
4600 Bohannon Drive, Suite 220
Menlo Park, CA 94025
www.deepdyve.com
650-324-0110, ext. 738
858-414-0013 (m)
