> Thrift's string and binary types are represented as "str" (8-bit strings),
> and you are expected to use "str" when populating your Thrift structures.
> From your email, I assume you are using "unicode" strings in Thrift 
> structures.
> The str type in Python (like cStringIO) is essentially binary blob data.
> If you want to use utf-8 encoded Unicode data in a string field from Python,
> you should manually encode your unicode object into utf-8 as a str.
> This situation is pretty weak, but it is representative of all Python 2 code
> that deals with Unicode strings.

Yes, I'm aware of that. That's what I tried first, and it failed: I encoded
my unicode objects to str (with utf8 encoding), and
cStringIO.StringIO.write (or getvalue) fails because it internally tries
to convert the string you give it with the 'ascii' codec (and some of my
chars cannot be represented in 1 byte) and raises a Unicode
exception. The cStringIO doc says:

"Unlike the memory files implemented by the StringIO module, those
provided by this module are not able to accept Unicode strings that
cannot be encoded as plain ASCII strings."

Of course I can replace my 'special' symbols with HTML entities (which
will grow the string), but doing this on both sides whenever you need to
serialize/deserialize doesn't seem like a proper solution to me.
Switching from cStringIO to the slower StringIO solves the
problem... partially. StringIO.StringIO accepts either unicode or str
('ascii') objects, which is fine until socket.send is called with a
unicode object. socket.send cannot send unicode objects, because you
cannot guarantee what the representation will be on the other side
(Python may represent unicode internally as UCS2 or UCS4; the default
is UCS2). So, it turns out that if we want to use StringIO with unicode
objects, TSocket must encode them (to utf8?) before the send.
The easiest (and slowest) solution is to not use a buffered transport;
then you can work with str objects, as they are sent directly to the
socket, which is fine with str objects (in whatever encoding you like).
I do not know how that would be solved with Python 3, but for now,
those seem to be my options.
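
For illustration, the "encode before send" idea could be sketched roughly
as below. This is only a sketch under assumptions, not Thrift's actual
TSocket or buffered transport code: io.BytesIO stands in for the
cStringIO buffer, socket.socketpair() stands in for a real client/server
connection, and the helper names (to_wire, send_buffered, recv_all,
roundtrip) are made up for the example.

```python
import io
import socket


def to_wire(text, encoding="utf-8"):
    """Encode a text (unicode) value to bytes; pass byte strings through."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


def send_buffered(sock, chunks):
    """Collect encoded chunks in a byte buffer, then flush it to the socket."""
    buf = io.BytesIO()  # bytes-only buffer, so no implicit 'ascii' conversion
    for chunk in chunks:
        buf.write(to_wire(chunk))
    sock.sendall(buf.getvalue())


def recv_all(sock):
    """Read from the socket until the peer closes its end."""
    parts = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        parts.append(data)
    return b"".join(parts)


def roundtrip(chunks):
    """Send chunks over a local socket pair and decode what arrives.

    Hypothetical helper for the demo; only suitable for small payloads,
    since everything is sent before anything is read.
    """
    left, right = socket.socketpair()
    try:
        send_buffered(left, chunks)
        left.close()  # signal end of stream to the reader
        return recv_all(right).decode("utf-8")
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(repr(roundtrip([u"caf\u00e9 ", u"\u0416"])))
```

The point is that only bytes ever reach the buffer and the socket; the
unicode objects are encoded once on the way in and decoded once on the
receiving side, so the internal UCS2/UCS4 representation never matters.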
