MRAB wrote:

When working with Unicode in Python 2, you should use the 'unicode' type
for text (Unicode strings) and limit the 'str' type to binary data
(bytestrings, i.e. bytes) only.

Well OK, always use u'something', that's simple -- but isn't str what I get from files and sockets and the like?
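
For illustration, a minimal sketch of the usual "decode at the boundary" pattern: bytes come in from a file or socket, get decoded once, and are encoded again only on the way out. The sample bytes and the choice of UTF-8 are assumptions made for the example, not something the I/O layer tells you.

```python
# Bytes as they might arrive from a file or socket.
# Assumption: the sender used UTF-8; the bytes themselves do not say so.
raw = b'za\xc5\xbc\xc3\xb3\xc5\x82\xc4\x87'

# Decode once, at the input boundary: bytestring -> Unicode string.
text = raw.decode('utf-8')

# Inside the program, work with Unicode only.
char_count = len(text)   # counts characters, not bytes
byte_count = len(raw)    # counts bytes

# Encode once, at the output boundary: Unicode string -> bytestring.
wire = text.encode('utf-8')
```

For files, io.open(path, encoding='utf-8') (Python 2.6 and later) does the decoding step for you and hands back unicode objects.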

In Python 3 they've been renamed to 'str' for Unicode _strings_ and
'bytes' for binary data (bytes!).

Neat, except that the porting of most projects and external libraries to Python 3 seems to be, how should I put it, standing still. Or am I wrong? That's the impression I get, anyway.

Take web frameworks, for example. Do any of them have serious plans, and work under way, to port to Python 3?

Strictly speaking, only Unicode can be encoded.

How so? Can't bytestrings containing characters in, say, the KOI8-R encoding be encoded?
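
A small sketch of what "re-encoding" KOI8-R data amounts to in practice: the bytestring is first decoded to Unicode, and only that Unicode string is encoded to the target encoding. The sample bytes and the variable names are made up for the example.

```python
# "Privet" written in Cyrillic and stored as KOI8-R bytes (sample data).
raw_koi8r = b'\xf0\xd2\xc9\xd7\xc5\xd4'

# Step 1: decode. This is where you must know the source encoding.
text = raw_koi8r.decode('koi8-r')

# Step 2: encode the Unicode string to whatever encoding is needed.
raw_utf8 = text.encode('utf-8')

# Going back works the same way: decode first, then encode.
round_trip = raw_utf8.decode('utf-8').encode('koi8-r')
```

So the bytestring itself is never encoded; it is transcoded by way of Unicode.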

What Python 2 is doing here is trying to be helpful: if it's already a
bytestring then decode it first to Unicode and then re-encode it to a
bytestring.
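
The behaviour described above can be approximated with an explicit helper. This is only a sketch: encode_like_python2 is a hypothetical name, and the real logic lives inside the interpreter, not in a function like this.

```python
def encode_like_python2(value, encoding, default_encoding='ascii'):
    """Hypothetical helper that spells out Python 2's implicit step."""
    if isinstance(value, bytes):
        # The hidden step: a bytestring is first decoded using the
        # default encoding (ASCII unless someone changed it).
        value = value.decode(default_encoding)
    # Only a Unicode string is actually encoded.
    return value.encode(encoding)


# Pure-ASCII bytestrings survive the hidden decode unchanged.
ascii_result = encode_like_python2(b'abc', 'utf-8')

# Unicode strings skip the hidden step entirely.
unicode_result = encode_like_python2(u'\u0142', 'utf-8')
```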

It's really cumbersome sometimes, even when two libraries are written by the same author: Mako and SQLAlchemy, for instance, are written by the same guy. Both are top-of-the-line in my humble opinion, but when you connect them you get things like this:

1. You query a SQLAlchemy object that happens to have string fields in the relational DB.

2. The corresponding Python attributes of those objects then have type str, not unicode.

3. You then pass those objects to Mako for HTML rendering.

Typically it works, but only as long as no character in there falls outside the ASCII range. If one does, an unsuspecting user gets a UnicodeDecodeError.

Sure, I wrote myself a helper that iterates over the keyword dictionary, converts every str to unicode, and only then passes the dictionary to render_unicode. It's overhead, though. It would be nicer to get everything as unicode from the DB and then just pass it on for rendering and have it work. (Unless there's something in the filters that I missed: there are encoding settings for templates and tags, but I didn't find anything on automatic conversion of objects passed to the method that renders the template.)
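
A helper of that kind might look roughly like the sketch below. The name to_unicode_kwargs, the UTF-8 default and the sample data are assumptions made for illustration; the encoding has to match whatever the database driver actually returns.

```python
def to_unicode_kwargs(kwargs, encoding='utf-8'):
    """Return a copy of kwargs with every bytestring value decoded.

    Hypothetical helper; only top-level values are converted, so nested
    containers and object attributes would need extra handling.
    """
    converted = {}
    for key, value in kwargs.items():
        if isinstance(value, bytes):  # 'str' on Python 2
            value = value.decode(encoding)
        converted[key] = value
    return converted


# Sample data: one bytestring with a non-ASCII character, one Unicode
# string, and one non-string value that must pass through untouched.
data = {'name': b'Micha\xc5\x82', 'city': u'Krak\xf3w', 'age': 30}
clean = to_unicode_kwargs(data)
```

The result could then be passed on as template.render_unicode(**clean), where template stands for an already loaded Mako template.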

But maybe I'm whining.


Unfortunately, the default encoding is ASCII, and the bytestring isn't
valid ASCII. Python 2 is being 'helpful' in a bad way!
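
To make the failure concrete, here is a sketch that writes out the implicit ASCII decode by hand and then the explicit alternative. The sample bytes are made up, and UTF-8 as the real source encoding is an assumption.

```python
raw = b'\xc5\x82'  # UTF-8 bytes for U+0142; not valid ASCII

# What Python 2 effectively does for raw.encode('utf-8'):
# decode with the default encoding (ASCII) first, then encode.
try:
    raw.decode('ascii').encode('utf-8')
    implicit_failed = False
except UnicodeDecodeError:
    implicit_failed = True

# Naming the real source encoding yourself avoids the hidden ASCII step.
explicit = raw.decode('utf-8').encode('utf-8')
```

Note that the exception is a UnicodeDecodeError even though the call that triggered it was an encode, which is what makes the traceback so confusing.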

And the default encoding is coded in such a way that it cannot be changed in sitecustomize (without code modification, that is).

Regards,
mk

--
http://mail.python.org/mailman/listinfo/python-list
