MRAB wrote:
When working with Unicode in Python 2, you should use the 'unicode' type
for text (Unicode strings) and limit the 'str' type to binary data
(bytestrings, i.e. bytes) only.
Well OK, always use u'something', that's simple -- but isn't str what I
get from files and sockets and the like?
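A minimal sketch of the "decode at the boundary" idea: files and sockets do hand you bytes, and the conversion to text happens once, right where the data enters the program. It is written to run on Python 3 (io.open should behave the same way on Python 2.6+). The file name, the sample text and the choice of UTF-8 are assumptions made up for the example.

```python
import io
import os
import tempfile

# Hypothetical sample data: bytes as they might arrive from a file or socket.
raw = u"za\u017c\u00f3\u0142\u0107".encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # made-up file name
with io.open(path, "wb") as f:
    f.write(raw)

# Binary mode gives back bytes, exactly as stored.
with io.open(path, "rb") as f:
    data = f.read()

# Text mode with an explicit encoding decodes on the way in.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

assert data == raw
assert text == data.decode("utf-8")
```

Either way the encoding has to be named somewhere; the only question is whether it is named once at the edge or guessed later in the middle of the program.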
In Python 3 they've been renamed to 'str' for Unicode _strings_ and
'bytes' for binary data (bytes!).
Neat, except that the porting of most projects and external
libraries to Python 3 seems to be, how should I put it, standing still.
Or am I wrong? That's the impression I get, anyway.
Take web frameworks, for example. Do any of them have serious plans and
work in place to port to Python 3?
Strictly speaking, only Unicode can be encoded.
How so? Can't bytestrings containing characters in, say, the KOI8-R
encoding be encoded?
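One way to look at this question: bytes in KOI8-R can be converted to another encoding, but only by decoding them to text first and then encoding that text. A small sketch under that assumption (the sample word and the UTF-8 target are made up for illustration):

```python
# Hypothetical input: Russian text stored as KOI8-R bytes.
koi8_bytes = u"\u043f\u0440\u0438\u0432\u0435\u0442".encode("koi8_r")

# Step 1: decode, naming the encoding the bytes are actually in.
text = koi8_bytes.decode("koi8_r")

# Step 2: encode the resulting text into the target encoding.
utf8_bytes = text.encode("utf-8")
```

The bytes themselves carry no record of which encoding they are in, so step 1 cannot be skipped; somebody has to say "koi8_r".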
What Python 2 is doing here is trying to be helpful: if it's already a
bytestring then decode it first to Unicode and then re-encode it to a
bytestring.
It's really cumbersome sometimes, even when two libraries are written by
the same author: for instance, Mako and SQLAlchemy are written by the same
guy. They are both top-of-the-line in my humble opinion, but when you
connect them you get things like this:
1. You query a SQLAlchemy object that happens to have string fields in
the relational DB.
2. The corresponding Python attributes of those objects then have type str,
not unicode.
3. Then I pass those objects to Mako for HTML rendering.
Typically it works, but only as long as no character in there falls
outside the ASCII range. If one does, an unsuspecting user gets a
UnicodeDecodeError.
Sure, I wrote myself a helper that iterates over the keyword dictionary,
converts every str to unicode, and only then passes the
dictionary to render_unicode. It's an overhead, though. It would be
nicer to get it all as unicode from the db and then just pass it for
rendering and have it work. (Unless there's something in filters that I
missed: there are encoding options for templates and tags, but I didn't
find anything on automatic conversion of the objects passed to the method
that renders the template.)
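A helper of the kind described above might look roughly like this. It is a sketch, not the actual code: the name decode_values, the UTF-8 default and the sample dictionary are all hypothetical, and it is written against bytes so that it runs on Python 3 (on Python 2.6+, bytes is just another name for str).

```python
def decode_values(mapping, encoding="utf-8"):
    """Return a copy of mapping with every bytes value decoded to text."""
    decoded = {}
    for key, value in mapping.items():
        if isinstance(value, bytes):
            # Assumes the bytes really are in `encoding`.
            value = value.decode(encoding)
        decoded[key] = value
    return decoded


# Hypothetical keyword dictionary, as it might come back from the db layer.
context = decode_values({"name": u"Zo\u00eb".encode("utf-8"), "age": 30})
```

The resulting dictionary would then be passed as keyword arguments to render_unicode. Note that this only handles top-level values; nested objects or lists would need their own treatment.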
But maybe I'm whining.
Unfortunately, the default encoding is ASCII, and the bytestring isn't
valid ASCII. Python 2 is being 'helpful' in a bad way!
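The implicit step can be imitated explicitly, which makes the failure easier to see. The function below is only an emulation, written for Python 3, of the behaviour described above; the name py2_style_encode and the sample data are made up for the example.

```python
def py2_style_encode(data, encoding, default_encoding="ascii"):
    # Emulation only: an implicit decode with the default encoding,
    # followed by the encode that was actually requested.
    return data.decode(default_encoding).encode(encoding)


ascii_only = b"hello"
non_ascii = u"\u017c".encode("utf-8")  # two bytes, neither of them ASCII

# Pure ASCII input survives the round trip unnoticed.
ok = py2_style_encode(ascii_only, "utf-8")

# Non-ASCII input fails in the hidden decode step.
try:
    py2_style_encode(non_ascii, "utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
```

This is why asking for an encode can end in a UnicodeDecodeError: the error comes from the decode that nobody asked for.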
And the default encoding is coded in such a way that it cannot be changed in
sitecustomize (without code modification, that is).
Regards,
mk