On Tue, 14 Feb 2006 19:41:07 -0500, "Raymond Hettinger" <[EMAIL PROTECTED]> wrote:
>[Guido van Rossum]
>> Somewhat controversial:
>>
>> - bytes("abc") == bytes(map(ord, "abc"))
>
>At first glance, this seems obvious and necessary, so if it's somewhat
>controversial, then I'm missing something. What's the issue?
>
ord("x") gets the ordinal of "x" as encoded in the source encoding, but if that encoding is not unicode or latin-1, the result will break when Py 3000 makes "x" unicode. This means that until Py 3000, plain str string literals have to use ascii and escapes in order to preserve their meaning once "x" == u"x".

But the good news is that bytes(map(ord, u"x")) works fine for any source encoding, now or after Py 3000. You just have to type characters into your editor between the quotes that look on the screen like any of the first 256 unicode characters (or use ascii escapes for unshowables). The u"x" literal translates x into unicode according to the *character* of x, whatever the source encoding, so all you have to do is choose characters from the first 256 unicode code points. This happens to be latin-1, but you can ignore that unless you are interested in the actual byte values. If they have byte meaning, escapes are clearer anyway, and they work in a unicode string (where "x".decode(source_encoding) might fail on an illegal character).

The solution is to use u"x" for now, or to use ascii-only literals with escapes, and just map ord over either kind of string. This should keep working when u"x" becomes equivalent to "x".

The unicode that comes from a current u"x" string defines a *character* sequence. If you use legal latin-1 *characters* in whatever source encoding your editor and coding cookie say, the *characters* you see inside the quotes of the u"..." literal are translated to unicode, and since the first 256 characters of unicode happen to be the latin-1 set, map ord just works. With a unicode string you don't have to think about encoding; just use ord/unichr in range(0, 256). Hex escapes within unicode strings work as expected, so IMO it's pretty clean.
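To make the argument concrete, here is a minimal sketch. It assumes a Python 3 interpreter standing in for Py 3000 (so a plain "..." literal plays the role of a Py 2.x u"..." literal) and relies on compile() applying a PEP 263 coding cookie when handed source as bytes. The helper names bytes_from_chars and literal_under_encoding are hypothetical, made up for illustration.

```python
# Minimal sketch, assuming Python 3 stands in for Py 3000: a plain str literal
# here behaves like a Py 2.x u"..." literal. Helper names are hypothetical.


def bytes_from_chars(text):
    """Map ord over a character string (the bytes(map(ord, u"...")) idiom).

    Only code points 0..255 (the latin-1 range) fit in a byte; anything
    higher makes bytes() raise ValueError.
    """
    return bytes(map(ord, text))


def literal_under_encoding(chars, encoding):
    """Build a tiny module that assigns `chars` as a string literal, encode
    the source with `encoding`, declare it with a coding cookie, compile the
    raw bytes (compile() honors the PEP 263 cookie), exec it, and return the
    string the literal produced. `chars` must not contain quotes or
    backslashes."""
    source = '# -*- coding: %s -*-\nvalue = "%s"\n' % (encoding, chars)
    code = compile(source.encode(encoding), "<generated>", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace["value"]


text = "café"

# The same characters come back whatever encoding the source file used.
for enc in ("utf-8", "latin-1", "cp1252"):
    value = literal_under_encoding(text, enc)
    assert value == text
    assert bytes_from_chars(value) == b"caf\xe9"

# A Py 2.x plain str literal in a UTF-8 file holds the *encoded* bytes, so
# mapping ord over it gives one value per byte rather than per character.
print(list(text.encode("utf-8")))     # [99, 97, 102, 195, 169]
print(list(bytes_from_chars(text)))   # [99, 97, 102, 233]

# Hex escapes inside a character string map straight to byte values.
assert bytes_from_chars("\x00\x7f\xff") == b"\x00\x7f\xff"

# Characters outside the first 256 code points cannot become single bytes.
try:
    bytes_from_chars("\u20ac")  # EURO SIGN, U+20AC
except ValueError as exc:
    print("not representable:", exc)
```

The loop is the crux: whether the file is saved as utf-8, latin-1 or cp1252, the literal yields the same characters, and mapping ord over them gives the latin-1 byte values.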
I think I have shown this in a couple of other posts in the original thread (where I created source code in several encodings, including utf-8, compiled it with coding cookies, and exec'd the result). I could always have overlooked something, but I am hopeful.

Regards,
Bengt Richter
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com