On Fri, 17 Oct 2008 11:32:36 -0600, Joe Strout wrote:

> On Oct 17, 2008, at 11:24 AM, Marc 'BlackJack' Rintsch wrote:
>
>>> kw = 'генских'
>>>
>> What do you mean by "does not work"?  And you are aware that the above
>> snippet doesn't involve any unicode characters!?  You have a byte
>> string there -- type `str` not `unicode`.
>
> Just checking my understanding here -- are the following all true:
>
> 1. If you had prefixed that literal with a "u", then you'd have Unicode.

Yes.

> 2. Exactly what Unicode you get would be dependent on Python properly
> interpreting the bytes in the source file -- which you can make it do by
> adding something like "-*- coding: utf-8 -*-" in a comment at the top of
> the file.

Yes, assuming the encoding named in that comment matches the actual
encoding of the file.

> 3. Without the "u" prefix, you'll have some 8-bit string, whose
> interpretation is... er... here's where I get a bit fuzzy.

No interpretation at all, just the bunch of bytes that happen to be in
the source file.

> What if your source file is set to utf-8?  Do you then have a proper
> UTF-8 string, but the problem is that none of the standard Python
> library methods know how to properly interpret UTF-8?

Well, the decode method knows how to decode those bytes into a `unicode`
object if you call it with 'utf-8' as the argument.

> 4. In Python 3.0, this silliness goes away, because all strings are
> Unicode by default.

Yes and no.  The problem just shifts, because at some point you get into
similar trouble, just in the other direction.  Data enters the program
as bytes and must leave it as bytes again, so you have to deal with
encodings at those points.

Ciao,
	Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
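
P.S. A minimal sketch of points 3 and 4, in case it helps. It assumes Python 3 (where `str` is the Unicode type and `bytes` is the 8-bit type), not the Python 2 discussed in the thread, and the names raw, text, wrong and out are made up for illustration:

```python
# Sketch for Python 3: str is Unicode text, bytes is the 8-bit type.
# The names raw, text, wrong and out are illustrative only.

# The bytes a UTF-8 encoded source file would hold for the literal.
raw = 'генских'.encode('utf-8')

# Without decoding there is no interpretation, only bytes: 14 of them.
print(type(raw).__name__, len(raw))

# decode() with the matching encoding yields text: 7 characters.
text = raw.decode('utf-8')
print(type(text).__name__, len(text))

# A mismatched encoding also "works", but gives 14 wrong characters.
wrong = raw.decode('latin-1')
print(len(wrong), wrong == text)

# On the way out of the program the text must become bytes again.
out = text.encode('utf-8')
print(out == raw)
```

In Python 2 the same decode step would start from the plain `str` literal and produce a `unicode` object instead.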