Just to add my 2 or 3 cents. Not all Strings in Python are Unicode.
Python has a StringType and a UnicodeType. If you want a unicode string, you have to write u"my test" instead of "my test".

But in principle:
u"my test" == "my test".decode("utf-8")    <-- depends on source encoding

As has already been suggested, you should not depend on the default encoding of the operating system. It is better to query Python's own default with sys.getdefaultencoding() and store the encoding for communication with the console etc. (also important for wxWindows in non-unicode mode).

Additionally, take care of the source encoding of your files by adding a specific header to them. See:
http://www.python.org/peps/pep-0263.html

Finally, I recommend the following tutorial, as the ReportLab guys really know their stuff:
http://www.reportlab.com/i18n/python_unicode_tutorial.html

Kind Regards
Jan Ischebeck

------------------------------------------------------------------------------------
P3 GmbH - Ingenieurgesellschaft für Management und Organisation
Jan Ischebeck
Senior Consultant
Nürtinger Straße 9
70794 Filderstadt-Bernhausen
phone: +49 - (0)163 / 75 33 613
fax: +49 - (0)163 / 99 75 33 613
e-mail: [EMAIL PROTECTED]
web: www.p3-gmbh.de

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Fuzzyman
Sent: Thu 02-Mar-06 18:35
Cc: pythonce@python.org
Subject: Re: [PythonCE] Unicode default encoding

Jeffrey Barish wrote:

>>Luke Dunstan wrote:
>>
>>>----- Original Message -----
>>>From: "Jeffrey Barish" <[EMAIL PROTECTED]>
>>>To: <pythonce@python.org>
>>>Sent: Friday, February 24, 2006 11:03 AM
>>>Subject: [PythonCE] Unicode default encoding
>>>
>>>>What is the correct way to set PythonCE's default Unicode encoding? My
>>>>reading (Python in a Nutshell) indicates that I am supposed to make a
>>>>change to site.py, but there doesn't seem to be a site.py in
>>>>PythonCE. (The closest I came is a site.pyc in python23.zip.)
>>>>Nutshell suggests that in desperation one could put the following at the start of
>>>>the main script:
>>>>
>>>>import sys
>>>>reload(sys)
>>>>sys.setdefaultencoding('iso-8859-15')
>>>>del sys.setdefaultencoding
>>>>
>>>>This code solved the problem I was having reading and processing text that
>>>>contains Unicode characters, but I am uncomfortable leaving a desperation
>>>>solution in place.
>>>>
>>>I don't think modifying site.py would be a good solution, because if you
>>>upgrade or reinstall python then the script will be overwritten. If you
>>>only want to run your program on your own system then a better solution is
>>>to create a file sitecustomize.py in your Python\Lib directory containing
>>>this:
>>>
>>>import sys
>>>sys.setdefaultencoding('iso-8859-15')
>>>
>>>If you want to distribute your program to other people though, you can't
>>>expect them to change their default encoding so it is better not to rely on
>>>the default encoding at all.
>>>
>>Yep, using unicode and explicitly encoding/decoding is a better approach.
>>
>>Fuzzyman
>>
>
>Once again, I am forced to display my ignorance. Sorry guys. I really don't
>know much about Unicode. The solution that Luke suggested (sitecustomize.py
>in my Python\Lib directory) works fine for me, but I am concerned about the
>suggestion from him and Fuzzyman that explicit encoding/decoding is a better
>approach. What is explicit encoding/decoding? Can someone point me to a
>good resource for learning how to deal with Unicode correctly?
>

Unicode, and text encodings in general, is a bit of a learning curve. Once you get your head round it, Python makes it pretty straightforward.
Simple rules:

* In Python, text *really* means a unicode string
* Because ordinary strings are really just strings of bytes
* If you know the encoding, decode the byte string to turn it into unicode
* When writing or printing, encode it to turn it back into bytes
* If you don't know the encoding, then you had better pray that whatever it is is encoded in the system default. ;-)

byte_string = open(filename, 'rb').read()   # read the file as raw bytes
text = byte_string.decode('utf_8')          # we know it is UTF8, so we decode to unicode
# ....code that uses the text
byte_string = text.encode('utf_8')          # we encode it back to UTF8
open(filename, 'wb').write(byte_string)     # so we can write it back out

Decoding turns a byte string into a unicode object. Encoding turns a unicode object into a byte string.

If this still confuses you (which it probably does) then there are lots of good resources. I happen to like:

http://www.pyzine.com/Issue008/Section_Articles/article_Encodings.html

Which seems to be down at the moment. :-(

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

_______________________________________________
PythonCE mailing list
PythonCE@python.org
http://mail.python.org/mailman/listinfo/pythonce
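
As a footnote to the decode/encode rules in Fuzzyman's reply: the same explicit approach can be sketched as a small file transcoder that never touches the default encoding. This is only a minimal sketch under stated assumptions, not code from any of the posters. The function name transcode_file, the file names, and the sample bytes are all hypothetical, and it assumes a Python with b"..." literals and the with statement (2.6 or later), so it would need adapting for the Python 2.3 implied by python23.zip.

```python
import os
import tempfile


def transcode_file(src_path, dst_path, src_encoding, dst_encoding):
    """Decode explicitly on the way in, encode explicitly on the way out.

    Hypothetical helper for illustration; both encodings are passed in
    by the caller, so the default encoding is never consulted.
    """
    with open(src_path, "rb") as src:
        raw = src.read()                      # bytes, exactly as stored on disk
    text = raw.decode(src_encoding)           # bytes -> unicode text
    with open(dst_path, "wb") as dst:
        dst.write(text.encode(dst_encoding))  # unicode text -> bytes
    return text


# Hypothetical demo data: a price with a euro sign, stored as ISO-8859-15.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "latin9.txt")
dst_path = os.path.join(workdir, "utf8.txt")
with open(src_path, "wb") as f:
    f.write(b"10 \xa4")                       # 0xA4 is the euro sign in ISO-8859-15

text = transcode_file(src_path, dst_path, "iso-8859-15", "utf-8")
with open(dst_path, "rb") as f:
    utf8_bytes = f.read()                     # same text, now as UTF-8 bytes
```

Because both encodings are named at the call site, the result is the same on any machine, which is the point Luke and Fuzzyman make about not relying on sys.setdefaultencoding() when distributing a program.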