Hi Michael and Kent, thanks to your tips I was able to solve my problems! It was quite easy in the end.
For those interested and struggling with utf-8, ascii and unicode:
Once I understood the right approach, namely
- string.decode() upon input (if in question)
- string.encode() upon output (more often than not)
where input and output mean reading from and writing to files, file-like
objects, databases... and functions of modules that are not unicode-proof,
I got rid of all the calls to encode() and decode() that I had added by trial
and error and that messed everything up. Now I have just a few calls to
encode() and voilà! xml.sax seems to read and decode the utf-8 encoded xml
file perfectly well, and ZipFile.read() and file.write() work too, with no
explicit encoding or decoding.
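
For the archives, a minimal sketch of that decode-on-input, encode-on-output pattern. The helper names read_text() and write_text() are made up for illustration, and an in-memory io.BytesIO stands in for a real file; the io module and b'' literals assume a Python newer than the 2.4 used in this thread.

```python
import io


def read_text(stream, encoding='utf-8'):
    """Decode once, at the input boundary (hypothetical helper)."""
    return stream.read().decode(encoding)


def write_text(stream, text, encoding='utf-8'):
    """Encode once, at the output boundary (hypothetical helper)."""
    stream.write(text.encode(encoding))


# UTF-8 encoded bytes, as a file or socket would deliver them.
source = io.BytesIO(b'K\xc3\xa4se')

text = read_text(source)    # unicode from here on
shouted = text.upper()      # all processing happens on unicode, not bytes

sink = io.BytesIO()
write_text(sink, shouted)   # back to bytes only when leaving the program
result = sink.getvalue()
```

Everything between the two helpers works on unicode only, which is why the stray encode()/decode() calls in the middle of the program could go.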
For me the most important thing to realize was that utf-8 is *not* unicode,
although I had already read about this topic (and you can read this advice
often here on this list).
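
A small sketch of what that distinction means in practice: one unicode character, three different byte sequences depending on the encoding chosen. The variable names are only for illustration, and the b'' literals again assume a Python newer than 2.4.

```python
# One unicode character: LATIN SMALL LETTER A WITH DIAERESIS (U+00E4).
text = u'\xe4'

# The same character as bytes in three different encodings.
as_utf8 = text.encode('utf-8')        # two bytes
as_latin1 = text.encode('latin-1')    # one byte
as_utf16 = text.encode('utf-16-le')   # two bytes, but different ones

# Decoding with the matching codec always gives back the same unicode text.
round_trips = [
    as_utf8.decode('utf-8'),
    as_latin1.decode('latin-1'),
    as_utf16.decode('utf-16-le'),
]

# Decoding with the wrong codec does not fail here, it silently gives garbage.
garbled = as_utf8.decode('latin-1')
```

Unicode is the abstract text; utf-8 is just one of several ways to turn it into bytes.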
On my system sys.stdout and sys.stderr seem to have a utf-8 and a None
encoding, respectively (Kubuntu Linux, python2.4, ipython and konsole as
terminal).
The wrapper suggested by Kent
    import codecs, sys
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout, 'backslashreplace')
    sys.stderr = codecs.getwriter('ascii')(sys.stderr, 'backslashreplace')
solves all my output problems when debugging.
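
To show what the 'backslashreplace' error handler does, here is a sketch that wraps an in-memory byte stream instead of the real sys.stdout or sys.stderr, so the written bytes can be inspected. The stream and variable names are assumptions for illustration; codecs.getwriter() is used the same way as in the wrapper.

```python
import codecs
import io

message = u'K\xe4se \u20ac'   # contains two non-ASCII characters

# ASCII writer: anything that cannot be encoded becomes an escape sequence
# instead of raising UnicodeEncodeError.
ascii_raw = io.BytesIO()
ascii_writer = codecs.getwriter('ascii')(ascii_raw, 'backslashreplace')
ascii_writer.write(message)
ascii_output = ascii_raw.getvalue()

# UTF-8 writer: every character can be encoded, so the handler never fires.
utf8_raw = io.BytesIO()
utf8_writer = codecs.getwriter('utf-8')(utf8_raw, 'backslashreplace')
utf8_writer.write(message)
utf8_output = utf8_raw.getvalue()
```

With the ASCII writer the debug output stays readable on any terminal, at the cost of showing escapes such as \xe4 in place of the real characters.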
Thank you for your help!
Dave
P.S.: The quotations in my signature are there by chance, really. Normally I'm
not the kind of guy who believes in premonitions... ;)
--
I never realized it before, but having looked that over I'm certain I'd
rather have my eyes burned out by zombies with flaming dung sticks than
work on a conscientious Unicode regex engine.
-- Tim Peters, 3 Dec 1998
_______________________________________________ Tutor maillist - [email protected] http://mail.python.org/mailman/listinfo/tutor
