Re: Unicode again ... default codec ...

Gabriel Genellina Wed, 21 Oct 2009 17:53:38 -0700

On Wed, 21 Oct 2009 06:24:55 -0300, Lele Gaifax <[email protected]>
wrote:

"Gabriel Genellina" <[email protected]> writes:

DON'T do that. Really. Changing the default encoding is a horrible,
horrible hack and causes a lot of problems.
...
More reasons:
http://tarekziade.wordpress.com/2008/01/08/syssetdefaultencoding-is-evil/
See also this recent thread in python-dev:
http://comments.gmane.org/gmane.comp.python.devel/106134


This is a problem that appears quite often, against which I have yet to
see a general workaround, or even a "safe pattern". I must confess that
most often I just give up and change the "if 0:" line in
sitecustomize.py to enable a reasonable default...

A week ago I met another incarnation of the problem that I finally
solved by reloading the sys module. It is a very ugly way, I know, and
I would really like to know a better way of doing it.

The case is simple enough: a unit test started failing miserably, with a
really strange traceback, and a quick pdb session revealed that the
culprit was nosetest. It prints out the name of the test using some
variant of "print testfunc.__doc__", and since the latter happened to
be a unicode string containing some accented letters, that piece of
nosetest's code raised an encoding error that went uncaught...

I tried to understand the issue, until I found that I was inside a fresh
new virtualenv with Python 2.6 and the sitecustomize wasn't even
there. So, even though my shell environment was UTF-8 (the system being
an Ubuntu Jaunty), within that virtualenv Python's stdout encoding was
'ascii'. Rightly so, nosetest failed to encode the accented letters to
that.


That seems to imply that in your "normal" environment you altered the
default encoding to utf-8 -- if so: don't do that!

I could just rephrase the test __doc__, or remove it, but to avoid
future noise I decided to go with the deprecated "reload(sys)" trick,
done as early as possible... damn, it's just a test suite after all!

Is there a "correct" way of dealing with this? What should nosetest
eventually do to initialize its sys.stdout.encoding so that it reflects
the system's settings? And on the user side, how could I otherwise fix
it (I mean, without resorting to the reload())?


nosetest should do nothing special. You should configure the environment
so Python *knows* that your console understands utf-8. Once Python is
aware of the *real* encoding your console is using, sys.stdout.encoding
will be utf-8 automatically and your problem is solved. I don't know how
to do that within virtualenv, but the answer certainly does NOT involve
sys.setdefaultencoding().
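
Before changing anything, it may help to look at what Python currently
believes about the console. The following is only a minimal sketch: the
helper name describe_console_encoding is made up for illustration, and
which of the locale variables matters on a given system is an
assumption to verify locally.

```python
import locale
import os
import sys


def describe_console_encoding():
    """Collect the settings Python consults when deciding how to print text."""
    return {
        # Encoding attached to stdout; it can be None when output is piped.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Codec used for implicit conversions; this is the one to leave alone.
        "default": sys.getdefaultencoding(),
        # Encoding suggested by the current locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Environment variables that usually drive the locale on POSIX systems.
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_console_encoding().items()):
        print("%-10s %r" % (name, value))
```

Running it once in the "normal" shell and once inside the virtualenv
should show which of these values actually differs between the two.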

On Windows, a "normal" console window on my system uses cp850:


D:\USERDATA\Gabriel>chcp
Tabla de códigos activa: 850

D:\USERDATA\Gabriel>python
Python 2.6.3 (r263rc1:75186, Oct  2 2009, 20:40:30) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> import sys
py> sys.getdefaultencoding()
'ascii'
py> sys.stdout.encoding
'cp850'
py> u = u"áñç"
py> print u
áñç
py> u
u'\xe1\xf1\xe7'
py> u.encode("cp850")
'\xa0\xa4\x87'
py> import unicodedata
py> unicodedata.name(u[0])
'LATIN SMALL LETTER A WITH ACUTE'

I opened another console, changed the code page to 1252 (the one used in
Windows applications; `chcp 1252`) and invoked Python again:

py> import sys
py> sys.getdefaultencoding()
'ascii'
py> sys.stdout.encoding
'cp1252'
py> u = u"áñç"
py> print u
áñç
py> u
u'\xe1\xf1\xe7'
py> u.encode("cp1252")
'\xe1\xf1\xe7'
py> import unicodedata
py> unicodedata.name(u[0])
'LATIN SMALL LETTER A WITH ACUTE'
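
The two sessions can be condensed into a small check that needs no
console at all: the unicode string is the same, and only the bytes
produced for each codec differ. This is an illustrative sketch;
encode_or_none is a hypothetical helper, not something from nose or the
standard library.

```python
def encode_or_none(text, encoding):
    """Return text encoded with the given codec, or None if the codec
    cannot represent some character in it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# The same three letters used in the sessions above, written as escapes
# so the source file itself stays pure ASCII.
sample = u"\xe1\xf1\xe7"

for codec in ("cp850", "cp1252", "utf-8", "ascii"):
    print("%-7s %r" % (codec, encode_or_none(sample, codec)))
```

The 'ascii' row is the interesting one: it is the case nosetest ran
into, where the accented letters simply have no representation.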

As you can see, everything works fine without any need to change the
default encoding... Just make sure Python *knows* which encoding is being
used in the console on which it runs. On Ubuntu you may need to set the
LANG environment variable.
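
Besides LANG, Python 2.6 and later also honor the PYTHONIOENCODING
environment variable, which sets the encoding of stdin/stdout/stderr
directly. It is not mentioned earlier in this thread, and whether it is
the right fix inside a virtualenv is an assumption to check. The sketch
below starts a child interpreter with the variable set and reports the
sys.stdout.encoding it ends up with; child_stdout_encoding is a
made-up helper name, and subprocess.check_output requires Python 2.7 or
newer.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return the
    sys.stdout.encoding that the child reports."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```

Note that this only changes how the standard streams are encoded. It
does not touch sys.getdefaultencoding(), which stays as it is.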

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
