On Wed, 2010-01-20 at 22:37 +0100, M.-A. Lemburg wrote:
> David Malcolm wrote:
> > I'm thinking of making this downstream change to Fedora's site.py (and
> > possibly in future RHEL releases) so that the default encoding
> > automatically picks up the encoding from the locale:
> >
> >  def setencoding():
> >      """Set the string encoding used by the Unicode implementation.  The
> >      default is 'ascii', but if you're willing to experiment, you can
> >      change this."""
> >      encoding = "ascii" # Default value set by _PyUnicode_Init()
> > -    if 0:
> > +    if 1:
> >          # Enable to support locale aware default string encodings.
> >          import locale
> >          loc = locale.getdefaultlocale()
> >          if loc[1]:
> >              encoding = loc[1]
> >      if 0:
> >          # Enable to switch off string to Unicode coercion and implicit
> >          # Unicode to string conversion.
> >          encoding = "undefined"
> >      if encoding != "ascii":
> >          # On Non-Unicode builds this will raise an AttributeError...
> >          sys.setdefaultencoding(encoding) # Needs Python Unicode build !
> >
> > I've written up extensive notes on the change and the history of the
> > issue here:
> > https://fedoraproject.org/wiki/Features/PythonEncodingUsesSystemLocale
> >
> > Please let me know if there are any errors on that page!
> >
> > The aim is to avoid strange behavior changes when running a script
> > within a shell pipeline/cronjob as opposed to at a tty (and to capture
> > some of the bizarre cornercases, for example, I found the behavior of
> > the pango/pygtk modules particularly surprising).
> >
> > I mention it here as a "heads-up" about the change:
> >   - in case other distributions may want to do the same (or already do
> > so, though in my very brief survey no-one else seemed to), and
> >   - in case doing so breaks things in a way I'm not expecting; can
> > anyone see any flaws in my arguments?
> >   - in case other people find my notes on the issue useful
> >
> > Hope this is helpful; can anyone see any potential problems with this
> > change?
>
> Yes: such a change is unsupported by Python. The code you are
> changing should really have been removed many releases ago -
> it was originally only intended to serve as basis for experimentation
> on choosing the "right" default encoding.
>
> The only supported default encodings in Python are:
>
>  Python 2.x: ASCII
>  Python 3.x: UTF-8
>
> If you change these, you are on your own and strange things will
> start to happen. The default encoding does not only affect
> the translation between Python and the outside world, but also
> all internal conversions between 8-bit strings and Unicode.
>
> Hacks like what's happening in the pango module (setting the
> default encoding to 'utf-8' by reloading the site module in
> order to get the sys.setdefaultencoding() API back) are just
> downright wrong and will cause serious problems since Unicode
> objects cache their default encoded representation.

Thanks for the feedback.

Note that pango isn't even doing the module reload hack; it's written in
C, and going in directly through the C API:

  PyUnicode_SetDefaultEncoding("utf-8");

I should mention that I've seen at least one C module in the wild that
exists merely to do this:

  #include <Python.h>
  void initutf8_please(void) {
    PyUnicode_SetDefaultEncoding("utf-8");
  }

so that the user could do "import utf8_please" at the top of their
scripts.

> If all you want to achieve is getting the encodings of
> stdout and stdin correctly setup for pipes, you should
> instead change the .encoding attribute of those (only).

Currently they are set up, but only when connected to a tty, which leads
to surprising behavior changes inside pipes/cronjobs (e.g. piping a
unicode string to "less" immediately breaks for code points above 127:
less is expecting locale-encoded bytes, but sys.stdout has encoding
"ASCII").

Similarly:

  [da...@brick ~]$ python -c "import sys; print sys.stdout.encoding"
  UTF-8
  [da...@brick ~]$ python -c "import sys; print sys.stdout.encoding" | cat
  None

Why only set an encoding on these streams when they're directly
connected to a tty?

I'll patch things to remove the isatty conditional if that's acceptable.

(the tty-logic to do it appeared with the initial commit that added
locale-encoding support to sys.std[in|out], in sysmodule.c:
  http://svn.python.org/view?view=rev&revision=32719
and was later moved from sysmodule.c to pythonrun.c:
  http://svn.python.org/view?view=rev&revision=33817
it later grew to affect stderr:
  http://svn.python.org/view?view=rev&revision=43581
again, only if directly connected to a tty)

Dave
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
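
For readers who want to see the stream-level alternative to changing the
default encoding, here is a minimal sketch. It is not the pythonrun.c
patch discussed above, and the helper names pick_encoding and
wrap_byte_stream are hypothetical, invented for illustration. It assumes
a script-level workaround: when a stream reports no encoding (as
sys.stdout did inside the pipe in the session above), pick one from the
locale and wrap the byte stream in an explicit encoder, leaving the
interpreter-wide default encoding alone.

```python
import codecs
import io
import locale


def pick_encoding(stream, fallback=None):
    """Return the stream's own encoding, or a fallback if it has none.

    Hypothetical helper: not part of Python or of the proposed patch.
    """
    encoding = getattr(stream, "encoding", None)
    if encoding:
        return encoding
    # No encoding reported (e.g. the stream is a pipe rather than a tty):
    # use the caller's fallback, then the locale's preferred encoding.
    return fallback or locale.getpreferredencoding(False) or "ascii"


def wrap_byte_stream(raw, encoding):
    """Wrap a byte stream so that text written to it is encoded explicitly."""
    return codecs.getwriter(encoding)(raw)


# io.BytesIO stands in for a pipe here: it has no .encoding attribute.
pipe = io.BytesIO()
writer = wrap_byte_stream(pipe, pick_encoding(pipe, fallback="utf-8"))
writer.write(u"caf\u00e9\n")
encoded = pipe.getvalue()
```

The point of the design is that only the one stream is affected; implicit
conversions between 8-bit strings and Unicode elsewhere in the program
still use the supported default.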