David Malcolm wrote:
> On Wed, 2010-01-20 at 22:37 +0100, M.-A. Lemburg wrote:
> Note that pango isn't even doing the module reload hack; it's written in
> C and goes directly through the C API:
>    PyUnicode_SetDefaultEncoding("utf-8");
> 
> I should mention that I've seen at least one C module in the wild that
> exists merely to do this:
> 
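>   #include <Python.h>
>   static PyMethodDef no_methods[] = {{NULL, NULL, 0, NULL}};
>
>   /* Python 2 only: switch the process-wide default encoding on import */
>   PyMODINIT_FUNC initutf8_please(void) {
>      PyUnicode_SetDefaultEncoding("utf-8");
>      Py_InitModule("utf8_please", no_methods);  /* register empty module */
>   }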
> 
> so that the user could do "import utf8_please" at the top of their
> scripts.

We should have made that a private C API... oh well. At the time
these APIs were written, it wasn't yet clear which default encoding
to choose, and even after the decision there were a few different camps:

 * Latin-1
 * UTF-8
 * locale dependent

Some time later, Guido (AFAIR) proposed ASCII as the GCD of
all of these, i.e. the one subset they all have in common.
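Why ASCII works as common ground can be checked directly: pure-ASCII text encodes to the same bytes under each candidate, while anything above code point 127 does not. A minimal Python 3 sketch; the helper name encodings_agree is hypothetical, and cp1252 is only an assumed stand-in for one possible "locale dependent" choice.

```python
# Candidate default encodings from the list above; "cp1252" is an assumed
# example of a locale-dependent encoding, not one named in the thread.
CANDIDATES = ["latin-1", "utf-8", "cp1252"]


def encodings_agree(text, encodings=CANDIDATES):
    """Return True if every encoding yields identical bytes for text."""
    results = set()
    for name in encodings:
        try:
            results.add(text.encode(name))
        except UnicodeEncodeError:
            # Not representable at all in this encoding.
            return False
    return len(results) == 1
```

encodings_agree("hello") is True, while encodings_agree("caf\u00e9") is False because Latin-1 and CP1252 produce b"caf\xe9" but UTF-8 produces b"caf\xc3\xa9".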

>> If all you want to achieve is getting the encodings of
>> stdout and stdin correctly set up for pipes, you should
>> instead change the .encoding attribute of those (only).
> Currently they are set up, but only when connected to a tty, which leads
> to surprising behavior changes inside pipes/cronjobs (e.g. piping a
> unicode string to "less" immediately breaks for code points above 127:
> less expects locale-encoded bytes, but sys.stdout has no encoding and
> Python falls back to the default "ascii" encoding).
> 
> Similarly:
> [da...@brick ~]$ python -c "import sys; print sys.stdout.encoding"
> UTF-8
> [da...@brick ~]$ python -c "import sys; print sys.stdout.encoding" | cat
> None
> 
> Why only set an encoding on these streams when they're directly
> connected to a tty?  I'll patch things to remove the isatty conditional
> if that's acceptable.
> 
> (The tty logic first appeared in the initial commit that added
> locale-encoding support to sys.std[in|out], in sysmodule.c:
> http://svn.python.org/view?view=rev&revision=32719
> and was later moved from sysmodule.c to pythonrun.c:
> http://svn.python.org/view?view=rev&revision=33817
> It later grew to cover stderr as well:
> http://svn.python.org/view?view=rev&revision=43581
> again only when directly connected to a tty.)
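To make the behavior change concrete, here is a small Python 3 model of the conditional under discussion. It is a sketch, not the pythonrun.c code: the function names and the drop_isatty_check flag are hypothetical, and it only combines the behavior shown in the shell session above (an encoding on a tty, None in a pipe) with Python 2's "ascii" default-encoding fallback.

```python
# Python 2's process-wide default encoding (sys.getdefaultencoding()),
# used when printing unicode to a stream whose encoding is None.
DEFAULT_ENCODING = "ascii"


def stream_encoding(is_tty, locale_encoding, drop_isatty_check=False):
    """Model of the encoding assigned to sys.stdout at startup.

    With the isatty() conditional, only a tty gets the locale encoding and
    a pipe is left with None. drop_isatty_check models the proposed patch.
    """
    if is_tty or drop_isatty_check:
        return locale_encoding
    return None


def encode_for_output(text, is_tty, locale_encoding, drop_isatty_check=False):
    """Encode text the way printing it to such a stream would."""
    encoding = stream_encoding(is_tty, locale_encoding, drop_isatty_check)
    # No stream encoding means falling back to the default encoding.
    return text.encode(encoding or DEFAULT_ENCODING)
```

With the flag off, encode_for_output("caf\u00e9", False, "UTF-8") raises UnicodeEncodeError, which is the pipe-to-less failure; with it on, the pipe receives the same UTF-8 bytes as the terminal.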

For TTYs the process locale will be a reasonable source of
information about the encoding used for stdin and stdout
(since the TTY will use those settings as well).

For pipes, the situation is much less clear. For example, you
could have a Java application producing some text in UTF-8
which then gets passed to another application expecting Latin-1,
and all of that running in a CP1252 shell on Windows.

However, removing the isatty() check will certainly not cause
as many problems as changing the default encoding altogether.
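On current Python 3 (3.7 and later), the per-stream approach can be sketched with io.TextIOWrapper.reconfigure(), an API that did not exist when this thread was written. The helper name fix_stream_encoding is hypothetical, and defaulting pipes to locale.getpreferredencoding(False) is an assumption mirroring the proposal to drop the isatty() check; as the Java/Latin-1/CP1252 example shows, the locale is not always right for pipes, so an explicit encoding can be passed instead.

```python
import locale


def fix_stream_encoding(stream, encoding=None):
    """Give a non-tty text stream an explicit encoding (hypothetical helper).

    Only this stream changes; the interpreter-wide default encoding is
    left alone. Requires io.TextIOWrapper.reconfigure() (Python 3.7+).
    """
    if not hasattr(stream, "reconfigure"):
        # Not a TextIOWrapper (e.g. a harness replaced sys.stdout): skip.
        return stream
    if stream.isatty():
        # A TTY already follows the process locale, so leave it as is.
        return stream
    if encoding is None:
        # Assumption: for pipes, reuse the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    # Allowed after writes, but not after data has been read.
    stream.reconfigure(encoding=encoding)
    return stream
```

A script would call fix_stream_encoding(sys.stdout) and fix_stream_encoding(sys.stdin) near the top, before anything is read from stdin.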

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 20 2010)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev