Re: Yet another unicode WTF

Gabriel Genellina Thu, 04 Jun 2009 19:10:03 -0700

On Thu, 04 Jun 2009 22:18:24 -0300, Ron Garret <[email protected]> wrote:

Python 2.6.2 on OS X 10.5.7:


[...@mickey:~]$ echo $LANG
en_US.UTF-8
[...@mickey:~]$ cat frob.py
#!/usr/bin/env python
print u'\u03BB'

[...@mickey:~]$ ./frob.py
ª
[...@mickey:~]$ ./frob.py > foo
Traceback (most recent call last):
  File "./frob.py", line 2, in <module>
    print u'\u03BB'
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03bb' in
position 0: ordinal not in range(128)


(That's supposed to be a small greek lambda, but I'm using a
brain-damaged news reader that won't let me set the character encoding.
It shows up correctly in my terminal.)

According to what I thought I knew about unix (and I had fancied myself
a bit of an expert until just now) this is impossible.  Python is
obviously picking up a different default encoding when its output is
being piped to a file, but I always thought one of the fundamental
invariants of unix processes was that there's no way for a process to
know what's on the other end of its stdout.

It may be hard to know *who* is at the other end of the pipe, but it's
easy to know *what* kind of file it is. Lots of programs detect whether
stdout is a tty or not (using isatty(3)) and adapt their output
accordingly; ls is one example.
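
For illustration, here is a minimal sketch of that check from Python. os.isatty() is the standard library call; the helper name describe_fd is made up for this example.

```python
import os
import sys

def describe_fd(fd):
    """Report whether file descriptor fd is connected to a terminal."""
    # os.isatty() returns True only if fd is open and attached to a tty.
    return "a tty" if os.isatty(fd) else "not a tty"

if __name__ == "__main__":
    # Report on stdout (fd 1), but write the report to stderr so it is
    # still visible when stdout is redirected to a file or a pipe.
    sys.stderr.write("stdout is %s\n" % describe_fd(1))
```

Run it once directly in a terminal and once with "> foo" appended: the report should change, which is the same information ls relies on.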

Python knows the terminal encoding (or at least can make a good guess),
but a file may use *any* encoding you want, completely unrelated to your
terminal settings. So when stdout is redirected, Python refuses to guess
its encoding; see the PYTHONIOENCODING environment variable.
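
If you would rather make the choice inside the script, one option is to encode explicitly before writing. This is only a sketch: the helper name emit and the UTF-8 fallback are assumptions of mine, not anything Python does for you.

```python
import sys

def emit(text, stream, fallback="utf-8"):
    """Encode text ourselves and write the resulting bytes to stream."""
    # Use the stream's own encoding when it reports one; otherwise
    # (e.g. encoding is None or missing) use the assumed fallback.
    encoding = getattr(stream, "encoding", None) or fallback
    # Python 3 text streams expose the underlying byte stream as .buffer;
    # Python 2 file objects accept byte strings directly.
    raw = getattr(stream, "buffer", stream)
    raw.write(text.encode(encoding))
```

Calling emit(u'\u03bb\n', sys.stdout) in place of the print statement should then write UTF-8 bytes when stdout reports no encoding. It can still fail if the stream reports an encoding that cannot represent the character.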


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
