Andrew Svetlov <[email protected]> added the comment:
I consulted with Martin at the PyCon sprint and he suggested a solution, which
I'm following: to split `print` and the REPL (read-eval-print loop).
Output passed to the print() function is encoded with sys.stdout.encoding.
The UTF encodings were invented to support any character.
Linux is usually set up to use the utf-8 encoding by default (see the LANG
environment variable). There are no encoding issues with that.
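To see what print() does with a given stream encoding without touching LANG, here is a minimal sketch. The helper name print_with_encoding is hypothetical (it is not part of Python), and it writes to an in-memory stream instead of the real terminal:

```python
import io


def print_with_encoding(text, encoding):
    """Print text to an in-memory stream and return the bytes written.

    Hypothetical helper for illustration: the stream stands in for
    sys.stdout configured with the given encoding.
    """
    raw = io.BytesIO()
    # newline="\n" disables newline translation so the bytes are the
    # same on every platform.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


# A non-BMP character becomes four UTF-8 bytes plus the newline.
print(print_with_encoding("\U00010340", "utf-8"))
```

Passing an encoding that cannot represent the character, such as "ascii", should raise UnicodeEncodeError from the same call.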
xterm (a rather old terminal), which you use, cannot print non-BMP characters
and replaces them with question marks.
A modern gnome-terminal prints those symbols very well.
Let's return to non-UTF terminal encodings.
If a character cannot be encoded, Python raises UnicodeEncodeError.
Here is an example:
andrew@tiktaalik ~/p/cpython> bash -c "LANG=C; ./python"
Python 3.3.0a1+ (qbase qtip tip tk:c3ce8a8e6c9c+, Mar 14 2012, 15:54:55)
[GCC 4.6.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '\U00010340'
'\U00010340'
>>> print('\U00010340')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\U00010340' in position 0: ordinal not in range(128)
>>>
As you can see, I have switched LANG to the C locale (an alias for ASCII).
The evaluated expression is printed with unicode escaping, but the `print`
call raises an error.
This happens because Python's REPL calls sys.displayhook.
See http://docs.python.org/dev/library/sys.html#sys.displayhook for details.
That code escapes unicode characters if the terminal encoding doesn't support
them.
The same applies to Windows, OS X and any other platform.
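The escaping step can be sketched roughly as follows. This is only an approximation of the fallback described in the sys.displayhook documentation, not the actual implementation, and escape_for_terminal is a hypothetical name:

```python
import sys


def escape_for_terminal(text, encoding=None):
    """Return text unchanged if the encoding can represent it;
    otherwise replace unencodable characters with backslash escapes.

    Hypothetical helper; it approximates the displayhook fallback.
    """
    if encoding is None:
        # Assumption: fall back to ASCII when stdout has no encoding.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
        return text
    except UnicodeEncodeError:
        escaped = text.encode(encoding, "backslashreplace")
        return escaped.decode(encoding)


# With an ASCII terminal the repr is shown escaped instead of failing.
print(escape_for_terminal(repr("\U00010340"), "ascii"))
```

print() has no such fallback: it writes the string itself, so the encoding error propagates to the caller.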
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue14200>
_______________________________________