Ned Deily <n...@acm.org> added the comment:

The character in question is not the problem, and the code snippet you provided looks fine. The problem is almost certainly that you are running the code in an execution environment where the LANG environment variable is either not set or is set to an encoding that can represent only a small subset of Unicode. The fallback 'mac_roman' is such an encoding. The default encodings used by the Python 3 interpreter are influenced by the values of these environment variables. So the questions are: how are you running your code, what are the values of the environment variables that your Python program inherits, and, by any chance, is your program using the 'locale' module? If so, exactly which functions from it?
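To illustrate why the encoding matters, here is a minimal sketch. The helper names encodable and child_stdout_encoding are hypothetical, chosen for illustration. It checks which of the two test strings used in the diagnostic below each codec can encode: 'mac_roman' can encode the precomposed 'å' (U+00E5) but not the combining ring (U+030A), and ASCII can encode neither. It also runs a child interpreter to show that the stdout encoding follows the environment the child inherits. The sketch sets PYTHONIOENCODING rather than removing LANG because newer Python releases (3.7+, PEP 538 and PEP 540) may handle an unset or "C" locale differently, so the effect of removing LANG can vary by version.

```python
import codecs
import os
import subprocess
import sys

# The two test strings from the diagnostic: a precomposed "å" (U+00E5)
# and "a" followed by a combining ring above (U+030A).
SAMPLES = ['\u00e5', '\u0061\u030a']


def encodable(text, encoding):
    """Hypothetical helper: True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def child_stdout_encoding(extra_env):
    """Hypothetical helper: run a child interpreter with extra environment
    variables and return the normalized codec name it chose for sys.stdout."""
    env = dict(os.environ)   # start from the environment we inherited
    env.update(extra_env)    # then override selected variables
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    # codecs.lookup() normalizes aliases such as 'US-ASCII' -> 'ascii'
    return codecs.lookup(out.decode('ascii').strip()).name


if __name__ == '__main__':
    for enc in ('utf-8', 'mac_roman', 'ascii'):
        print(enc, [encodable(s, enc) for s in SAMPLES])
    # utf-8 [True, True]
    # mac_roman [True, False]
    # ascii [False, False]
    print(child_stdout_encoding({'PYTHONIOENCODING': 'ascii'}))  # ascii
```

The same normalization applies to the 'US-ASCII' encoding reported in the second session below: codecs.lookup('US-ASCII').name is 'ascii', which is the codec named in the resulting UnicodeEncodeError.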
Please try adding the following in the environment where you are seeing the problem:

    import sys
    print(sys.stdout)
    import os
    print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
    print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
    import locale
    print(locale.getlocale())
    print('\u00e5')
    print('\u0061\u030a')

If I paste the above into a Python 3.2 interactive terminal session using the python.org 64-/32-bit Python 3.2.3, I see the following:

    $ python3.2
    Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> print(sys.stdout)
    <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
    >>> import os
    >>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
    []
    >>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
    [('LANG', 'en_US.UTF-8')]
    >>> import locale
    >>> print(locale.getlocale())
    ('en_US', 'UTF-8')
    >>> print('\u00e5')
    å
    >>> print('\u0061\u030a')
    å

But if I explicitly remove the LANG environment variable:

    $ unset LANG
    $ python3.2
    Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> print(sys.stdout)
    <_io.TextIOWrapper name='<stdout>' mode='w' encoding='US-ASCII'>
    >>> import os
    >>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
    []
    >>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
    []
    >>> import locale
    >>> print(locale.getlocale())
    (None, None)
    >>> print('\u00e5')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character '\xe5' in position 0: ordinal not in range(128)
    >>> print('\u0061\u030a')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character '\u030a' in position 1: ordinal not in range(128)
    >>>

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14986>