Ned Deily <n...@acm.org> added the comment:

The character in question is not the problem, and the code snippet you 
provided looks fine.  The problem is almost certainly that you are running 
the code in an execution environment where the LANG environment variable is 
either not set or is set to an encoding that cannot represent all of the 
non-ASCII characters you are printing.  The fallback 'mac_roman' is such an 
encoding.  The default encodings used by the Python 3 interpreter are 
influenced by the values of the locale environment variables (LANG and the 
LC_* variables).  So the questions are: how are you running your code, what 
are the values of the environment variables that your Python program 
inherits, and is your program using the 'locale' module and, if so, exactly 
which functions from it?
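
To illustrate that it is the target encoding, not the character, that 
decides whether output succeeds, here is a minimal sketch.  The helper name 
can_encode is hypothetical (it is not part of Python); it only wraps 
str.encode().  As far as I can tell, 'mac_roman' can represent the 
precomposed form of the character but not the combining ring U+030A:

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The same visible character in precomposed and decomposed form.
samples = {
    "precomposed U+00E5": "\u00e5",
    "decomposed U+0061 U+030A": "\u0061\u030a",
}

for label, text in samples.items():
    for encoding in ("ascii", "mac_roman", "utf-8"):
        print(label, encoding, can_encode(text, encoding))
```

Only the UTF-8 case should report True for both forms, which is why the 
encoding Python picks for stdout matters more than the text itself.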

Please try adding the following in the environment where you are seeing the 
problem:

import sys
print(sys.stdout)
import os
print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
import locale
print(locale.getlocale())
print('\u00e5')
print('\u0061\u030a')

If I paste the above into a Python 3.2 interactive terminal session using the 
python.org 64-/32-bit Python 3.2.3, I see the following:

$ python3.2
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[('LANG', 'en_US.UTF-8')]
>>> import locale
>>> print(locale.getlocale())
('en_US', 'UTF-8')
>>> print('\u00e5')
å
>>> print('\u0061\u030a')
å

But, if I explicitly remove the LANG environment variable:

$ unset LANG
$ python3.2
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='US-ASCII'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[]
>>> import locale
>>> print(locale.getlocale())
(None, None)
>>> print('\u00e5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xe5' in position 0: ordinal not in range(128)
>>> print('\u0061\u030a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u030a' in position 1: ordinal not in range(128)
>>>
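
The two outcomes above can also be reproduced without touching the 
environment, by printing to an in-memory text stream whose encoding is 
chosen explicitly.  This is only a sketch to show the mechanism: the helper 
name print_with_encoding is hypothetical, and the in-memory stream stands in 
for sys.stdout; it is not how the interpreter sets up stdout itself.

```python
import io


def print_with_encoding(text, encoding):
    """Print text to an in-memory stream using encoding; describe the result."""
    buffer = io.BytesIO()
    # newline="\n" keeps the written bytes identical on every platform.
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return "failed: %s" % exc
    return "ok: %r" % buffer.getvalue()


for text in ("\u00e5", "\u0061\u030a"):
    for encoding in ("utf-8", "ascii"):
        print(encoding, print_with_encoding(text, encoding))
```

With 'utf-8' both strings are written; with 'ascii' both fail with the same 
UnicodeEncodeError seen in the session above.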

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14986>
_______________________________________