Steven D'Aprano wrote:

>>>>> import sys
>>>>> sys.getdefaultencoding()
>> 'ascii'
>
> That's technically known as a "lie", since if it were *really* ASCII it
> would refuse to deal with characters with the high-bit set. But it
> doesn't, it treats them in an unpredictable and implementation-dependent
> manner.

It's not a lie, it just doesn't control the unicode-to-bytes conversion
when printing:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> print u"äöü"
äöü
>>> str(u"äöü")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("latin1")
>>> print u"äöü"
äöü
>>> str(u"äöü")
'\xe4\xf6\xfc'
>>> sys.setdefaultencoding("utf-8")
>>> print u"äöü"
äöü
>>> str(u"äöü")
'\xc3\xa4\xc3\xb6\xc3\xbc'

You can enforce ascii-only printing:

$ LANG=C python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print unichr(228)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)

To find out the encoding that is used:

$ python -c 'import locale; print locale.getpreferredencoding()'
UTF-8
$ LANG=C python -c 'import locale; print locale.getpreferredencoding()'
ANSI_X3.4-1968

"""
Help on function getpreferredencoding in module locale:

getpreferredencoding(do_setlocale=True)
    Return the charset that the user is likely using,
    according to the system configuration.
"""
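
The sessions above mix up three separate settings, so here is a minimal sketch that keeps them apart. It avoids sys.setdefaultencoding() entirely and encodes explicitly instead, which gives the same bytes as the str() calls above. The helper name try_encode is made up for this illustration, and the script is written so that it should run under both Python 2.7 and Python 3 (where sys.getdefaultencoding() reports 'utf-8' rather than 'ascii'). In Python 2, print encodes unicode with sys.stdout.encoding, which is normally derived from the locale when stdout is a terminal and may be None when output is piped.

```python
from __future__ import print_function

import locale
import sys

# a-umlaut, o-umlaut, u-umlaut, written with escapes so this file stays pure ASCII
text = u"\xe4\xf6\xfc"


def try_encode(value, encoding):
    """Hypothetical helper: return the encoded bytes, or None if the
    codec cannot represent every character in value."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


# Explicit encodings, independent of any interpreter-wide default.
ascii_bytes = try_encode(text, "ascii")      # ASCII has no code points >= 128
latin1_bytes = try_encode(text, "latin-1")   # one byte per character
utf8_bytes = try_encode(text, "utf-8")       # two bytes per character here

print("ascii  :", repr(ascii_bytes))
print("latin-1:", repr(latin1_bytes))
print("utf-8  :", repr(utf8_bytes))

# Three different settings that are easy to confuse:
# the implicit str/unicode conversion default (Python 2),
# the encoding print uses for this stream,
# and the locale's preferred charset.
print("sys.getdefaultencoding()      :", sys.getdefaultencoding())
print("sys.stdout.encoding           :", getattr(sys.stdout, "encoding", None))
print("locale.getpreferredencoding() :", locale.getpreferredencoding())
```

Running it once normally and once with LANG=C should show the last two values change while the explicit encode() results stay the same, which is the point: an explicit .encode() does not depend on either the default encoding or the locale.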