Chris Jerdonek added the comment:

I did some analysis of this issue.

For starters, I could not reproduce this on Mac OS X 10.7.4.  I iterated 
through all available locales, and the separator was ASCII in all cases.

Instead, I was able to simulate the issue by changing "," to "\xa0" in the 
following line--

http://hg.python.org/cpython/file/820032281f49/Objects/stringlib/formatter.h#l651

and then reproduce with:

>>> u'{:,}'.format(10000)
  ..
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2: ordinal not in range(128)
>>> format(10000, u',')
  ..
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2: ordinal not in range(128)

However, note this difference (see also issue 15952)--

>>> (10000).__format__(u',')
'10\xa0000'

The issue seems to be in PyObject_Format() in Objects/abstract.c.  Unlike 
int__format__() in Objects/intobject.c, PyObject_Format() does respect whether 
the format string is unicode or not, but it calls int__format__() to get the 
formatted string as a byte string.  It then passes this byte string to 
PyObject_Unicode() to convert it to unicode.  This in turn calls 
PyUnicode_FromEncodedObject() with a NULL encoding, which causes that code to 
fall back to PyUnicode_GetDefaultEncoding() for the encoding (i.e. 
sys.getdefaultencoding(), which is 'ascii' in the errors above).
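
To make the failure mode concrete, here is a rough sketch of the conversion step in isolation. It is written in Python 3 syntax and only simulates the behavior: the explicit decode("ascii") stands in for the implicit default-encoding conversion described above, and the helper name to_unicode and the choice of "latin-1" as the locale encoding are assumptions made for illustration, not anything taken from CPython.

```python
# Simulated result of the byte-string formatting step, assuming the
# thousands separator is the single byte 0xa0 (as in the faked case).
formatted = b"10\xa0000"


def to_unicode(data, encoding="ascii"):
    """Hypothetical stand-in for the implicit str -> unicode conversion.

    The default of "ascii" mirrors the default encoding seen in the
    tracebacks; it is an assumption of this sketch.
    """
    return data.decode(encoding)


error = None
try:
    to_unicode(formatted)
except UnicodeDecodeError as exc:
    # Byte 0xa0 at index 2 is outside the ASCII range.
    error = exc

# Decoding with an encoding that can represent 0xa0 succeeds.
# "latin-1" is only an example of what a locale encoding might be.
decoded = to_unicode(formatted, "latin-1")
```

The point of the sketch is that the bytes themselves are fine; the failure comes from decoding them with an encoding that has nothing to do with the locale that produced them.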

The right way to fix this seems to be to make int__format__() return unicode as 
appropriate, which may mean modifying formatter.h's 
format_int_or_long_internal() to return unicode -- as well as taking into 
account the locale encoding when accessing the locale's thousands separator.
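
As a rough sketch of the idea (not the actual formatter.h change), the separator could be decoded once using the locale's encoding and the result built as unicode from the start, so that no default-encoding conversion is needed afterwards. The function name group_thousands, its parameters, and the "iso-8859-1" encoding are hypothetical and chosen only for illustration; the sketch is in Python 3 syntax and uses a fixed group size of three.

```python
def group_thousands(value, sep_bytes, encoding):
    """Format an integer with a thousands separator given as encoded bytes.

    Hypothetical helper: sep_bytes plays the role of the locale's
    thousands separator, and encoding the role of the locale encoding.
    """
    # Decode the separator up front, using the locale's own encoding.
    sep = sep_bytes.decode(encoding)

    sign = "-" if value < 0 else ""
    digits = str(abs(value))

    # Split the digits into groups of three, starting from the right.
    groups = []
    while digits:
        groups.append(digits[-3:])
        digits = digits[:-3]

    # The result is text throughout; no later byte-to-text step is needed.
    return sign + sep.join(reversed(groups))


result = group_thousands(10000, b"\xa0", "iso-8859-1")
```

With the separator decoded at the point where it is read, the formatted value never exists as a byte string that would later have to be decoded with the wrong encoding.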

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15276>
_______________________________________