[issue15276] unicode format does not really work in Python 2.x

2020-05-31 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Python 2.7 is no longer supported.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15276] unicode format does not really work in Python 2.x

2020-05-31 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
resolution:  -> out of date
stage:  -> resolved
status: open -> closed




[issue15276] unicode format does not really work in Python 2.x

2014-04-17 Thread Petr Dlouhý

Petr Dlouhý added the comment:

For anyone stuck on Python 2.x, here is a workaround (maybe it could find its 
way into the documentation as well):

  def fix_grouping(bytestring):
      try:
          return unicode(bytestring)
      except UnicodeDecodeError:
          return bytestring.decode("utf-8")

--
nosy: +petr.dlo...@email.cz




[issue15276] unicode format does not really work in Python 2.x

2012-11-04 Thread STINNER Victor

STINNER Victor added the comment:

"If we don't fix this (I'm leaning that way myself), I think we should somehow 
document the limitation.  There are ways to acknowledge the limitation without 
getting into the specifics of this particular issue."

I agree to document the limitation and close this issue as "wontfix".

A workaround is to format as a bytes string, and then decode the result from 
the locale encoding. It looks like locale.getpreferredencoding(True) should be 
used, not locale.getpreferredencoding(False).
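
A minimal sketch of that workaround, written so that it runs on both Python 2 and Python 3. The helper name format_number_unicode is made up for illustration and is not part of any library. It assumes the format spec is passed as a native str, so that on Python 2 no implicit ASCII decoding happens before the explicit decode:

```python
import locale


def format_number_unicode(value, spec="n"):
    """Format value and return text (hypothetical helper, not a stdlib API)."""
    # With a native str spec, Python 2 returns a byte string here and
    # performs no implicit ASCII decoding.
    formatted = format(value, spec)
    if isinstance(formatted, bytes):
        # Python 2 only: decode using the locale encoding, as suggested above.
        return formatted.decode(locale.getpreferredencoding(True))
    # Python 3 already returns text.
    return formatted
```

Passing a unicode spec on Python 2 would defeat the purpose, since the failing conversion would then happen inside format() itself.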

--




[issue15276] unicode format does not really work in Python 2.x

2012-11-04 Thread Berker Peksag

Changes by Berker Peksag :


--
nosy:  -berker.peksag




[issue15276] unicode format does not really work in Python 2.x

2012-09-22 Thread Chris Jerdonek

Chris Jerdonek added the comment:

I have a brief documentation patch in mind for this, but it relies on 
documentation issue 15952 being addressed first (e.g. to say that format(value) 
returns Unicode when format_spec is Unicode and that value.__format__() can 
return a string of type str).  So I'm marking issue 15952 as a dependency.

--
dependencies: +format(value) and value.__format__() behave differently with 
unicode format




[issue15276] unicode format does not really work in Python 2.x

2012-09-20 Thread STINNER Victor

STINNER Victor added the comment:

> I can't reproduce this with Python 2.7.3.
> >>> locale.setlocale(locale.LC_NUMERIC, 'fr_FR')
> 'fr_FR'
> >>> u'{:n}'.format(10000)
> u'10 000'

I don't understand why, but not all French locales are the same. Some French 
locales use the standard ASCII space (U+0020) as the thousands separator, others 
use the non-breaking space (U+00A0). I suppose that some systems prefer to 
avoid non-ASCII characters to avoid "Unicode issues".

On Ubuntu 12.04, locale.localeconv()['thousands_sep'] is chr(32) for the locale 
fr_FR.utf8.

You may need to install other locales to test this issue. For example, the 
ps_AF locale uses U+066b as the decimal point and the thousands separator.

I chose to not fix the issue in Python 3.2 because it needs to change too much 
code (and I don't want to introduce a regression and 3.2 code is very different 
than 3.3). You should upgrade to Python 3.3, or reimplement the Unicode 
format() function for numbers using locale.localeconv() ('thousands_sep', 
'decimal_point' and 'grouping') :-/

Or find a more motivated developer. Or I can do the job if you pay me ;-)

(Read also the issue #13706 for more information.)
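
As a rough sketch of the "reimplement it with locale.localeconv()" option mentioned above, here is one way the integer case might look. The names group_digits and format_int_unicode are hypothetical, only 'thousands_sep' and 'grouping' are handled (no 'decimal_point', no floats), and the conv parameter exists only so the logic can be tried without installing extra locales:

```python
import locale


def group_digits(digits, grouping, thousands_sep):
    """Insert thousands_sep into a digit string following a localeconv() grouping list."""
    if not grouping or not thousands_sep:
        return digits
    groups = []
    rest = digits
    last = None
    for size in grouping:
        if size == locale.CHAR_MAX or not rest:
            break  # CHAR_MAX means "no further grouping"
        if size == 0:
            # 0 means "repeat the previous group size for the remaining digits"
            while rest and last:
                groups.append(rest[-last:])
                rest = rest[:-last]
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
        last = size
    if rest:
        groups.append(rest)
    return thousands_sep.join(reversed(groups))


def format_int_unicode(value, conv=None):
    """Return a text representation of an int, grouped per the LC_NUMERIC settings."""
    if conv is None:
        conv = locale.localeconv()
    sep = conv["thousands_sep"]
    if isinstance(sep, bytes):
        # Python 2: localeconv() returns byte strings, so decode explicitly.
        sep = sep.decode(locale.getpreferredencoding(True))
    sign = u"-" if value < 0 else u""
    return sign + group_digits(u"%d" % abs(value), conv["grouping"], sep)
```

For example, format_int_unicode(10000, {"thousands_sep": u"\xa0", "grouping": [3, 3, 0]}) gives u'10\xa0000'.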

--




[issue15276] unicode format does not really work in Python 2.x

2012-09-20 Thread STINNER Victor

STINNER Victor added the comment:

I fixed a similar bug in Python 3.3: issue #13706.

changeset:   75231:f89e2f4cda88
user:        Victor Stinner 
date:        Fri Feb 24 00:37:51 2012 +0100
files:   Include/unicodeobject.h Lib/test/test_format.py 
Objects/stringlib/asciilib.h Objects/stringlib/localeutil.h 
Objects/stringlib/stringdefs.h Objects/stringlib/ucs1lib.h 
description:
Issue #13706: Fix format(int, "n") for locale with non-ASCII thousands separator

 * Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
   (from the locale encoding), instead of decoding them implicitly from latin1
 * Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
 * Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
   character if unicode is NULL
 * Replace MIN/MAX macros by Py_MIN/Py_MAX
 * stringlib/undef.h undefines STRINGLIB_IS_UNICODE
 * stringlib/localeutil.h only supports Unicode

--




[issue15276] unicode format does not really work in Python 2.x

2012-09-20 Thread STINNER Victor

Changes by STINNER Victor :


--
nosy: +haypo




[issue15276] unicode format does not really work in Python 2.x

2012-09-19 Thread Chris Jerdonek

Chris Jerdonek added the comment:

If we don't fix this (I'm leaning that way myself), I think we should somehow 
document the limitation.  There are ways to acknowledge the limitation without 
getting into the specifics of this particular issue.

--




[issue15276] unicode format does not really work in Python 2.x

2012-09-19 Thread Martin v . Löwis

Martin v. Löwis added the comment:

> What do you think?

[Even though I wasn't asked]

I think we may need to close the issue as "won't fix". Depending on the
exact change proposed, it may be that the return type for existing
operations might change, which shouldn't be done in a bug fix release.

People running into this issue should port to Python 3 (IMO).

--




[issue15276] unicode format does not really work in Python 2.x

2012-09-16 Thread Chris Jerdonek

Chris Jerdonek added the comment:

Eric, it looks like you wrote this comment:

/* don't define FORMAT_LONG, FORMAT_FLOAT, and FORMAT_COMPLEX, since
   we can live with only the string versions of those.  The builtin
   format() will convert them to unicode. */

in http://hg.python.org/cpython/file/19601d451d4c/Python/formatter_unicode.c

It seems like the current issue may be a valid reason for introducing a unicode 
FORMAT_INT (i.e. not just for type-purity and PEP 3101 compliance, but to avoid 
an exception).  What do you think?

--




[issue15276] unicode format does not really work in Python 2.x

2012-09-16 Thread Chris Jerdonek

Chris Jerdonek added the comment:

I did some analysis of this issue.

For starters, I could not reproduce this on Mac OS X 10.7.4.  I iterated 
through all available locales, and the separator was ASCII in all cases.

Instead, I was able to fake the issue by changing "," to "\xa0" in the 
following line--

http://hg.python.org/cpython/file/820032281f49/Objects/stringlib/formatter.h#l651

and then reproduce with:

>>> u'{:,}'.format(10000)
  ..
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2: ordinal 
not in range(128)
>>> format(10000, u',')
  ..
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2: ordinal 
not in range(128)

However, note this difference (see also issue 15952)--

>>> (10000).__format__(u',')
'10\xa0000'

The issue seems to be that PyObject_Format() in Objects/abstract.c (which, 
unlike int__format__() in Objects/intobject.c, does respect whether the format 
string is unicode or not) calls int__format__() to get the formatted string as 
a byte string.  It then passes this to PyObject_Unicode() to convert to 
unicode.  This in turn calls PyUnicode_FromEncodedObject() with a NULL 
encoding, which causes that code to use PyUnicode_GetDefaultEncoding() for the 
encoding (i.e. sys.getdefaultencoding()).

The right way to fix this seems to be to make int__format__() return unicode as 
appropriate, which may mean modifying formatter.h's 
format_int_or_long_internal() to return unicode -- as well as taking into 
account the locale encoding when accessing the locale's thousands separator.
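
The implicit conversion described above can be imitated outside CPython. This is only an illustration: the byte string is hard-coded, and Latin-1 stands in for the locale encoding as an assumption:

```python
# Bytes as int.__format__() might return them when the separator is a
# non-breaking space in a Latin-1 style encoding (assumed for this sketch).
raw = b"10\xa0000"

try:
    # Roughly what happens when the default encoding ('ascii') is used.
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # byte 0xa0 at position 2 is not ASCII

# Decoding with the encoding that actually produced the bytes succeeds.
text = raw.decode("latin-1")
```

Here ascii_error.start is 2, matching "position 2" in the tracebacks, and text is u'10\xa0000'.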

--




[issue15276] unicode format does not really work in Python 2.x

2012-09-16 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis :


--
nosy: +Arfrever




[issue15276] unicode format does not really work in Python 2.x

2012-09-16 Thread Chris Jerdonek

Chris Jerdonek added the comment:

> The case with 1.__format__ is confusing the parser.

Interesting, good catch!  That error did seem unusual.  The two modified forms 
do give the same result as int.__format__() (though the type still differs).

--




[issue15276] unicode format does not really work in Python 2.x

2012-09-16 Thread Eric V. Smith

Eric V. Smith added the comment:

The case with 1.__format__ is confusing the parser. It sees the float literal 
"1." followed by the name __format__, which is indeed a syntax error.

Try:
>>> 1 .__format__(u'n')
'1'

or:
>>> (1).__format__(u'n')
'1'
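
The three spellings can also be compared with compile(), without relying on the interactive prompt. The helper name parses is made up for this sketch:

```python
def parses(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# "1." is consumed as a float literal, leaving a stray name behind it.
dotted = parses("1.__format__('n')")
# A space or parentheses ends the integer literal before the dot.
spaced = parses("1 .__format__('n')")
parenthesized = parses("(1).__format__('n')")
```

Only the first form is rejected; the other two compile and behave like int.__format__(1, 'n').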

--




[issue15276] unicode format does not really work in Python 2.x

2012-09-16 Thread Chris Jerdonek

Chris Jerdonek added the comment:

I can't yet reproduce on my system, but after looking at the code, I believe 
the following are closer to the cause:

>>> format(1, u'n')
>>> int.__format__(1, u'n')

Incidentally, on my system, the following note in the docs is wrong:

"Note: format(value, format_spec) merely calls value.__format__(format_spec)."

(from http://docs.python.org/library/functions.html?#format )

>>> format(1, u'n')
u'1'
>>> 1.__format__(u'n')
  File "", line 1
1.__format__(u'n')
   ^
SyntaxError: invalid syntax
>>> int.__format__(1, u'n')
'1'

Observe also that format() and int.__format__() return different types.

--




[issue15276] unicode format does not really work in Python 2.x

2012-07-08 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

I confirm the bug on 2.7.

$ ./python 
Python 2.7.3+ (2.7:ab9d6c4907e7+, Apr 25 2012, 20:02:36) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, 'uk_UA.UTF-8')
'uk_UA.UTF-8'
>>> u'{:n}'.format(10000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2: ordinal 
not in range(128)
>>> '{:n}'.format(10000)
'10\xc2\xa0000'

--
components: +Interpreter Core, Unicode
nosy: +ezio.melotti, storchaka
type:  -> behavior




[issue15276] unicode format does not really work in Python 2.x

2012-07-08 Thread Berker Peksag

Berker Peksag  added the comment:

I can't reproduce this with Python 2.7.3.

berker@wakefield ~[master*]$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, 'fr_FR')
'fr_FR'
>>> u'{:n}'.format(10000)
u'10 000'

--
nosy: +berker.peksag




[issue15276] unicode format does not really work in Python 2.x

2012-07-07 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +eric.smith




[issue15276] unicode format does not really work in Python 2.x

2012-07-07 Thread Ariel Ben-Yehuda

Ariel Ben-Yehuda  added the comment:

I don't work on CPython

On Sat, Jul 7, 2012 at 6:57 PM, Martin v. Löwis wrote:

> Ariel: would you like to provide a patch?

--




[issue15276] unicode format does not really work in Python 2.x

2012-07-07 Thread Martin v . Löwis

Martin v. Löwis  added the comment:

Ariel: would you like to provide a patch?

--
nosy: +loewis




[issue15276] unicode format does not really work in Python 2.x

2012-07-07 Thread Chris Jerdonek

Chris Jerdonek  added the comment:

Cf. the related issue 7300: "Unicode arguments in str.format()".

--
nosy: +cjerdonek




[issue15276] unicode format does not really work in Python 2.x

2012-07-07 Thread Ariel Ben-Yehuda

New submission from Ariel Ben-Yehuda :

Unicode formats (u'{:n}'.format) in Python 2.x assume that the thousands 
separator is ASCII, so this fails:

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, 'fra') # or fr_FR on UNIX
>>> u'{:n}'.format(10000)
Traceback (most recent call last):
  File "", line 1, in <module>
    u'{:n}'.format(10000)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2: ordinal 
not in range(128)

However, it works correctly in Python 3, properly returning '10\xA0000' (the 
\xA0 is a non-breaking space).

--
messages: 164844
nosy: Ariel.Ben-Yehuda
priority: normal
severity: normal
status: open
title: unicode format does not really work in Python 2.x
versions: Python 2.7
