Eric Smith e...@trueblade.com added the comment:
See the discussion on python-dev, in particular Martin's comment at
http://mail.python.org/pipermail/python-dev/2009-December/094412.html
The solutions to this seem too complex for 2.x. It is not a problem in 3.x.
--
resolution: -> wont
Mark Dickinson dicki...@gmail.com added the comment:
Reassigning to Eric.
--
assignee: mark.dickinson -> eric.smith
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7327
___
Eric Smith e...@trueblade.com added the comment:
I've raised the issue with unicode and locale on python-dev:
http://mail.python.org/pipermail/python-dev/2009-December/094408.html
Pending the outcome of that decision, I'll move forward on this issue.
--
Stefan Krah stefan-use...@bytereef.org added the comment:
In python3.2, the output of decimal looks good. With float, the
separator is printed as two spaces on my Unicode terminal (export
LC_ALL=cs_CZ.UTF-8).
So decimal (3.2) interprets the separator string as a single UTF-8 char
and the final
Mark Dickinson dicki...@gmail.com added the comment:
So when the format string has type 'str' (as in Stefan's original example)
rather than type 'unicode', I'd say Python is doing the right thing
already: everything in sight, including the separators coming from
localeconv(), has type 'str',
Eric Smith e...@trueblade.com added the comment:
I don't see any documentation that a struct lconv should be interpreted
as UTF-8. In fact Googling "struct lconv utf-8" gives this bug report as
the first hit.
lconv.thousands_sep is char*. It's never been clear to me if this means
pointer to a
Eric Smith e...@trueblade.com added the comment:
In trunk, Modules/_localemodule.c also treats these as strings of char,
so at least we're consistent.
In py3k, mbstowcs is used and the result passed to PyUnicode_FromWideChar.
I'm not sure how you'd address this in locale in trunk, or if we want
Stefan Krah stefan-use...@bytereef.org added the comment:
Googling "multi-byte thousands separator" gives better results. From
those results, it is clear to me that decimal_point and thousands_sep
are strings that may be interpreted as multi-byte characters. The Czech
separator appears to be a
R. David Murray rdmur...@bitdance.com added the comment:
In python3:
locale.setlocale(locale.LC_NUMERIC, 'cs_CZ.UTF-8')
'cs_CZ.UTF-8'
s = format(Decimal(-1.5), ' 019.18n')
len(s)
20
print(s)
-0 000 000 000 001,5
Python3 uses unicode for strings. Python2 uses bytes. To format
unicode in
Eric Smith e...@trueblade.com added the comment:
In 2.7, I get:
$ ./python.exe
Python 2.7a0 (trunk:76501, Nov 24 2009, 14:57:21)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
import locale
locale.setlocale(locale.LC_NUMERIC,
R. David Murray rdmur...@bitdance.com added the comment:
Interesting. My regular locale is LC_CTYPE=en_US.UTF-8, and here is
what I get:
Python 2.7a0 (trunk:76501, Nov 24 2009, 13:59:01)
[GCC 4.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
import local
import
Eric Smith e...@trueblade.com added the comment:
I can duplicate this on Linux. The difference is the values in the
locale for the separators, specifically,
locale.localeconv()['thousands_sep'].
locale.localeconv()['thousands_sep']
'\xc2\xa0'
The question is: since a struct lconv contains
Stefan Krah stefan-use...@bytereef.org added the comment:
What do you mean by working with bytestrings? The UTF-8 separators or
decimal points come directly from struct lconv (man localeconv). The
logical way to reach a minimum width of 19 is to have 19 UTF-8
characters, which can subsequently be
Changes by Mark Dickinson dicki...@gmail.com:
--
assignee: -> mark.dickinson
Matthew Barnett pyt...@mrabarnett.plus.com added the comment:
Surely this is to be expected when working with bytestrings. You should
be working in Unicode and using UTF-8 only for input and output.
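
The approach suggested here can be sketched as follows: decode at the input boundary, pad the text by character count, and encode again only for output. This is a minimal Python 3 illustration, not code from the tracker; the helper name format_text_field and the sample value are hypothetical, and the encoding is assumed to be UTF-8.

```python
# Hypothetical helper, for illustration only.
def format_text_field(raw_bytes, width, encoding="utf-8"):
    """Right-align a UTF-8 byte string in a field `width` characters wide."""
    text = raw_bytes.decode(encoding)   # bytes -> text at the input boundary
    padded = text.rjust(width)          # width is counted in characters here
    return padded.encode(encoding)      # text -> bytes at the output boundary

# "1<NBSP>234,5": seven characters, eight bytes.
out = format_text_field(b"1\xc2\xa0234,5", 10)
print(len(out.decode("utf-8")))  # characters in the padded field
print(len(out))                  # bytes in the padded field
```

Counting characters is still only an approximation of display width (combining or wide characters would need more care), but it avoids the byte-length miscount for a separator like this one.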
--
nosy: +mrabarnett
New submission from Stefan Krah stefan-use...@bytereef.org:
This issue affects the format functions of float and decimal.
When calculating the padding necessary to reach the minimum width,
UTF-8 separators and decimal points are calculated by their byte
lengths. This can lead to printed
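
The miscount described in this submission can be reproduced without changing the locale. The following is a minimal Python 3 sketch, assuming the thousands separator is the two-byte UTF-8 sequence '\xc2\xa0' reported for localeconv() in this thread; the helper pad_to_width is hypothetical and is not the actual float or decimal formatting code.

```python
# Hypothetical illustration; not the real float/decimal implementation.
SEP = b"\xc2\xa0"  # assumed separator: UTF-8 bytes for U+00A0

def pad_to_width(body, width, fill):
    """Left-pad `body` to `width`, measured with len() in body's own units."""
    return fill * max(0, width - len(body)) + body

# "1 234 567,5" with two multi-byte separators, first as bytes, then as text.
as_bytes = b"1" + SEP + b"234" + SEP + b"567,5"
as_text = as_bytes.decode("utf-8")

padded_bytes = pad_to_width(as_bytes, 19, b" ")  # width counted in bytes
padded_text = pad_to_width(as_text, 19, " ")     # width counted in characters

# Each separator occupies two bytes but one character, so the byte-padded
# result is two characters short of the requested width once decoded.
print(len(padded_bytes.decode("utf-8")))
print(len(padded_text))
```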