[issue7327] format: minimum width: UTF-8 separators and decimal points

2009-12-04 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: See the discussion on python-dev, in particular Martin's comment at
http://mail.python.org/pipermail/python-dev/2009-December/094412.html
The solutions to this seem too complex for 2.x. It is not a problem in 3.x. -- resolution: -> wont

2009-12-03 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Reassigning to Eric. -- assignee: mark.dickinson -> eric.smith
Python tracker rep...@bugs.python.org http://bugs.python.org/issue7327

2009-12-03 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: I've raised the issue with unicode and locale on python-dev:
http://mail.python.org/pipermail/python-dev/2009-December/094408.html
Pending the outcome of that decision, I'll move forward on this issue.

2009-12-02 Thread Stefan Krah
Stefan Krah stefan-use...@bytereef.org added the comment: In python3.2, the output of decimal looks good. With float, the separator is printed as two spaces on my Unicode terminal (export LC_ALL=cs_CZ.UTF-8). So decimal (3.2) interprets the separator string as a single UTF-8 char and the final

2009-12-02 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: So when the format string has type 'str' (as in Stefan's original example) rather than type 'unicode', I'd say Python is doing the right thing already: everything in sight, including the separators coming from localeconv(), has type 'str',

2009-12-02 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: I don't see any documentation that a struct lconv should be interpreted as UTF-8. In fact, Googling "struct lconv utf-8" gives this bug report as the first hit. lconv.thousands_sep is char*. It's never been clear to me if this means pointer to a

2009-12-02 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: In trunk, Modules/_localemodule.c also treats these as string of char, so at least we're consistent. In py3k, mbstowcs is used and the result passed to PyUnicode_FromWideChar. I'm not sure how you'd address this in locale in trunk, or if we want

2009-12-02 Thread Stefan Krah
Stefan Krah stefan-use...@bytereef.org added the comment: Googling "multi-byte thousands separator" gives better results. From those results, it is clear to me that decimal_point and thousands_sep are strings that may be interpreted as multi-byte characters. The Czech separator appears to be a

2009-12-01 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: In python3:

>>> locale.setlocale(locale.LC_NUMERIC, "cs_CZ.UTF-8")
'cs_CZ.UTF-8'
>>> s = format(Decimal("-1.5"), ' 019.18n')
>>> len(s)
20
>>> print(s)
-0 000 000 000 001,5

Python3 uses unicode for strings. Python2 uses bytes. To format unicode in

2009-12-01 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: In 2.7, I get:

$ ./python.exe
Python 2.7a0 (trunk:76501, Nov 24 2009, 14:57:21)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC,

2009-12-01 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Interesting. My regular locale is LC_CTYPE=en_US.UTF-8, and here is what I get:

Python 2.7a0 (trunk:76501, Nov 24 2009, 13:59:01)
[GCC 4.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
import local import

2009-12-01 Thread Eric Smith
Eric Smith e...@trueblade.com added the comment: I can duplicate this on Linux. The difference is the values in the locale for the separators, specifically, locale.localeconv()['thousands_sep'].

>>> locale.localeconv()['thousands_sep']
'\xc2\xa0'

The question is: since a struct lconv contains
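As an illustration of the value reported in this message (a Python 3 sketch, not part of the original comment): the two bytes '\xc2\xa0' are the UTF-8 encoding of a single character, U+00A0 NO-BREAK SPACE. The variable names are hypothetical, and the bytes are hard-coded so that the snippet does not depend on the cs_CZ.UTF-8 locale being installed.

```python
import unicodedata

# Hard-coded copy of the thousands_sep value reported in this thread.
raw_sep = b"\xc2\xa0"

# Interpreted as UTF-8, the two bytes form a single code point.
text_sep = raw_sep.decode("utf-8")

print(len(raw_sep))                # number of bytes
print(len(text_sep))               # number of characters
print(unicodedata.name(text_sep))  # official Unicode name of the character
```

Any width calculation that uses the byte length therefore counts this separator twice.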

2009-11-30 Thread Stefan Krah
Stefan Krah stefan-use...@bytereef.org added the comment: What do you mean by "working with bytestrings"? The UTF-8 separators or decimal points come directly from struct lconv (man localeconv). The logical way to reach a minimum width of 19 is to have 19 UTF-8 characters, which can subsequently be

2009-11-28 Thread Mark Dickinson
Changes by Mark Dickinson dicki...@gmail.com: -- assignee: -> mark.dickinson

2009-11-28 Thread Matthew Barnett
Matthew Barnett pyt...@mrabarnett.plus.com added the comment: Surely this is to be expected when working with bytestrings. You should be working in Unicode and using UTF-8 only for input and output. -- nosy: +mrabarnett

2009-11-15 Thread Stefan Krah
New submission from Stefan Krah stefan-use...@bytereef.org: This issue affects the format functions of float and decimal. When calculating the padding necessary to reach the minimum width, UTF-8 separators and decimal points are calculated by their byte lengths. This can lead to printed
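A minimal Python 3 sketch of the miscount described in this submission, assuming a UTF-8 no-break space (the separator reported elsewhere in this thread for cs_CZ.UTF-8) as the thousands separator. The helper pad_left and its count_chars flag are hypothetical names chosen for illustration; this is not the code used by float or decimal.

```python
SEP = b"\xc2\xa0"  # UTF-8 bytes of U+00A0: one character, two bytes


def pad_left(body: bytes, width: int, count_chars: bool) -> bytes:
    """Left-pad body with spaces up to width.

    With count_chars=False the length is measured in bytes (the behaviour
    the report describes); with count_chars=True it is measured in decoded
    UTF-8 characters.
    """
    length = len(body.decode("utf-8")) if count_chars else len(body)
    return b" " * max(width - length, 0) + body


# "1 234 567,5" with two multi-byte separators: 11 characters, 13 bytes.
number = SEP.join([b"1", b"234", b"567"]) + b",5"

by_bytes = pad_left(number, 15, count_chars=False)
by_chars = pad_left(number, 15, count_chars=True)

print(len(by_bytes.decode("utf-8")))  # characters produced when counting bytes
print(len(by_chars.decode("utf-8")))  # characters produced when counting characters
```

Counting bytes produces 13 characters for a requested width of 15, one character short for each two-byte separator; counting characters produces the full 15.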