Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-22 Thread rzed
Raymond Hettinger pyt...@rcn.com wrote in
news:e35271b9-7623-4845-bcb9-d8c33971f...@w24g2000prd.googlegroups.com:

 If anyone here is interested, here is a proposal I posted on the
 python-ideas list.
 
 The idea is to make number formatting a little easier with the
 new format() builtin
 in Py2.6 and Py3.0: 
 http://docs.python.org/library/string.html#formatspec 
 
[...]
 Comments and suggestions are welcome but I draw the line at
 supporting Mayan numbering conventions ;-)

Is that inclusive or exclusive?

-- 
rzed
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-16 Thread Rhodri James
On Mon, 16 Mar 2009 02:36:43 -, MRAB goo...@mrabarnett.plus.com  
wrote:



The field name can be an integer or an identifier, so the locale could
be too, provided that you know where to look it up!

 financial = Locale(group_sep=",", grouping=[3])
 print("my number is {0:10n:{fin}}".format(1234567, fin=financial))

Then again, shouldn't that be:

 fin = Locale(group_sep=",", grouping=[3])
 print("my number is {0:{fin}}".format(1234567, fin=financial))


Except that loses you the format, since the locale itself is a collection
of parameters the format uses.  The locale knows how to do groupings, but
not whether to do them, nor what the field width should be.  Come to think
of it, it doesn't know whether to use the LC_NUMERIC grouping information
or the LC_MONETARY grouping information.  Hmm.

I can't believe I'm even suggesting this, but how about:

  print("my number is {fin.format('10d', {0}, True)}".format(1235467, fin=financial))


assuming the locale.format() method remains unchanged?  That's horrible,
and I'm pretty sure it can't be right, but I'm too tired to think of
anything more sensible right now.

--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-16 Thread MRAB

Rhodri James wrote:
On Mon, 16 Mar 2009 02:36:43 -, MRAB goo...@mrabarnett.plus.com 
wrote:



The field name can be an integer or an identifier, so the locale could
be too, provided that you know where to look it up!

 financial = Locale(group_sep=",", grouping=[3])
 print("my number is {0:10n:{fin}}".format(1234567, fin=financial))

Then again, shouldn't that be:

 fin = Locale(group_sep=",", grouping=[3])
 print("my number is {0:{fin}}".format(1234567, fin=financial))


Except that loses you the format, since the locale itself is a collection
of parameters the format uses.  The locale knows how to do groupings, but
not whether to do them, nor what the field width should be.  Come to think
of it, it doesn't know whether to use the LC_NUMERIC grouping information
or the LC_MONETARY grouping information.  Hmm.

I can't believe I'm even suggesting this, but how about:

  print("my number is {fin.format('10d', {0}, True)}".format(1235467, fin=financial))


assuming the locale.format() method remains unchanged?  That's horrible,
and I'm pretty sure it can't be right, but I'm too tired to think of
anything more sensible right now.


It should probably(?) be:

financial = Locale(group_sep=",", grouping=[3])
print("my number is {0:10n:fin}".format(1234567, fin=financial))

The format "10n" says whether to use separators or a decimal point; the
locale "fin" says what the separator and the decimal point look like.


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-16 Thread Rhodri James
On Mon, 16 Mar 2009 23:04:58 -, MRAB goo...@mrabarnett.plus.com  
wrote:



It should probably(?) be:

 financial = Locale(group_sep=",", grouping=[3])
 print("my number is {0:10n:fin}".format(1234567, fin=financial))

The format 10n says whether to use separators or a decimal point; the
locale fin says what the separator and the decimal point look like.


That works, and isn't an abomination on the face of the existing syntax.   
Excellent.


I'm rather presuming that the "n" presentation type does grouping.  I've
only got Python 2.5 here, so I can't check it out (no str.format() method
and "%n" isn't supported by % formatting).  If it does, an "m" type to
do the same thing, only with the LC_MONETARY group settings instead of the
LC_NUMERIC ones, would be a good idea.


This would be my preferred solution to Raymond's original  
comma-in-the-format-string proposal, by the way: add an "m" presentation
type as above, and tell people to override the LC_MONETARY group settings  
in the global locale.  It's clear that it's a bodge, and weaning users  
onto local locales (!) wouldn't be so hard later on.


Anyway, time I stopped hypothesising about locales and started looking at  
the actual code-base, methinks.


--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-16 Thread MRAB

Rhodri James wrote:
On Mon, 16 Mar 2009 23:04:58 -, MRAB goo...@mrabarnett.plus.com 
wrote:



It should probably(?) be:

 financial = Locale(group_sep=",", grouping=[3])
 print("my number is {0:10n:fin}".format(1234567, fin=financial))

The format 10n says whether to use separators or a decimal point; the
locale fin says what the separator and the decimal point look like.


That works, and isn't an abomination on the face of the existing 
syntax.  Excellent.


I'm rather presuming that the n presentation type does grouping.  I've 
only got Python 2.5 here, so I can't check it out (no str.format() 
method and %n isn't supported by % formatting).  If it does, an m 
type to do the same thing only with the LC_MONETARY group settings 
instead of the LC_NUMERIC ones would be a good idea.


This would be my preferred solution to Raymond's original 
comma-in-the-format-string proposal, by the way: add an m presentation 
type as above, and tell people to override the LC_MONETARY group 
settings in the global locale.  It's clear that it's a bodge, and 
weaning users onto local locales (!) wouldn't be so hard later on.


Anyway, time I stopped hypothesising about locales and started looking 
at the actual code-base, methinks.



I'm not against putting a comma in the format to indicate that grouping
should be used just as a dot indicates that a decimal point should be
used. The locale would say what characters would be used for them.

I would prefer the format to have a fixed default so that if you don't
specify the locale the result is predictable.


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-16 Thread Rhodri James
On Tue, 17 Mar 2009 01:47:32 -, MRAB goo...@mrabarnett.plus.com  
wrote:



I'm not against putting a comma in the format to indicate that grouping
should be used just as a dot indicates that a decimal point should be
used. The locale would say what characters would be used for them.

I would prefer the format to have a fixed default so that if you don't
specify the locale the result is predictable.


Shouldn't that be the global locale?

--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-16 Thread MRAB

Rhodri James wrote:
On Tue, 17 Mar 2009 01:47:32 -, MRAB goo...@mrabarnett.plus.com 
wrote:



I'm not against putting a comma in the format to indicate that grouping
should be used just as a dot indicates that a decimal point should be
used. The locale would say what characters would be used for them.

I would prefer the format to have a fixed default so that if you don't
specify the locale the result is predictable.


Shouldn't that be the global locale?


Other parts of the language, such as str.upper, aren't locale-sensitive,
so I think that format shouldn't be either. If you want it to be
locale-sensitive, then specify the locale, even if it's the system
locale.


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-16 Thread Rhodri James
On Tue, 17 Mar 2009 02:41:23 -, MRAB goo...@mrabarnett.plus.com  
wrote:



Rhodri James wrote:
On Tue, 17 Mar 2009 01:47:32 -, MRAB goo...@mrabarnett.plus.com  
wrote:



I'm not against putting a comma in the format to indicate that grouping
should be used just as a dot indicates that a decimal point should be
used. The locale would say what characters would be used for them.

I would prefer the format to have a fixed default so that if you don't
specify the locale the result is predictable.

 Shouldn't that be the global locale?


Other parts of the language, such as str.upper, aren't locale-sensitive,
so I think that format shouldn't be either. If you want it to be
locale-sensitive, then specify the locale, even if it's the system
locale.


Yes, but the format type 'n' is currently defined as taking its cues
from the global locale, so in that sense format already is
locale-sensitive.
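
A minimal sketch of that behaviour, assuming a Python where format() exists (2.6/3.0 or later). What the second call prints depends entirely on the environment's locale, so no output is shown for it:

```python
import locale

# Start from the "C" locale, where LC_NUMERIC defines no grouping.
locale.setlocale(locale.LC_NUMERIC, "C")
plain = format(1234567, "n")
print(plain)   # same digits as format(1234567, "d")

# Switch the *global* locale to the user's default (whatever the
# environment provides) and format the same number again.
try:
    locale.setlocale(locale.LC_NUMERIC, "")
    print(format(1234567, "n"))   # grouping, if any, depends on the environment
except locale.Error:
    print("the environment's default locale is not usable here")
finally:
    locale.setlocale(locale.LC_NUMERIC, "C")   # put the global state back
```

The same format string can therefore give different results in different processes, which is the sense in which "n" is already locale-sensitive.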

--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-15 Thread Hendrik van Rooyen
Tim Rowe digil.com wrote:

[snip]

 . If Finance users and non-professional
 programmers find the locale approach to be frustrating, arcane and
 non-obvious then by all means propose a way of making it simpler and
 clearer, but not a bodge that will increase the amount of bad software
 in the world.

I do not follow the reasoning behind this.

It seems to be based on an assumption that the locale approach
is some sort of holy grail that solves these problems, and that
anybody who does not like or use it is automatically guilty of
writing crap code.

No account seems to be taken of the fact that the locale approach
is a global one that forces uniformity on everything done on a PC
or by a user.

So when you want to make a report in a format that would suit
what your foreign visitors are used to, do you have to change
your server's locale, and change it back again afterwards, or what ?

The locale approach has all the disadvantages of global variables.

To make software usable by, or expandable to, different languages
and cultures is a tricky design problem - you have to, at the 
minimum, do things like storing all your text, both for prompts and
errors, in some kind of database and refer to it by its key, everywhere.
You cannot simply assume, that because a number represents
a monetary value, that it is Yen, or Australian Dollar, or whatever -
you may have to convert it first, from its currency, to the currency
that you want to display it as, and only then can you worry about
the format that you want to display it in.

In all of this, as I see it, the locale approach addresses only a small
part, and solves very little.

Why is it still being defended and touted as if it were 42?  *

- Hendrik

* the answer to life, the universe, and everything. 
( - Douglas Adams' Hitchhiker books)




Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-15 Thread Hendrik van Rooyen
Paul Rubin http://phr...@nospam.invalid wrote:
 Paul Rubin http://phr...@nospam.invalid writes:
 '%.3K' % 1234567 = 1.235K # K = 1000
 '%.:3Ki' % 1234567 = 1.206K   # K = 1024
 
 I meant 1.235M and 1.177M, of course.

I went tilt like a slot machine long before I noticed...
:-)

- Hendrik



Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-15 Thread JanC
Raymond Hettinger wrote:

 No doubt that you're skeptical of anything you didn't
 already know ;-)  I'm a CPA, was a 15 year division controller
 for a Fortune 500 company, and an auditor for an international
 accounting firm.  Believe me when I say it is the norm in finance.
 Besides, it seems like you're arguing that thousands separators
 aren't needed anywhere at all and have doubts about their
 utility.  Pick-up your pocket calculator and take a look.
 Look at your paycheck or your bank statement.

My current bank and my previous bank use 2 ways to write numbers:

1. a decimal comma, and a space (or half-space or any other appropriate
   small whitespace) as a thousands separator
2. written full out in words (including the currency names)

Invoices (not from these banks) often use a point as the thousands
separator (although that's wrong according to some national
standards, it's probably okay according to accounting standards...). 

The second formatting (full words) is a legal requirement on certain
financial & legal documents here (and I can imagine in other countries
too?).  Anybody working on a PEP about implementing a 'w' (for wordy?)
formatting type?  ;-)


-- 
JanC


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-15 Thread Rhodri James
On Sat, 14 Mar 2009 08:20:21 -, Hendrik van Rooyen  
m...@microcorp.co.za wrote:



Tim Rowe digil.com wrote:

[snip]


. If Finance users and non-professional
programmers find the locale approach to be frustrating, arcane and
non-obvious then by all means propose a way of making it simpler and
clearer, but not a bodge that will increase the amount of bad software
in the world.


I do not follow the reasoning behind this.

It seems to be based on an assumption that the locale approach
is some sort of holy grail that solves these problems, and that
anybody who does not like or use it is automatically guilty of
writing crap code.

No account seems to be taken of the fact that the locale approach
is a global one that forces uniformity on everything done on a PC
or by a user.


Like unicode, locales should make using your computer with your own
cultural settings a one-time configuration, and make using your
computer in another setting possible.  By and large they do this.

Like unicode, locales fail in as much as they make cross-cultural
usage difficult.  Unlike unicode, there is a lot of failure in the
standard locale library, which is almost entirely the fault of the
standard C locale library it uses.

Nobody's defending the implementation, as far as I've noticed
(which isn't saying much at the moment, but still...).  A bit of
poking around in the cheese shop suggests that Babel
(http://www.babel.edgewall.org/) would be better, and Babel with
a context manager would be better yet.
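
A minimal sketch of the context-manager idea, using only the standard locale module rather than Babel (whose API is not shown here). The helper name temporary_locale is made up for illustration, and because it still flips the process-wide locale it is not safe across threads:

```python
import locale
from contextlib import contextmanager

@contextmanager
def temporary_locale(name, category=locale.LC_NUMERIC):
    """Switch the global locale inside a with-block, then restore it.

    Hypothetical helper: it only tidies up the save/restore dance,
    the locale it changes is still global state.
    """
    saved = locale.setlocale(category)        # no second argument: just query
    try:
        locale.setlocale(category, name)      # raises locale.Error if unavailable
        yield
    finally:
        locale.setlocale(category, saved)     # restore even if the body fails

before = locale.setlocale(locale.LC_NUMERIC)
with temporary_locale("C"):
    inside = format(1234567, "n")             # "C" defines no grouping
after = locale.setlocale(locale.LC_NUMERIC)
```

Whatever the process locale was beforehand is back in place once the block exits.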


On the other hand, we have a small addition to format strings.
Unfortunately it's a small addition that doesn't feel terribly
natural in a mini-language that already runs the risk of looking
like line noise when you pull the stops out.  Not meaning the
term particularly unkindly, it is a bodge; it's quick and dirty,
syntactic saccharin rather than sugar for doing one particular
thing for one particular interest group, and which looks
deceptively like the right thing to do for everyone else.

That's a bad thing to do.  If we ever do get round to fixing
localisation (i.e. making overriding bits of locales easy), it
becomes a feature that's automatically present that we have
to discourage normal programmers from using despite its
apparent usefulness.


Frankly, I'd much rather fix the locale system and extend
the format syntax to override the default locale.  Perhaps
something like

  financial = Locale(group_sep=",", grouping=[3])
  print("my number is {0:10n:financial}".format(1234567))

It's hard to think of a way of extending % format strings
to cope with this that won't look utterly horrid, though!
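
To make the idea concrete, here is a rough sketch of what such a Locale object might do. Apart from the name and the group_sep and grouping arguments, everything is an assumption: the decimal_sep parameter, the group() and format() methods, and the rule that the last grouping size repeats are invented for illustration. Nothing like the {0:10n:financial} syntax exists, so the sketch calls the object directly:

```python
class Locale(object):
    """Hypothetical stand-in for the Locale object imagined above."""

    def __init__(self, group_sep=",", decimal_sep=".", grouping=(3,)):
        self.group_sep = group_sep
        self.decimal_sep = decimal_sep
        # Group sizes, rightmost group first; the last size repeats.
        self.grouping = tuple(grouping)

    def group(self, digits):
        """Insert group separators into a string of integer digits."""
        sizes = list(self.grouping)
        size = sizes.pop(0)
        groups = []
        while len(digits) > size:
            groups.append(digits[-size:])
            digits = digits[:-size]
            if sizes:
                size = sizes.pop(0)
        groups.append(digits)
        return self.group_sep.join(reversed(groups))

    def format(self, value, places=0, width=0):
        """Fixed-point text with grouping, right-aligned in `width`."""
        text = "%.*f" % (places, abs(value))
        whole, _, frac = text.partition(".")
        result = self.group(whole)
        if frac:
            result += self.decimal_sep + frac
        if value < 0:
            result = "-" + result
        return result.rjust(width)

financial = Locale(group_sep=",", grouping=[3])
print("my number is {0}".format(financial.format(1234567, width=10)))
```

Because the locale is an ordinary object passed around explicitly, two differently configured instances can be used side by side without touching any global setting.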

--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-15 Thread MRAB

Rhodri James wrote:
[snip]

Frankly, I'd much rather fix the locale system and extend
the format syntax to override the default locale.  Perhaps
something like

  financial = Locale(group_sep=",", grouping=[3])
  print("my number is {0:10n:financial}".format(1234567))

It's hard to think of a way of extending % format strings
to cope with this that won't look utterly horrid, though!


The problem with your example is that it magically looks for the locale
name financial in the current namespace. Perhaps the name should be
registered somewhere like this:

locale.predefined["financial"] = Locale(group_sep=",", grouping=[3])
print("my number is {0:10n:financial}".format(1234567))


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-15 Thread Tim Rowe
2009/3/14 Hendrik van Rooyen m...@microcorp.co.za:

 No account seems to be taken of the fact that the locale approach
 is a global one that forces uniformity on everything done on a PC
 or by a user.

Not so. Under .NET, for instance, the global settings will give you a
default CultureInfo class, but you can create your own CultureInfo
classes for other cultures in your program and use them in place of
the default.

 So when you want to make a report in a format that would suit
 what your foreign visitors are used to, do you have to change
 your server's locale, and change it back again afterwards, or what ?

No, you create a local locale and use that.

There are essentially three possible levels I can see for this:

- programs that will only ever be used in one locale, known in
advance. They can have the locale hard-wired into the program. No
special support is needed for this. It's pretty easy to write a
function to format a number to a hard-wired locale. I've done it in
Pascal and FORTH and it was easy-peasy, so I can't imagine it's going
to be a big deal in Python. If it's such a big deal for accountants to
write this code, if they ask in this forum how to do it somebody will
almost certainly supply a function that takes a float and returns a
formatted string within a few minutes. It might even be you or me.

- Programs that may be used in any unchanging locale. The existing
locale support is built for this case.

- Programs that need to operate across locales. This can either be
managed by switching global locales (which you rightly deprecate) or
by managing alternate locales within the program.

 The locale approach has all the disadvantages of global variables.

No, it has all the advantages of global constants used as overridable
defaults for local variables.

 To make software usable by, or expandable to, different languages
 and cultures is a tricky design problem - you have to, at the
 minimum, do things like storing all your text, both for prompts and
 errors, in some kind of database and refer to it by its key, everywhere.
 You cannot simply assume, that because a number represents
 a monetary value, that it is Yen, or Australian Dollar, or whatever -
 you may have to convert it first, from its currency, to the currency
 that you want to display it as, and only then can you worry about
 the format that you want to display it in.

Nothing in the proposal being considered addresses any of that.

-- 
Tim Rowe


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-15 Thread Rhodri James
On Sun, 15 Mar 2009 19:00:43 -, MRAB goo...@mrabarnett.plus.com  
wrote:



Rhodri James wrote:
[snip]

Frankly, I'd much rather fix the locale system and extend
the format syntax to override the default locale.  Perhaps
something like
   financial = Locale(group_sep=",", grouping=[3])
  print("my number is {0:10n:financial}".format(1234567))
 It's hard to think of a way of extending % format strings
to cope with this that won't look utterly horrid, though!


The problem with your example is that it magically looks for the locale
name financial in the current namespace.


True, to an extent.  The counter-argument of "Is it so much
more magical than '{keyword}' looking up the object in the
parameter list?" suggests a less magical approach would be to
make the locale a parameter itself:

  print("my number is {0:10n:{1}}".format(1234567, financial))


Perhaps the name should be
registered somewhere like this:

 locale.predefined["financial"] = Locale(group_sep=",", grouping=[3])
 print("my number is {0:10n:financial}".format(1234567))


I'm not sure that I don't think that *more* magical than my
first stab!  Regardless of the exact syntax, do you think
that being able to specify an overriding locale object (and
let's wave our hands over what one of those is too) is the
right approach?

--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-15 Thread MRAB

Rhodri James wrote:
On Sun, 15 Mar 2009 19:00:43 -, MRAB goo...@mrabarnett.plus.com 
wrote:



Rhodri James wrote:
[snip]

Frankly, I'd much rather fix the locale system and extend
the format syntax to override the default locale.  Perhaps
something like
   financial = Locale(group_sep=",", grouping=[3])
  print("my number is {0:10n:financial}".format(1234567))
 It's hard to think of a way of extending % format strings
to cope with this that won't look utterly horrid, though!


The problem with your example is that it magically looks for the locale
name financial in the current namespace.


True, to an extent.  The counter-argument of Is it so much
more magical than '{keyword}' looking up the object in the
parameter list suggests a less magical approach would be to
make the locale a parameter itself:

  print("my number is {0:10n:{1}}".format(1234567, financial))


The field name can be an integer or an identifier, so the locale could
be too, provided that you know where to look it up!

financial = Locale(group_sep=",", grouping=[3])
print("my number is {0:10n:{fin}}".format(1234567, fin=financial))

Then again, shouldn't that be:

fin = Locale(group_sep=",", grouping=[3])
print("my number is {0:{fin}}".format(1234567, fin=financial))


Perhaps the name should be
registered somewhere like this:

 locale.predefined["financial"] = Locale(group_sep=",", grouping=[3])
 print("my number is {0:10n:financial}".format(1234567))


I'm not sure that I don't think that *more* magical than my
first stab!  Regardless of the exact syntax, do you think
that being able to specify an overriding locale object (and
let's wave our hands over what one of those is too) is the
right approach?





Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-14 Thread Raymond Hettinger
[Lie Ryan]
 My proposition is: make the format specifier a simpler API to locale
 aware

You do know that we already have one, right?
That's what the existing n specifier does.


Raymond


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-14 Thread Hendrik van Rooyen
John Nagle na...@animats.com wrote:

 Yes.  In COBOL, one writes
 
 PICTURE $999,999,999.99
 
 which is way ahead of most of the later approaches.

That was fixed width. For zero suppression:

PIC ,$$$,$99.99  

This will format 1000 as $1,000.00

For fixed width zero suppression:

PIC $ZZZ,ZZZ,Z99.99

gives a fixed width field - $ 1,000.00
with a fixed width font, this will line the column up,
so that the decimals are under each other.
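
For comparison, a rough Python analogue of those two pictures, assuming the comma option from the proposal under discussion (it later shipped in Python 2.7 and 3.1). The function names are made up, and the match with COBOL is only approximate: Python always prints a single leading zero before the point, whereas Z99.99 forces two integer digits:

```python
def picture_fixed(amount):
    """Roughly PIC $ZZZ,ZZZ,Z99.99: a '$' followed by a 14-character field."""
    # Width 14 matches the length of "ZZZ,ZZZ,Z99.99"; numbers are
    # right-aligned by default, so the decimal points line up.
    return "$" + format(amount, "14,.2f")

def picture_floating(amount):
    """Roughly the floating-dollar picture: the '$' hugs the first digit."""
    return "$" + format(amount, ",.2f")

for amount in (1000, 12.5, 1234567.891):
    print(picture_fixed(amount))
```

In a fixed-width font the three printed lines form a column with the decimal points under each other.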

- Hendrik




Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Lie Ryan

Hendrik van Rooyen wrote:

Ulrich Eckhardt eck...aser.com wrote:


IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?
Seriously, the problem I see with this proposal is that its aim to be as
short as possible actually makes the resulting format specifications
unreadable. Could you even guess what 8T.,1f should mean if you had not
written this?


+1

Look back in history, and see how COBOL did it with the
PICTURE - dead easy and easily understandable.
Compared to that, even the C printf stuff  and python's %
are incomprehensible.

- Hendrik


Seeing how many people complained about the proposal being unreadable
(although it tries to be simple by not including too many features), why
not go all the way to unreadability and teach people to always use some
sort of convenience function and never use the microlanguage except in
very simple cases (or extremely complex cases, in which case you might
actually be better served by writing your own formatting function)?


A hypothetical code using conv function and the microlanguage could look
like this:


>>> num = 213210.3242
>>> fmt = create_format(sep='-', decsep='@')
>>> print fmt
50|\/|3_v3ry_R34D4|3L3_C0D3
>>> '{0!{1}}'.format(num, fmt)
'213-...@3242'


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Raymond Hettinger
[Lie Ryan]
 A hypothetical code using conv function and the microlanguage could look
 like this:

   num = 213210.3242
   fmt = create_format(sep='-', decsep='@')
   print fmt
 50|\/|3_v3ry_R34D4|3L3_C0D3
   '{0!{1}}'.format(num, fmt)
 '213-...@3242'

LOL, it's like APL all over again ;-)

FWIW, the latest version of the proposal is dirt simple:

>>> format(1234567, 'd')   # what we have now
'1234567'
>>> format(1234567, ',d')  # proposed new option
'1,234,567'
>>> format(1234.5, '.2f')  # what we have now
'1234.50'
>>> format(1234.5, ',.2f') # proposed new option
'1,234.50'


The proposal is roughly:
  If you want commas in the output,
  put a comma in the format string.
It's not rocket science.

What is rocket science is what you have to do now
to achieve the same effect.  If someone finds the
above to be baffling, how the heck are they going
to do the same thing using the locale module?


Raymond


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread andrew cooke
Raymond Hettinger wrote:
 [Lie Ryan]
 A hypothetical code using conv function and the microlanguage could look
 like this:

   num = 213210.3242
   fmt = create_format(sep='-', decsep='@')
   print fmt
 50|\/|3_v3ry_R34D4|3L3_C0D3
   '{0!{1}}'.format(num, fmt)
 '213-...@3242'

 LOL, it's like APL all over again ;-)

 FWIW, the latest version of the proposal is dirt simple:

 format(1234567, 'd')   # what we have now
 '1234567'
 format(1234567, ',d')  # proposed new option
 '1,234,567'
 format(1234.5, '.2f')  # what we have now
 '1234.50'
 format(1234.5, ',.2f') # proposed new option
 '1,234.50'

would it break anything to also allow

 format(1234567, 'd')   # what we have now
 '1234567'
 format(1234567, '.d')  # proposed new option
 '1.234.567'
 format(1234.5, ',2f')  # proposed new option
 '1234,50'
 format(1234.5, '.,2f') # proposed new option
 '1.234,50'

because that would support a moderate chunk of the non-english speaking
users and seems like a natural extension.

(i'm still not sure this is that great an idea - if you think using a
locale is rocket science then perhaps your excess energy would be better
spent making locale easier, rather than tweaking this behaviour for a
subset of users?)

andrew




Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Paul Rubin
Raymond Hettinger pyt...@rcn.com writes:
 The proposal is roughly:
   If you want commas in the output,
   put a comma in the format string.
 It's not rocket science.

What if you want to change the separator?  Europeans usually
use periods instead of commas: one thousand = 1.000.


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Raymond Hettinger
[andrew cooke]
 would it break anything to also allow

  format(1234567, 'd')       # what we have now
  '1234567'
  format(1234567, '.d')      # proposed new option
  '1.234.567'
  format(1234.5, ',2f')      # proposed new option
  '1234,50'
  format(1234.5, '.,2f')     # proposed new option

Yes, that's allowed too!  The separators can be any one of COMMA,
SPACE, DOT, UNDERSCORE, or NON-BREAKING-SPACE.




Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Raymond Hettinger
[Paul Rubin]
 What if you want to change the separator?  Europeans usually
 use periods instead of commas: one thousand = 1.000.

That is supported also.


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Tim Rowe
2009/3/12 Raymond Hettinger pyt...@rcn.com:
 If anyone here is interested, here is a proposal I posted on the
 python-ideas list.

 The idea is to make numbering formatting a little easier with the new
 format() builtin
 in Py2.6 and Py3.0:  http://docs.python.org/library/string.html#formatspec

As far as I can see you're proposing an amendment to *encourage*
writing code that is not locale aware, with the amendment itself being
locale specific, which surely has to be a regressive move in the 21st
century. Frankly, I'd sooner see it made /harder/ to write code that
is not locale aware (warnings, like FxCop gives on .net code?) than
/easier/. Perhaps that's because I'm British, not American and I'm
sick of having date fields get the date wrong because the programmer
thinks the USA is the world. It makes me sympathetic to the problems
caused to others by programmers who think the English-speaking world
is the world.

By the way, to others who think that 123,456.7 and 123.456,7 are the
only conventions in common use in the West, no they're not. 123 456.7
is in common use in engineering, at least in Europe, precisely to
reduce (though not eliminate) problems caused by dot and comma
confusion.

-- 
Tim Rowe


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Wolfgang Rohdewald
On Freitag, 13. März 2009, Raymond Hettinger wrote:
 [Paul Rubin]
  What if you want to change the separator?  Europeans usually
  use periods instead of commas: one thousand = 1.000.
 
 That is supported also.

do you support just a fixed set of separators or anything?

how about this: (Switzerland)

12'000.99

or spacing:

12 000.99
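
Both of these can be sketched by post-processing, assuming only the plain comma option from the proposal (as later shipped in Python 2.7 and 3.1). The helper name with_group_sep is hypothetical, and the trick is only safe while the decimal separator stays a point:

```python
def with_group_sep(value, sep, places=2):
    """Format with ',' grouping, then swap the comma for another separator."""
    spec = ",.%df" % places          # e.g. ",.2f"
    return format(value, spec).replace(",", sep)

print(with_group_sep(12000.99, "'"))   # apostrophe, as in the Swiss example
print(with_group_sep(12000.99, " "))   # plain space
```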

-- 
Wolfgang


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread pruebauno
On Mar 13, 7:06 am, Tim Rowe digi...@gmail.com wrote:
 2009/3/12 Raymond Hettinger pyt...@rcn.com:

  If anyone here is interested, here is a proposal I posted on the
  python-ideas list.

  The idea is to make numbering formatting a little easier with the new
  format() builtin
  in Py2.6 and Py3.0:  http://docs.python.org/library/string.html#formatspec

 As far as I can see you're proposing an amendment to *encourage*
 writing code that is not locale aware, with the amendment itself being
 locale specific, which surely has to be a regressive move in the 21st
 century. Frankly, I'd sooner see it made /harder/ to write code that
 is not locale aware (warnings, like FxCop gives on .net code?) than
 /easier/. Perhaps that's because I'm British, not American and I'm
 sick of having date fields get the date wrong because the programmer
 thinks the USA is the world. It makes me sympathetic to the problems
 caused to others by programmers who think the English-speaking world
 is the world.

 By the way, to others who think that 123,456.7 and 123.456,7 are the
 only conventions in common use in the West, no they're not. 123 456.7
 is in common use in engineering, at least in Europe, precisely to
 reduce (though not eliminate) problems caused by dot and comma
 confusion.

 --
 Tim Rowe

I have lived in three different countries, and in school we used a blank
as the thousands separator to avoid confusion with the multiplication
operator. I think this proposal is more for debugging big numbers and
meant mostly for programmers' eyes. We already use the dot instead of the
comma as the decimal separator in our programming languages, so one more
Americanism won't kill us.

I am leaning towards proposal 1 now, just to avoid the thousand
variations that will be requested because of this and would make the
implementation unnecessarily complex. I can always use the three-replace
hack (conveniently documented in the PEP).
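
The three-replace hack swaps commas and periods through a temporary placeholder. A single-pass variant can be sketched with str.translate. This assumes Python 3 (for str.maketrans) and a Python where the ',' option of proposal 1 exists (2.7 / 3.1 or later); the helper name swap_separators is made up for the illustration.

```python
def swap_separators(text):
    """Swap ',' and '.' in one pass, with no placeholder character."""
    return text.translate(str.maketrans(",.", ".,"))

us_style = format(1234567.891, ",.2f")   # comma groups, period decimal point
print(us_style)                          # 1,234,567.89
print(swap_separators(us_style))         # 1.234.567,89
```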

+1 for Nick's proposal
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Tim Rowe
2009/3/13  prueba...@latinmail.com:
 I think this proposal is more for debugging big numbers and meant mostly
 for programmers' eyes. We already use the dot instead of the comma as the
 decimal separator in our programming languages, so one more
 Americanism won't kill us.

If it were for the programmers' eyes then it would be in the code, not
in the formatted output. Debugging of big numbers can be done by
checking within code, so there's no need to let this escape to the
output.

And if it's for programmers' eyes then the statement "The COMMA is
used when a PERIOD is the decimal separator" is wrong, at least if it
means that the COMMA is the /only/ separator used when a PERIOD is the
decimal separator. Ada uses UNDERSCOREs, which can be placed almost
anywhere in a numeric literal and are ignored.
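
For comparison, Python itself later adopted the same convention. The lines below are a small illustration that assumes Python 3.6 or newer, where underscores are accepted in numeric literals and '_' is a grouping option in format specs; none of this was available when this thread was written.

```python
big = 1_234_567            # underscores in the literal are ignored
print(big)                 # 1234567
print(format(big, "_d"))   # 1_234_567
print(format(big, ",d"))   # 1,234,567
```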

And if it's mostly for programmers' eyes, why does the motivation
state that "Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output exposed
to end users"? The proposal is clearly for the presentation of numbers
to end users, and quite simply is an encouragement to sloppiness in
presenting those numbers. If "Finance users and non-professional
programmers find the locale approach to be frustrating, arcane and
non-obvious" then by all means propose a way of making it simpler and
clearer, but not a bodge that will increase the amount of bad software
in the world.

-1 for all of the proposals.
-- 
Tim Rowe
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Lie Ryan

Raymond Hettinger wrote:

[andrew cooke]

would it break anything to also allow

format(1234567, 'd')   # what we have now
 '1234567'
format(1234567, '.d')  # proposed new option
 '1.234.567'
format(1234.5, ',2f')  # proposed new option
 '1234,50'
format(1234.5, '.,2f') # proposed new option


Yes, that's allowed too!  The separators can be any one of COMMA,
SPACE, DOT, UNDERSCORE, or NON-BREAKING-SPACE.


What if I want other separators?

How about this idea: give the format a long form, which is a bit
more verbose, flexible, and unambiguous, and keep the current proposal as
a short form, which is more concise.


The long form would look like this (it is much, much more featureful
than the current proposal; I think I might have crossed far beyond
the Mayan line):


[n|sign signnegative[[, signzero], signpositive] | ]
[w|min minwidth[, align[, alignfill]]]
[x|max maxwidth[, overflowsign[, overflowalign]]]
[s|sep [[...]sepsepwidth]sepsepwidth | ]
[dp|decpoint decpoint | ]
[ds|decsep widthsep[, widthsep[...]] | ]
[b|base base-n[, charset]]
[p|prec prec | ]
t|type type

The feel of long format
fmt_string: 'type f'

  number: 876543213456.98765445
  result: 876543213456.98765445

fmt_string: 'decpoint ^ | type f'

  number: 876543213456.98765445
  result: 876543213456^98765445

fmt_string: 'sep 21:3.4 | decpoint , | prec 3 | type f'

  number: 876543213456.98765445
  result: 87654:321.3456,988


General rules:
- every field except type is optional
- fields are separated by | (this may change); escape a literal | with ||
- every field starts with an identifier followed by mandatory whitespace
- subfields are separated by commas
- each identifier has a long and a short form
- processing precedence is: type, base, prec, sep/decsep, decpoint, sign,
min, max

Specific rules:
- min and max determine the width: min sets the rule for when the
resulting string is shorter than minwidth, and max sets the rule for when
the resulting string is longer than maxwidth (basically trimming).
alignfill is the character or sequence of characters used to pad the
resulting string to minwidth; overflowsign is the character added when
maxwidth is exceeded and trimming occurs
- sep is basically a separator delimiting each group width. The regular
Latin number system would be represented as sep 3.3; the leftmost width
and separator are repeated.

- decsep works similarly to sep
- base is the number base; charset is the mapping of digits used to
represent the output number in that base.


PS: It is not designed to be written by hand, but it is meant to be fairly readable
PPS: It is fairly modular too
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Raymond Hettinger
  The separators can be any one of COMMA,
  SPACE, DOT, UNDERSCORE, or NON-BREAKING-SPACE.

 What if I want other separators?

format(n, ',d').replace(',', yoursep)
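
Wrapped up as a helper, that one-liner covers the Swiss apostrophe and plain spacing mentioned elsewhere in this thread. This is only a sketch: the name with_separator is invented here, and it assumes a Python where the ',' option is available (2.7 / 3.1 or later).

```python
def with_separator(n, sep):
    """Format an int with commas, then substitute the caller's separator."""
    return format(n, ",d").replace(",", sep)

print(with_separator(12000, "'"))        # 12'000
print(with_separator(12000, " "))        # 12 000
print(with_separator(12000, "\u00a0"))   # same, with a non-breaking space
```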


 How about this idea: give the format a long form, which is a bit
 more verbose, flexible, and unambiguous, and keep the current proposal as
 a short form, which is more concise.

 The long form would look like this (it is much, much more featureful
 than the current proposal; I think I might have crossed far beyond
 the Mayan line):

I concur ;-)


Raymond
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread John Nagle

Lie Ryan wrote:

Hendrik van Rooyen wrote:

Ulrich Eckhardt eck...aser.com wrote:



Look back in history, and see how COBOL did it with the
PICTURE - dead easy and easily understandable.
Compared to that, even the C printf stuff  and python's %
are incomprehensible.

- Hendrik


   Yes.  In COBOL, one writes

PICTURE $999,999,999.99

which is way ahead of most of the later approaches.
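
A rough Python imitation of the idea, for readers who have not seen a PICTURE clause. It is not COBOL's actual editing rules: the helper name picture is hypothetical, only '9' is treated as a digit slot (so leading zeros are kept), every other character is copied literally, and negative values are simply rejected.

```python
from decimal import Decimal, ROUND_HALF_UP

def picture(value, pic):
    """Fill each '9' in pic with a digit of value; copy other characters as-is."""
    if value < 0:
        raise ValueError("this sketch does not handle negative values")
    # The number of '9's after the '.' decides how many decimals are kept.
    decimals = pic.partition(".")[2].count("9")
    scaled = Decimal(str(value)).scaleb(decimals)
    digits = str(int(scaled.quantize(Decimal(1), rounding=ROUND_HALF_UP)))
    slots = pic.count("9")
    if len(digits) > slots:
        raise ValueError("value does not fit the picture")
    fill = iter(digits.rjust(slots, "0"))
    return "".join(next(fill) if ch == "9" else ch for ch in pic)

print(picture(1234.5, "$999,999,999.99"))   # $000,001,234.50
```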

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Raymond Hettinger
Today's updates to:  http://www.python.org/dev/peps/pep-0378/

* Detail issues with the locale module.
* Summarize commentary to date.
   -- Opposition to formatting strings in general
  (preferring a convenience function or PICTURE clause)
   -- Opposition to any non-locale aware approach
* Add APOSTROPHE and non-breaking SPACE to the list of separators.
* Add more links to external references
  (Babel, Excel, ADA, CommonLisp, COBOL, C-Sharp).
* Clarify how proposal II is parsed.


Raymond
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread MRAB

Raymond Hettinger wrote:

Today's updates to:  http://www.python.org/dev/peps/pep-0378/

* Detail issues with the locale module.
* Summarize commentary to date.
   -- Opposition to formatting strings in general
  (preferring a convenience function or PICTURE clause)
   -- Opposition to any non-locale aware approach
* Add APOSTROPHE and non-breaking SPACE to the list of separators.
* Add more links to external references
  (Babel, Excel, ADA, CommonLisp, COBOL, C-Sharp).
* Clarify how proposal II is parsed.

I'd just like to make the point that the string methods, e.g.
unicode.upper, aren't locale-sensitive, so 'format' shouldn't be either.


The string methods could perhaps retain their current behaviour as the
default and accept a parameter to make them locale-sensitive. The same
could be the case for 'format', so the format string has "." to represent
the decimal point and "," to represent the digit separator, and those
would be the default, but it could accept a flag ("L"?) to make it
locale-sensitive.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Paul Rubin
Tim Rowe digi...@gmail.com writes:
 And if it's mostly for programmers' eyes, why does the motivation
 state that "Adding thousands separators is one of the simplest ways to
 improve the professional appearance and readability of output exposed
 to end users"?

It occurs to me that, at least for quantities of data, one of the most
useful aids to readability is scaling down the quantity and suffixing
it with K (kilo), M (mega), G (giga), etc.  This is sometimes done
with K=1000 and sometimes with K=1024 (fancy pronunciation "kibi"
rather than "kilo", officially abbreviated "Ki").  Possible formatting:

   '%.3K' % 1234567 = 1.235K # K = 1000
   '%.:3Ki' % 1234567 = 1.206K   # K = 1024

The colon (two dots) signifies base two.  The "i" is not part of the
format spec, it's just a literal character, to make the standard
abbreviation for kibi.
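
The scaling itself is easy to sketch as an ordinary function rather than a new % code. The name scaled and its parameters are invented for this illustration; it picks the suffix from the magnitude, so 1234567 lands in the mega range.

```python
def scaled(n, base=1000, precision=3):
    """Scale n down by powers of base and append K, M, G, ... (Ki, Mi, ... for 1024)."""
    suffixes = ["", "K", "M", "G", "T"]
    value = float(n)
    index = 0
    while abs(value) >= base and index < len(suffixes) - 1:
        value /= base
        index += 1
    suffix = suffixes[index]
    if suffix and base == 1024:
        suffix += "i"   # binary multiples: Ki, Mi, Gi, ...
    return "%.*f%s" % (precision, value, suffix)

print(scaled(1234567))         # 1.235M
print(scaled(1234567, 1024))   # 1.177Mi
```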
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Paul Rubin
Paul Rubin http://phr...@nospam.invalid writes:
'%.3K' % 1234567 = 1.235K # K = 1000
'%.:3Ki' % 1234567 = 1.206K   # K = 1024

I meant 1.235M and 1.177M, of course.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Lie Ryan

Raymond Hettinger wrote:



Motivation:

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of
output exposed to end users.

In the finance world, output with commas is the norm.  Finance users
and non-professional programmers find the locale approach to be
frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention.  The goal is to make a common task easier
for many users.



Raymond, I think there are several problems with the Motivation:

 The goal is to make a common task easier
 for many users.

The common task, for most people, is formatting numbers for the locale. We
should make converting numbers to the locale's format easier, as easy as
calling a magic function that converts the current object to its
locale representation, or as simple as giving a locale ID in the mini
language. This proposal, I believe, is for the _less_ common task of
formatting a number in a custom format not generally used anywhere else
in the world (like formatting a number to form an IPv6 address or
formatting a number as HTML/TeX code[1]).


[1] I know one mathematics textbook that uses a superscript minus for
negative numbers to distinguish it from the minus sign.


 In the finance world, output with commas is the norm.

I can't cite any source, but I am skeptical of that. And what about
the non-finance world? The scientific world? The pure math world?


 Provide a simple, non-locale aware way to format a number
 with a thousands separator.

As many have pointed out, locale is hard to use. This is an easier
approach, but it is a pity that it is not locale aware. If we want to
provide non-locale-aware formatting, we must make it flexible enough to
be the Ultimate Formatter. Otherwise it will just be redundant with locale.


 Adding thousands separators is one of the simplest ways to
 improve the professional appearance and readability of
 output exposed to end users.

There are infinitely many approaches to writing numbers. One Singaporean
textbook uses a half-width space as the thousands separator. One Australian
textbook uses a superscript minus for negative numbers (which I believe would
require more than Unicode to represent, TeX or PDF perhaps). The
accounting world sometimes uses colors and parentheses to denote
negative numbers (this requires emitting codes for the layout program:
HTML, TeX, PDF).


Anything less powerful than my proposed "Crossing the Mayan line" format is
just a harder alternative to the locale module.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Raymond Hettinger
[Lie Ryan]
       In the finance world, output with commas is the norm.

 I can't cite any source, but I am skeptical of that.

No doubt that you're skeptical of anything you didn't
already know ;-)  I'm a CPA, was a 15 year division controller
for a Fortune 500 company, and an auditor for an international
accounting firm.  Believe me when I say it is the norm in finance.
Besides, it seems like you're arguing that thousands separators
aren't needed anywhere at all and have doubts about their
utility.  Pick up your pocket calculator and take a look.
Look at your paycheck or your bank statement.  Check out a
publishing style guide.  They are somewhat basic.  There's
a reason MS Excel and Lotus offered them from day one.

Python's format() style was taken directly from C-Sharp,
which offers both an "n" format that is locale sensitive
and a non-locale-sensitive variant that specifies a comma.
I'm suggesting that we also do both.

Random, made-up statistic:  99% of Python scripts are
not internationalized, have no need to be internationalized,
and have output intended to be used in the script writer's
immediate environment.

Another issue I have with locale is that you have to find
one that matches every specific need.  Quick, which one gives
you non-breaking spaces for a thousands separator?  If you
do find such a locale and it happens to be spelled the same
way on every platform, is it self-evident in your program
that it will in fact print with spaces or has that become
an implicit, behind-the-scenes operation?  If later you need
to print another number with a different separator, do you
have a way to make that happen without breaking the first piece
of code you wrote?
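
To make the concern concrete, here is a sketch of what formatting one number under a named locale involves. The helper name format_in_locale is invented for this illustration, and locale names such as "de_DE.UTF-8" are platform dependent and may not be installed at all. Note that setlocale() changes process-wide state, so the sketch has to save and restore it.

```python
import locale

def format_in_locale(n, name):
    """Format n with grouping under the named locale, or return None if unavailable."""
    saved = locale.setlocale(locale.LC_NUMERIC)       # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)     # process-wide change
    except locale.Error:
        return None                                   # locale not installed here
    try:
        return locale.format_string("%d", n, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)    # restore for other code

print(format_in_locale(1234567, "C"))             # 1234567 (the C locale has no grouping)
print(format_in_locale(1234567, "de_DE.UTF-8"))   # platform dependent; None if missing
```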

The locale module has plenty of issues for a programmer to
think about:
http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats

Besides, lots of people use Python who are not professional
programmers.  We should not require them to enter the complicated
world of locale just to do a basic formatting task.  When
I teach Python to pre-college students, there is no way I'm
adding locale to the list of things they need to learn to
become functional with the language.

Sorry for the long post, but I feel like you keep inventing
heavy solutions that don't fit well with what we already have.
This should be a simple problem -- when writing a number
format, how do I specify that I want character X as a thousands
separator?  The answer to that question should be nothing
harder than, add character X to the format string.

You're a very creative person, but I don't see Guido accepting
any idea that rejects what he has already chosen as the way
to format strings.  He is no fan of the locale module's API,
but it is tightly bound to existing programs and POSIX
standards.  That greatly limits the options for changing it.

I'm sure you can come up with 500 ways of meeting this need
(almost none of which meld with Guido's choice to accept
PEP 3101 for both 2.6 and 3.0).   I'm offering a simple
extension to the existing framework that makes the above
tasks easy.  C-Sharp makes essentially the same choice in its
design.  There's no reason for you to have to use it
if you hate it.

Cheers,


Raymond
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Lie Ryan

Raymond Hettinger wrote:

If anyone here is interested, here is a proposal I posted on the
python-ideas list.

The idea is to make numbering formatting a little easier with the new
format() builtin
in Py2.6 and Py3.0:  http://docs.python.org/library/string.html#formatspec


-


Motivation:

   Provide a simple, non-locale aware way to format a number with a
thousands separator.

   Adding thousands separators is one of the simplest ways to improve the
professional appearance and readability of output.

   In the finance world, output with commas is the norm. Finance users and
non-professional programmers find the locale approach to be frustrating, arcane
and non-obvious.

   The locale module presents two other challenges. First, it is a global setting and not
suitable for multi-threaded apps that need to serve up requests in multiple locales.
Second, the name of a relevant locale (such as de_DE) can vary from platform
to platform or may not be defined at all. The docs for the locale module describe these
and many other challenges [1] in detail.

   It is not the goal to replace the locale module or to accommodate every
possible convention. Such tasks are better suited to robust tools like Babel
[2]. Instead, our goal is to make a common, everyday task easier for many
users.

Comments and suggestions are welcome but I draw the line at supporting
Mayan numbering conventions ;-)


Raymond


--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-13 Thread Lie Ryan

Raymond Hettinger wrote:

[Lie Ryan]

  In the finance world, output with commas is the norm.

I can't cite any source, but I am skeptical of that.


No doubt that you're skeptical of anything you didn't
already know ;-)  I'm a CPA, was a 15 year division controller
for a Fortune 500 company, and an auditor for an international
accounting firm.  Believe me when I say it is the norm in finance.
Besides, it seems like you're arguing that thousands separators
aren't needed anywhere at all and have doubts about their
utility.  Pick up your pocket calculator and take a look.
Look at your paycheck or your bank statement.  Check out a
publishing style guide.  They are somewhat basic.  There's
a reason MS Excel and Lotus offered them from day one.


I have no reason to doubt that output with separators is nice, but I am
skeptical that all financial institutions in the world (not just in the US)
use commas for their separators.



Python's format() style was taken directly from C-Sharp,
which offers both an "n" format that is locale sensitive
and a non-locale-sensitive variant that specifies a comma.
I'm suggesting that we also do both.


I'm fine with that. But no commas; instead, user-definable separators.


Random, made-up statistic:  99% of Python scripts are
not internationalized, have no need to be internationalized,
and have output intended to be used in the script writer's
immediate environment.


Random, made-up statistic: 95% of those are scripts written for
personal/internal use.



 If you
 do find such a locale and it happens to be spelled the same
 way on every platform, is it self-evident in your program
 that it will in fact print with spaces or has that become
 an implicit, behind-the-scenes operation?  If later you need
 to print another number with a different separator, do you
 have a way to make that happen without breaking the first piece
 of code you wrote?

Yes, all data in transit should be in a locale-independent format;
it should only be converted to a locale-aware format just before it is
shown to the user. That way nothing will break.


Since you're an accountant, I am sure you know about Quicken files,
which store data in locale format, which IMHO is a very BAD design.






Another issue I have with locale is that you have to find
one that matches every specific need.  Quick, which one gives
you non-breaking spaces for a thousands separator?


That wasn't the issue. Most programs fall into one of three cases: "use the
environment's locale and give the user configuration to override the locale",
"I don't care, the output is for personal/internal consumption", or "the data
only makes sense with certain formatting". I don't see a use case where the
programmer would really want to hardcode a locale AND want the output to
be exactly like what he sees on the user's machine.


The first case ("use the environment's locale and give the user
configuration to override the locale") is for internationalized
applications, and is served by locale. The locale module is currently
difficult to work with, so I believe we should provide a more accessible
way.


The second case ("I don't care, the output is for personal/internal
consumption") is well served by Python's default view.


The third case ("the data only makes sense with certain formatting") is
the one that will benefit the most from non-locale-aware formatting. But
it would require a very powerful formatter. Such use cases include
formatting IP addresses, telephone numbers, ID card numbers, etc.


My proposition is: make the format specifier a simpler API to locale-aware
formatting, or make it capable of serving the third case. I would rather
prioritize the former.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Raymond Hettinger
 If anyone here is interested, here is a proposal I posted on the
 python-ideas list.

 The idea is to make numbering formatting a little easier with
 the new format() builtin:
 http://docs.python.org/library/string.html#formatspec

Here's a re-post (hopefully without the line wrapping problems
in the previous post).

Raymond

-



Motivation:
---

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.

In the finance world, output with commas is the norm.  Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention.  The goal is to make a common task easier
for many users.


Research so far:


Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE.  The
COMMA is used when a PERIOD is the decimal separator.

James Knight observed that Indian/Pakistani numbering systems
group by hundreds.   Ben Finney noted that Chinese group by
ten-thousands.

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like: _($* #,##0_).



Proposal I (from Nick Coghlan):
---

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][minimumwidth][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example:

  format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped:

  format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
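
For readers who want to try this, the comma option of Proposal I is what later shipped with PEP 378, so the lines below run on Python 2.7 / 3.1 or later. The sample values and variable names are only illustrative.

```python
from decimal import Decimal

print(format(1234567, ",d"))                   # 1,234,567
print(format(1234567.891, "14,.2f"))           # '  1,234,567.89' (width 14)
print(format(Decimal("1234567.891"), ",.2f"))  # 1,234,567.89

n = 1234567.891
underscored = format(n, ",.2f").replace(",", "_")
swapped = format(n, ",.2f").replace(",", "X").replace(".", ",").replace("X", ".")
print(underscored)   # 1_234_567.89
print(swapped)       # 1.234.567,89
```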


Proposal II (to meet Antoine Pitrou's request):
---

Make both the thousands separator and decimal separator user
specifiable but not locale aware.  For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

  format(1234, "8.1f")    --> '  1234.0'
  format(1234, "8,1f")    --> '  1234,0'
  format(1234, "8T.,1f")  --> ' 1.234,0'
  format(1234, "8T .f")   --> ' 1 234,0'
  format(1234, "8d")      --> '    1234'
  format(1234, "8T,d")    --> '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.  Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

For the locale module, just the "T" is necessary in a
formatting string since the tool already has procedures for
figuring out the actual separators from the local context.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Ulrich Eckhardt
Raymond Hettinger wrote:
 The idea is to make numbering formatting a little easier with
 the new format() builtin:
 http://docs.python.org/library/string.html#formatspec
[...]
 Scanning the web, I've found that thousands separators are
 usually one of COMMA, PERIOD, SPACE, or UNDERSCORE.  The
 COMMA is used when a PERIOD is the decimal separator.
 
 James Knight observed that Indian/Pakistani numbering systems
 group by hundreds.   Ben Finney noted that Chinese group by
 ten-thousands.

IIRC, some cultures use a non-uniform grouping, e.g. 123 456 78.9.
For that, there is also a grouping reserved in the locale (at least in
those of C++ IOStreams, that is). Further, and that seems to also be one of
your concerns, there are different ways to represent negative numbers,
e.g. (123) or -456.


 Make both the thousands separator and decimal separator user
 specifiable but not locale aware.  For simplicity, limit the
 choices to a comma, period, space, or underscore.
 
 [[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]
 
 Examples:
 
   format(1234, "8.1f")    --> '  1234.0'
   format(1234, "8,1f")    --> '  1234,0'
   format(1234, "8T.,1f")  --> ' 1.234,0'
   format(1234, "8T .f")   --> ' 1 234,0'
   format(1234, "8d")      --> '    1234'
   format(1234, "8T,d")    --> '   1,234'


How about this?
   format(1234, 8.1, tsep=",")
  --> ' 1,234.0'
   format(1234, 8.1, tsep=".", dsep=",")
  --> ' 1.234,0'
   format(123456, tsep=" ", grouping=(3, 2,))
  --> '1 234 56'

IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?
Seriously, the problem I see with this proposal is that its aim to be as
short as possible actually makes the resulting format specifications
unreadable. Could you even guess what "8T.,1f" should mean if you had not
written this?
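
A keyword-driven interface along these lines can be sketched as a plain function. Everything here is hypothetical: the name format_number, its parameters, and in particular the grouping convention, which is assumed to follow the locale module (sizes listed from the rightmost group, last size repeating), so the third example above is written as grouping=(2, 3).

```python
def format_number(value, width=0, precision=0, tsep="", dsep=".", grouping=(3,)):
    """Format value with explicit separators instead of a mini-language."""
    sign = "-" if value < 0 else ""
    int_part, _, frac_part = ("%.*f" % (precision, abs(value))).partition(".")
    sizes = list(grouping)
    groups = []
    while int_part:
        # Consume group sizes from the right; the last size repeats.
        size = sizes.pop(0) if len(sizes) > 1 else sizes[0]
        groups.append(int_part[-size:])
        int_part = int_part[:-size]
    text = sign + tsep.join(reversed(groups))
    if frac_part:
        text += dsep + frac_part
    return text.rjust(width)

print(repr(format_number(1234, 8, 1, tsep=",")))                  # ' 1,234.0'
print(repr(format_number(1234, 8, 1, tsep=".", dsep=",")))        # ' 1.234,0'
print(repr(format_number(123456, tsep=" ", grouping=(2, 3))))     # '1 234 56'
```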

 This proposal meets most needs (except for people wanting
 grouping for hundreds or ten-thousands), but it comes at the
 expense of being a little more complicated to learn and
 remember.

Too expensive for my taste.

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Raymond Hettinger
[Ulrich Eckhardt]
 IOW, why not explicitly say what you want using keyword arguments with
 defaults instead of inventing an IMHO cryptic, read-only mini-language?

That makes sense to me but I don't think that's the way the format()
builtin was implemented (see PEP 3101, which was implemented in Py2.6 and
3.0).  It is a simple pass-through to a __format__ method for each
formattable object.  I don't see how keywords would fit in that
framework.  What is proposed is similar to the locale module's existing
"n" specifier except that this lets you say exactly what you want instead
of deferring to the locale settings.
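
To illustrate the pass-through: format() hands the spec string, unparsed, to the object's __format__ method, and there is no slot for extra keyword arguments. The Money class below is a made-up toy, and its default spec assumes a Python where the ',' option exists (2.7 / 3.1 or later).

```python
class Money:
    """Toy type whose __format__ receives the spec string untouched."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # format(obj, spec) and "{0:spec}".format(obj) both arrive here.
        return "$" + format(self.amount, spec or ",.2f")

m = Money(1234567.891)
print(format(m, ""))          # $1,234,567.89
print("{0:.1f}".format(m))    # $1234567.9
```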

The mini-language seems to already be the way of things (just as it is
in many other languages including PHP, C, Fortran, and whatnot).  I'm just
proposing an addition, "T", so you can add commas as a thousands separator.


Raymond

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread MRAB

Raymond Hettinger wrote:
[snip]

Proposal I (from Nick Coghlan):
---

A comma will be added to the format() specifier mini-language:

[[fill]align][sign][#][0][minimumwidth][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example:

  format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped:

  format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")


Proposal II (to meet Antoine Pitrou's request):
---

Make both the thousands separator and decimal separator user
specifiable but not locale aware.  For simplicity, limit the
choices to a comma, period, space, or underscore.

[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

Examples:

  format(1234, "8.1f")    --> '  1234.0'
  format(1234, "8,1f")    --> '  1234,0'
  format(1234, "8T.,1f")  --> ' 1.234,0'
  format(1234, "8T .f")   --> ' 1 234,0'
  format(1234, "8d")      --> '    1234'
  format(1234, "8T,d")    --> '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.  Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

For the locale module, just the "T" is necessary in a
formatting string since the tool already has procedures for
figuring out the actual separators from the local context.


[snip]
I'd probably prefer Proposal I with "." representing the decimal point
and "," representing the grouping (thousands) separator, although I'd
add an "L" flag to indicate that it should use the locale to provide the
actual characters to be used and even the number of digits for the
grouping:

[[fill]align][sign][#][0][minimumwidth][,][.precision][L][type]

Examples:

  Assuming the locale has:

decimal point:  ,
grouping separator: .
grouping spacing:   3

  format(123456, "10.1f")    --> '  123456.0'
  format(123456, "10.1Lf")   --> ' 123.456,0'
  format(123456, "10,.1f")   --> ' 123,456.0'
  format(123456, "10,.1Lf")  --> ' 123.456,0'
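
For comparison, the existing "n" presentation type already plays roughly the role of the suggested "L" flag for integers: it takes its separators from the current locale, while the "," option (available in Python 2.7 / 3.1 or later) ignores the locale. A minimal sketch, pinned to the C locale so the result does not depend on the machine:

```python
import locale

locale.setlocale(locale.LC_NUMERIC, "C")   # make the starting point explicit
print(format(1234567, "n"))    # 1234567 (the C locale defines no grouping)
print(format(1234567, ",d"))   # 1,234,567 regardless of locale
```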

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Hendrik van Rooyen
Ulrich Eckhardt eck...aser.com wrote:

IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?
Seriously, the problem I see with this proposal is that its aim to be as
short as possible actually makes the resulting format specifications
unreadable. Could you even guess what "8T.,1f" should mean if you had not
written this?

+1

Look back in history, and see how COBOL did it with the
PICTURE - dead easy and easily understandable.
Compared to that, even the C printf stuff  and python's %
are incomprehensible.

- Hendrik


--
http://mail.python.org/mailman/listinfo/python-list


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread John Machin
On Mar 12, 9:56 pm, Raymond Hettinger pyt...@rcn.com wrote:
 [Ulrich Eckhardt]

  IOW, why not explicitly say what you want using keyword arguments with
  defaults instead of inventing an IMHO cryptic, read-only mini-language?

 That makes sense to me but I don't think that's the way the format()
 builtin was implemented (see PEP 3101, which was implemented in Py2.6 and
 3.0).  It is a simple pass-through to a __format__ method for each
 formattable object.  I don't see how keywords would fit in that
 framework.  What is proposed is similar to the locale module's existing
 "n" specifier except that this lets you say exactly what you want instead
 of deferring to the locale settings.

 The mini-language seems to already be the way of things (just as it is
 in many other languages including PHP, C, Fortran, and whatnot).  I'm just
 proposing an addition, "T", so you can add commas as a thousands separator.


... and why not C (centum) for hundreds (can't have H(ollerith)) and W
for wan (the Chinese word for 10 thousand)?
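
For readers who have not met the pass-through described in the quoted text: format(obj, spec) hands the spec string, untouched, to the object's __format__ method, so there is no obvious place for keyword arguments. A minimal sketch follows; the Grouped class and its bare "T" convention are made up for illustration and are not part of any proposal.

```python
import re


class Grouped(object):
    """Hypothetical wrapper used only to show the __format__ pass-through."""

    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        # Made-up convention: a bare "T" asks for comma grouping of an int.
        if spec == "T":
            digits = format(self.value, "d")
            # Insert a comma wherever a multiple of three digits remains.
            return re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", digits)
        # Any other spec is handed on to the wrapped value unchanged.
        return format(self.value, spec)


print(format(Grouped(1234567), "T"))
print("{0:T} / {0:>9d}".format(Grouped(1234567)))
```

Both format() and str.format() end up in the same method with nothing but the text after the colon, which is why extensions tend to take the form of extra characters in the mini-language.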




Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread pruebauno
On Mar 12, 3:30 am, Raymond Hettinger pyt...@rcn.com wrote:
 If anyone here is interested, here is a proposal I posted on the
 python-ideas list.

 The idea is to make number formatting a little easier with the new
 format() builtin in Py2.6 and Py3.0:
 http://docs.python.org/library/string.html#formatspec

 -

 Motivation:

     Provide a simple, non-locale aware way to format a number
     with a thousands separator.

     Adding thousands separators is one of the simplest ways to
     improve the professional appearance and readability of
     output exposed to end users.

      In the finance world, output with commas is the norm.  Finance
      users and non-professional programmers find the locale approach
      to be frustrating, arcane and non-obvious.

     It is not the goal to replace locale or to accommodate every
     possible convention.  The goal is to make a common task easier
     for many users.

 Research so far:

     Scanning the web, I've found that thousands separators are
     usually one of COMMA, PERIOD, SPACE, or UNDERSCORE.  The
     COMMA is used when a PERIOD is the decimal separator.

     James Knight observed that Indian/Pakistani numbering systems
     group by hundreds.   Ben Finney noted that Chinese group by
     ten-thousands.

     Visual Basic and its brethren (like MS Excel) use a completely
     different style and have ultra-flexible custom format specifiers
     like: _($* #,##0_).

 Proposal I (from Nick Coghlan):

     A comma will be added to the format() specifier mini-language:

     [[fill]align][sign][#][0][minimumwidth][,][.precision][type]

      The ',' option indicates that commas should be included in the
      output as a thousands separator.  As with locales which do not
      use a period as the decimal point, locales which use a different
      convention for digit separation will need to use the locale
      module to obtain appropriate formatting.

      The proposal works well with floats, ints, and decimals.  It also
      allows easy substitution for other separators.  For example:

          format(n, "6,f").replace(",", "_")

      This technique is completely general but it is awkward in the one
      case where the commas and periods need to be swapped.

          format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")

 Proposal II (to meet Antoine Pitrou's request):

      Make both the thousands separator and decimal separator user
      specifiable but not locale aware.  For simplicity, limit the
      choices to a comma, period, space, or underscore.

      [[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]

     Examples:

          format(1234, "8.1f")    -->  '  1234.0'
          format(1234, "8,1f")    -->  '  1234,0'
          format(1234, "8T.,1f")  -->  ' 1.234,0'
          format(1234, "8T ,1f")  -->  ' 1 234,0'
          format(1234, "8d")      -->  '    1234'
          format(1234, "8T,d")    -->  '   1,234'

      This proposal meets most needs (except for people wanting
      grouping for hundreds or ten-thousands), but it comes at the
      expense of being a little more complicated to learn and remember.
      Also, it makes it more challenging to write custom __format__
      methods that follow the format specification mini-language.

      For the locale module, just the T is necessary in a formatting
      string since the tool already has procedures for figuring out the
      actual separators from the local context.

 Comments and suggestions are welcome but I draw the line at supporting
 Mayan numbering conventions ;-)

 Raymond
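
To make the Proposal II grammar concrete, here is a rough sketch of a parser for the [minimumwidth][T[tsep]][dsep precision][type] part only. It is an illustration under stated assumptions, not the proposed implementation: format_proposal2 is a made-up name, fill/align/sign/#/0 are left out, only the d and f types are handled, a bare T is assumed to mean a comma, and the space-separated case is written as "8T ,1f", which is one reading of the grammar.

```python
import re

# [minimumwidth][T[tsep]][dsep precision][type]; other fields omitted.
_SPEC = re.compile(
    r"(?P<width>\d+)?"
    r"(?P<t>T(?P<tsep>[,. _])?)?"
    r"(?:(?P<dsep>[,. _])(?P<prec>\d+))?"
    r"(?P<type>[df])$"
)


def format_proposal2(value, spec):
    """Hypothetical formatter for a subset of the Proposal II spec."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError("unsupported spec: %r" % (spec,))
    if m.group("type") == "d":
        int_part, frac_part = format(abs(value), "d"), ""
    else:
        prec = int(m.group("prec") or 6)
        text = format(abs(value), ".%df" % prec)
        int_part, _, frac_part = text.partition(".")
    if m.group("t"):
        tsep = m.group("tsep") or ","  # assumption: bare T means comma
        int_part = re.sub(r"(?<=\d)(?=(?:\d{3})+$)", tsep, int_part)
    text = int_part
    if frac_part:
        text += (m.group("dsep") or ".") + frac_part
    if value < 0:
        text = "-" + text
    return text.rjust(int(m.group("width") or 0))


for spec in ("8.1f", "8,1f", "8T.,1f", "8T ,1f", "8d", "8T,d"):
    print("%-8s --> %r" % (spec, format_proposal2(1234, spec)))
```

Even this cut-down version needs a backtracking regular expression to tell a separator after T from a decimal separator, which gives some feel for the extra work a custom __format__ method would take on.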

As far as I am concerned the simplest version plus a way to swap
commas and periods is all that is needed. The rest can be done
using one replace (because the decimal separator is always one of two
options). This should cover everywhere but the Far East. 80% of cases
for 20% of implementation complexity.

For example:

[[fill]align][sign][#][0][,|.][minimumwidth][.precision][type]

 format(1234, ".8.1f")  --> ' 1.234,0'
 format(1234, ",8.1f")  --> ' 1,234.0'
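
A minimal sketch of how that leading [,|.] flag could be layered on top of Proposal I. It assumes an interpreter in which the "," option is actually available (Python 3.1 or later) and uses str.maketrans from Python 3; format_swappable is a made-up name, and only the flag, width, precision and d/f type are parsed.

```python
import re

# [,|.][minimumwidth][.precision][type]; fill/align/sign/#/0 are omitted.
_SPEC = re.compile(
    r"(?P<flag>[,.])(?P<width>\d*)(?:\.(?P<prec>\d+))?(?P<type>[df])$"
)


def format_swappable(value, spec):
    """Hypothetical helper: ',' groups with commas, '.' swaps the two."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError("unsupported spec: %r" % (spec,))
    # Build the equivalent Proposal I spec: width, comma, precision, type.
    builtin_spec = m.group("width") + ","
    if m.group("prec") is not None:
        builtin_spec += "." + m.group("prec")
    builtin_spec += m.group("type")
    text = format(value, builtin_spec)
    if m.group("flag") == ".":
        # Swap commas and periods in a single pass.
        text = text.translate(str.maketrans(",.", ".,"))
    return text


print(repr(format_swappable(1234, ".8.1f")))
print(repr(format_swappable(1234, ",8.1f")))
# Any other thousands separator is then one replace away.
print(repr(format_swappable(1234, ",8.1f").replace(",", " ")))
```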



Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Raymond Hettinger
On Mar 12, 7:51 am, prueba...@latinmail.com wrote:
 On Mar 12, 3:30 am, Raymond Hettinger pyt...@rcn.com wrote:



  [...]

 As far as I am concerned the simplest version plus a way to swap
 commas and periods is all that is needed.

Thanks for the feedback.

FWIW, posted a cleaned-up version of the proposal at
  http://www.python.org/dev/peps/pep-0378/


Raymond


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Paul Rubin
Raymond Hettinger pyt...@rcn.com writes:
 FWIW, posted a cleaned-up version of the proposal at
   http://www.python.org/dev/peps/pep-0378/

It would be nice if the PEP included a comparison between the proposed
scheme and how it is done in other programs and languages.  For
example, I think Common Lisp has a feature for formatting thousands.
Spreadsheets like Excel probably have something similar.  Those
programs are pretty well evolved and probably address the important
real use cases by now.  It might be best to follow an existing example
(with adjustments for Pythonification as necessary) to the extent
possible.


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Raymond Hettinger
[Paul Rubin]
 It would be nice if the PEP included a comparison between the proposed
 scheme and how it is done in other programs and languages.

Good idea.  I'm hoping that people will post those here.
In my quick research, it looks like many languages offer
nothing more than the usual C-style % formatting and defer
the rest to a locale-aware module.


  For
 example, I think Common Lisp has a feature for formatting thousands.

Do you have more detail?


 Spreadsheets like Excel probably have something similar.

I addressed that in the PEP in the section on VB and relatives.  Their
approach doesn't graft onto our existing approach.  They use format
specifiers like: _($* #,##0_).


Raymond


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Raymond Hettinger
[Paul Rubin]
 I think Common Lisp has a feature for formatting thousands.

I found the Common Lisp spec for this and added it to the PEP.


Raymond



Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Paul Rubin
Raymond Hettinger pyt...@rcn.com writes:
 In my quick research, it looks like many languages offer
 nothing more than the usual C-style % formatting and defer
 the rest to a locale-aware module.

Hendrik van Rooyen's mention of Cobol's picture (aka PIC)
specifications might be added to the list.  Cautionary tale: I once
had a similar idea and suggested including a bastardized version of
PIC in an extension language for something I worked on once.  Another
programmer then coded a reasonable PIC subset and we shipped it.
Turned out that a number of our users were Cobol experts and once we
had anything like PIC, they expected the weirdest and most obscure
features (of which there were quite a few) of real Cobol PIC to work.
We ended up having to assign someone a fairly lengthy task of figuring
out the Cobol spec and implementing every last damn PIC feature.  But
I digress.


  example, I think Common Lisp has a feature for formatting thousands.
 Do you have more detail?

 http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

gives as an example:

 (format nil "The answer is ~:D." (expt 47 x))
 => "The answer is 229,345,007."
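
For comparison, a Python rendering of the same example, assuming the "," option from Proposal I is available (it is in Python 2.7 and 3.1 onwards). The value x = 5 is inferred from the quoted result, and answer is just an illustrative name.

```python
def answer(x):
    # {0:,d} plays the role of ~:D here: an integer with comma grouping.
    return "The answer is {0:,d}.".format(47 ** x)


print(answer(5))
```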



Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Paul Rubin
Raymond Hettinger pyt...@rcn.com writes:
 I found the Common Lisp spec for this and added it to the PEP.

Ah, cool, I simultaneously looked for it and posted about it.


Re: Rough draft: Proposed format specifier for a thousands separator

2009-03-12 Thread Scott David Daniels

Raymond Hettinger wrote:
... a generally interesting PEP...

Missing from this PEP:
 output below the decimal point.

show results for something like:
  format(12345.54321, "15,.5f") --> '  12,345.543,21'

Explain the interaction on sizes and lengths (which numbers are digits,
which are length [I vote for length on overall, digits on precision]),
and what happens with length-4 -- I'd say explicitly 1000 is shown as
1,000 despite style sheets that prefer 1000 and 10,000.

FWIW, I agree with pruebauno: do the simplest easily usable thing, and
provide a way to swap the commas and periods.  The rest can be ponied
in by string processing.
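
To pin down the questions above, a small sketch of one possible answer: the width counts the overall length (separators included), the precision counts digits, and digits after the decimal point are grouped in threes from the left. This is an illustration of that reading only; group_both_sides is a made-up name and none of this is what the PEP specifies.

```python
def group_both_sides(value, width, precision, sep=","):
    """Hypothetical: group digits on both sides of the decimal point."""
    text = format(abs(value), ".%df" % precision)
    int_part, _, frac_part = text.partition(".")
    # Integer digits are grouped in threes counting from the right.
    int_groups = []
    while len(int_part) > 3:
        int_groups.insert(0, int_part[-3:])
        int_part = int_part[:-3]
    int_groups.insert(0, int_part)
    body = sep.join(int_groups)
    if frac_part:
        # Fractional digits are grouped in threes counting from the left.
        frac_groups = [frac_part[i:i + 3] for i in range(0, len(frac_part), 3)]
        body += "." + sep.join(frac_groups)
    if value < 0:
        body = "-" + body
    # Width is the overall length, so separators eat into the padding.
    return body.rjust(width)


print(repr(group_both_sides(12345.54321, 15, 5)))
print(repr(group_both_sides(1000, 0, 0)))
```

With these rules a four-digit number always gets its separator, so 1000 comes out as 1,000.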

--Scott David Daniels
scott.dani...@acm.org