Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Alastair Houghton
On 4 Oct 2006, at 06:34, Martin v. Löwis wrote:

 Alastair Houghton wrote:
 On 3 Oct 2006, at 17:47, James Y Knight wrote:

 On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote:
 As Michael Hudson observed, this is difficult to implement, though:
 You can't distinguish between -0.0 and +0.0 easily, yet you should.

 Of course you can. It's absolutely trivial. The only part that's  
 even
 *the least bit* sketchy in this is assuming that a double is 64  
 bits.
 Practically speaking, that is true on all architectures I know of,

 How about doing 1.0 / x, where x is the number you want to test?

 This is a bad idea. It may cause a trap, leading to program  
 termination.

AFAIK few systems have floating point traps enabled by default (in  
fact, isn't that what IEEE 754 specifies?), because they often aren't  
very useful.  And in the specific case of the Python interpreter, why  
would you ever want them turned on?  Surely in order to get  
consistent floating point semantics, they need to be *off* and Python  
needs to handle any exceptional cases itself; even if they're on, by  
your argument Python must do that to avoid being terminated.  (Not to  
mention the problem that floating point traps are typically delivered  
by a signal, the problems with which were discussed extensively in a  
recent thread on this list.)

And it does have two advantages over the other methods proposed:

1. You don't have to write the value to memory; this test will work  
entirely in the machine's floating point registers.

2. It doesn't rely on the machine using IEEE floating point.  (Of  
course, neither does the binary comparison method, but it still  
involves a trip to memory, and assumes that the machine doesn't have  
multiple representations for +0.0 or -0.0.)

Even if you're saying that there's a significant chance of a trap  
(which I don't believe, not on common platforms anyway), the  
configure script could test to see if this will happen and fall back  
to one of the other approaches, or see if it can't turn them off  
using the C99 fenv.h APIs.  (I think I'd agree with you that  
handling SIGFPE is undesirable, which is perhaps what you were  
driving at.)

Anyway, it's only an idea, and I thought I'd point it out as nobody  
else had yet.  If 0.0 is going to be cached, then I certainly think  
-0.0 and +0.0 should be two separate values if they exist on a given  
machine.  I'm less concerned about exactly how that comes about.

Kind regards,

Alastair.

--
http://alastairs-place.net



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Alastair Houghton
On 4 Oct 2006, at 02:38, Josiah Carlson wrote:

 Alastair Houghton [EMAIL PROTECTED] wrote:

 There is, of course, the option of examining their representations in
 memory (I described the general technique in another posting on this
 thread).  From what I understand of IEEE 754 FP doubles, -0.0 and +0.0
 have different representations, and if we look at the underlying
 representation (perhaps by a *((uint64*)(&float_input))), we can
 easily distinguish all values we want to cache...

Yes, though a trip via memory isn't necessarily cheap, and you're  
also assuming that the machine doesn't use an FP representation with  
multiple +0s or -0s.  Perhaps they should be different anyway though,  
I suppose.
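
The representation test being discussed can be tried from Python itself. This is only a minimal sketch, assuming the platform's doubles are IEEE 754 (which `struct` also assumes for the `d` format); the helper name `bits` is made up for illustration.

```python
import struct

def bits(x):
    """Return the 8 bytes of x as a big-endian IEEE 754 double (hypothetical helper)."""
    return struct.pack(">d", x)

pos_zero = 0.0
neg_zero = -0.0

print(pos_zero == neg_zero)              # True: == cannot tell them apart
print(bits(pos_zero) == bits(neg_zero))  # False: only the sign bit differs
print(bits(pos_zero).hex())              # 0000000000000000
print(bits(neg_zero).hex())              # 8000000000000000
```

The comparison is on the packed bytes, so it is the Python analogue of a straight binary comparison rather than of poking at individual words.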

 And as I stated before, we can switch on those values.  Alternatively,
 if we can't switch on the 64 bit values directly...

 uint32* p = (uint32*)(&double_input);
 if (!p[0]) { /* p[1] on big-endian platforms */
     switch (p[1]) { /* p[0] on big-endian platforms */
     ...
     }
 }

That's worse, IMHO, because it assumes more about the  
representation.  If you're going to look directly at the binary, I  
think all you can reasonably do is a straight binary comparison.  I  
don't think you should poke at the bits without first knowing that  
the platform uses IEEE floating point.

The reason I suggested 1.0/x is that it's one of the few ways (maybe  
the only way?) to distinguish -0.0 and +0.0 using arithmetic, which  
is what people that care about the difference between the two are  
going to care about.
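
For what it is worth, the 1.0/x trick only applies at the C level: in Python the division raises an exception for either zero. A rough sketch of an arithmetic-style test from Python, assuming `math.copysign` is available (it was added in Python 2.6, after this thread); `is_negative_zero` is a hypothetical helper name.

```python
import math

def is_negative_zero(x):
    """True only for -0.0 (hypothetical helper, not part of any proposal here)."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0

print(is_negative_zero(-0.0))  # True
print(is_negative_zero(0.0))   # False
print(is_negative_zero(-1.0))  # False

try:
    1.0 / -0.0
except ZeroDivisionError:
    # Python turns the division into an exception, so the C-level
    # 1.0/x test is not available from Python code.
    print("division by zero raises")
```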

Kind regards,

Alastair.

--
http://alastairs-place.net




Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Nick Maclaren
Alastair Houghton [EMAIL PROTECTED] wrote:
 
 AFAIK few systems have floating point traps enabled by default (in  
 fact, isn't that what IEEE 754 specifies?), because they often aren't  
 very useful.

The first two statements are true; the last isn't.  They are extremely
useful, not least because they are the only practical way to locate
numeric errors in most 3GL programs (including C, Fortran etc.)

 And in the specific case of the Python interpreter, why  
 would you ever want them turned on?  Surely in order to get  
 consistent floating point semantics, they need to be *off* and Python  
 needs to handle any exceptional cases itself; even if they're on, by  
 your argument Python must do that to avoid being terminated.

Grrk.  Why are you assuming that turning them off means that the
result is what you expect?  That isn't always so - sometimes it
merely means that you get wrong answers but no indication of that.

 or see if it can't turn them off using the C99 fenv.h APIs.

That is a REALLY bad idea.  You have no idea how broken that is,
and what the impact would be on Python.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Nick Maclaren
James Y Knight [EMAIL PROTECTED] wrote:
 
 This is a really poor argument. Python should be moving *towards*  
 proper '754 fp support, not away from it. On the platforms that are  
 most important, the C implementations distinguish positive and  
 negative 0. That the current python implementation may be defective  
 when the underlying C implementation is defective doesn't excuse a  
 change to intentionally break python on the common platforms.

Perhaps you might like to think why IBM POWERx (and NOT the
Cell or most embedded POWERs) is the ONLY mainstream system to have
implemented all of IEEE 754 in hardware after 22 years?  Or why
NO programming language has provided support in those 22 years,
and only Java and C have even claimed to?

See Kahan's "How Java's Floating-Point Hurts Everyone Everywhere",
note that C99 is much WORSE, and then note that Java and C99 are
the only languages that have even attempted to include IEEE 754.

You have also misunderstood the issue.  The fact that a C implementation
doesn't support it does NOT mean that the implementation is defective;
quite the contrary.  The issue always has been that IEEE 754's basic
model is incompatible with the basic models of all programming
languages that I am familiar with (which is a lot).  And the specific
problems with C99 are in the STANDARD, not the IMPLEMENTATIONS.

 IEEE 754 is so widely implemented that IMO it would make sense to  
 make Python's floating point specify it, and simply declare floating
 point operations on non-IEEE 754 machines as "use at own risk, may
 not conform to python language standard". (or if someone wants to use
 a software fp library for such machines, that's fine too).

Firstly, see the above.  Secondly, Python would need MAJOR semantic
changes to conform to IEEE 754R.  Thirdly, what would you say to
the people who want reliable error detection on floating-point of
the form that Python currently provides?


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Nick Craig-Wood
On Wed, Oct 04, 2006 at 12:42:04AM -0400, Tim Peters wrote:
 [EMAIL PROTECTED]
  If C90 doesn't distinguish -0.0 and +0.0, how can Python?
 
 With liberal applications of piss & vinegar ;-)
 
  Can you give a simple example where the difference between the two is 
  apparent
  to the Python programmer?
 
 Perhaps surprisingly, many (well, comparatively many, compared to none)
 people have noticed that the platform atan2 cares a lot:
 
 >>> from math import atan2 as a
 >>> z = 0.0  # positive zero
 >>> m = -z   # minus zero
 >>> a(z, z)   # the result here is actually +0.0
 0.0
 >>> a(z, m)
 3.1415926535897931
 >>> a(m, z)    # the result here is actually -0.0
 0.0

This actually returns -0.0 under linux...

 >>> a(m, m)
 -3.1415926535897931
 
 It works like that even on Windows, and these are the results C99's
 754-happy appendix mandates for atan2 applied to signed zeroes.  I've
 even seen a /complaint/ on c.l.py that atan2 doesn't do the same when
 
 z = 0.0
 
 is replaced by
 
 z = 0
 
 That is, at least one person thought it was a bug that integer
 zeroes didn't deliver the same behaviors.
 
 Do people actually rely on this?  I know I don't, but given that more
 than just 2 people have remarked on it seeming to like it, I expect
 that changing this would break /some/ code out there.

Probably!
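
For reference, here is the same experiment as a small script. It is a sketch run against a current CPython, where the repr of -0.0 shows its sign on every platform; that was not the case for the interpreters discussed in this thread.

```python
import math

z = 0.0   # positive zero
m = -z    # negative zero

print(math.atan2(z, z))  # 0.0
print(math.atan2(z, m))  # 3.141592653589793
print(math.atan2(m, z))  # -0.0
print(math.atan2(m, m))  # -3.141592653589793
```

The sign of each zero argument selects which side of the branch cut the result lands on, which is why code relying on atan2 can notice if -0.0 is silently replaced by a cached +0.0.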


It surely isn't a big problem though, is it?

Instead of writing

  if (result == 0.0)
      return cached_float_0;

we just write something like

  if (memcmp(&result, &static_zero, sizeof(double)) == 0)
      return cached_float_0;

Eg the below prints (gcc/linux)

The memcmp() way
1: 0 == 0.0
2: -0 != 0.0
The == way
3: 0 == 0.0
4: -0 == 0.0

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Reference bit pattern for +0.0 */
    static double zero_value = 0.0;
    double result;

    /* memcmp() compares representations, so the sign bit counts */
    printf("The memcmp() way\n");
    result = 0.0;
    if (memcmp(&result, &zero_value, sizeof(double)) == 0)
        printf("1: %g == 0.0\n", result);
    else
        printf("1: %g != 0.0\n", result);

    result = -0.0;
    if (memcmp(&result, &zero_value, sizeof(double)) == 0)
        printf("2: %g == 0.0\n", result);
    else
        printf("2: %g != 0.0\n", result);

    /* == compares values, and -0.0 == 0.0 is true */
    printf("The == way\n");
    result = 0.0;
    if (result == 0.0)
        printf("3: %g == 0.0\n", result);
    else
        printf("3: %g != 0.0\n", result);

    result = -0.0;
    if (result == 0.0)
        printf("4: %g == 0.0\n", result);
    else
        printf("4: %g != 0.0\n", result);

    return 0;
}

-- 
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Kristján V. Jónsson

Hm, doesn't seem to be so for my regular python.

Python 2.3.3 Stackless 3.0 040407 (#51, Apr  7 2004, 19:28:46) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = -0.0
>>> y = 0.0
>>> x,y
(0.0, 0.0)
>>>

maybe it is 2.3.3, or maybe it is stackless from back then.
K

 -----Original Message-----
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]
 On Behalf Of Martin v. Löwis
 Sent: 3 October 2006 17:56
 To: [EMAIL PROTECTED]
 Cc: Nick Maclaren; python-dev@python.org
 Subject: Re: [Python-Dev] Caching float(0.0)
 
 [EMAIL PROTECTED] wrote:
  If C90 doesn't distinguish -0.0 and +0.0, how can Python?  Can you
  give a simple example where the difference between the two is apparent
  to the Python programmer?
 
 Sure:
 
 py> x=-0.0
 py> y=0.0
 py> x,y
 


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Nick Maclaren
On Wed, Oct 04, 2006 at 12:42:04AM -0400, Tim Peters wrote:

  If C90 doesn't distinguish -0.0 and +0.0, how can Python?
 
  Can you give a simple example where the difference between the two
  is apparent to the Python programmer?
 
 Perhaps surprisingly, many (well, comparatively many, compared to none)
 people have noticed that the platform atan2 cares a lot:

Once upon a time, floating-point was used as an approximation to
mathematical real numbers, and anything which was mathematically
undefined in real arithmetic was regarded as an error in floating-
point.  This allowed a reasonable amount of numeric validation,
because the main remaining discrepancy was that floating-point
has only limited precision and range.

Most of the numerical experts that I know of still favour that
approach, and it is the one standardised by the ISO LIA-1, LIA-2
and LIA-3 standards for floating-point arithmetic.

atan2(0.0,0.0) should be an error.

But C99 differs.  While words do not fail me, they are inappropriate
for this mailing list :-(



Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Martin v. Löwis
Alastair Houghton wrote:
 AFAIK few systems have floating point traps enabled by default (in fact,
 isn't that what IEEE 754 specifies?), because they often aren't very
 useful.  And in the specific case of the Python interpreter, why would
 you ever want them turned on?

That reasoning is irrelevant. If it breaks a few systems, that already
is some systems too many. Python should never crash; and we have no
control over the floating point exception handling in any portable
manner.

Regards,
Martin



Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Martin v. Löwis
Kristján V. Jónsson wrote:
 Hm, doesn´t seem to be so for my regular python.
 
 maybe it is 2.3.3, or maybe it is stackless from back then.

It's because you are using Windows. The way -0.0 gets rendered
depends on the platform. As Tim points out, try
math.atan2(0.0, -0.0) vs math.atan2(0.0, 0.0).

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-04 Thread Alastair Houghton
On Oct 4, 2006, at 8:14 PM, Martin v. Löwis wrote:

 If it breaks a few systems, that already is some systems too many.  
 Python should never crash; and we have no control over the floating  
 point exception handling in any portable manner.

You're quite right, though there is already plenty of platform  
dependent code in Python for just that purpose (see fpectlmodule.c,  
for instance).

Anyway, all I originally wanted was to point out that using division  
was one possible way to tell the difference that didn't involve  
relying on the representation being IEEE compliant.  It's true that  
there are problems with FP exceptions.

Kind regards,

Alastair.

--
http://alastairs-place.net




Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Craig-Wood
On Tue, Oct 03, 2006 at 09:47:03AM +1000, Delaney, Timothy (Tim) wrote:
 This doesn't actually give us a very useful indication of potential
 memory savings. What I think would be more useful is tracking the
 maximum simultaneous count of each value i.e. what the maximum refcount
 would have been if they were shared.

It isn't just memory savings we are playing for.

Even if 0.0 is allocated and de-allocated 10,000 times in a row, there
would be no memory savings by caching its value.

However there would be
a) less allocator overhead - allocating objects is relatively expensive
b) better caching of the value
c) less cache thrashing

I think you'll find that even in the no memory saving case a few
cycles spent on comparison with 0.0 (or maybe a few other values) will
speed up programs.

-- 
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Craig-Wood
On Mon, Oct 02, 2006 at 07:53:34PM -0500, [EMAIL PROTECTED] wrote:
 Terry> Kristján V. Jónsson [EMAIL PROTECTED] wrote:
  Anyway, Skip noted that 50% of all floats are whole numbers between
  -10 and 10 inclusive,
 
 Terry> Please, no.  He said something like this about
 Terry> *non-floating-point applications* (evidence unspecified, that I
 Terry> remember).  But such applications, by definition, usually don't
 Terry> have enough floats for caching (or conversion time) to matter too
 Terry> much.
 
 Correct.  The non-floating-point application I chose was the one that was
 most immediately available, "make test".  Note that I have no proof that
 regrtest.py isn't terribly floating point intensive.  I just sort of guessed
 that it was.

For my application caching 0.0 is by far the most important. 0.0 has
~200,000 references - the next highest reference count is only about 200.

-- 
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Fredrik Lundh
Terry Reedy wrote:

 For true floating point measurements (of temperature, for instance), 
 'integral' measurements (which are an artifact of the scale used (degrees F 
 versus C versus K)) should generally be no more common than other realized 
 measurements.

a real-life sensor is of course where the 121.216 in my original post to 
this thread came from.

(note that most real-life sensors involve A/D conversion at some point, 
which means that they provide a limited number of discrete values.  but 
only the code dealing with the source data will be able to make any 
meaningful assumptions about those values.)

I still think it might make sense to special-case float(0.0) (padding, 
default values, etc) inside PyFloat_FromDouble, and possibly also 
float(1.0) (scale factors, unit vectors, normalized max values, etc) 
but everything else is just generalizing from random observations.

adding a few notes to the C API documentation won't hurt either, I 
suppose. (e.g. note that each call to PyFloat_FromDouble may create a 
new floating point object; if you're converting data from some internal 
format to Python floats, it's often more efficient to map directly to 
preallocated shared PyFloat objects, instead of mapping first to float 
or double and then calling PyFloat_FromDouble on that value).
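
The mapping idea in that last note can be pictured with a pure-Python analogue. This is only a sketch under made-up assumptions: an 8-bit A/D converter and a 5.0 V full-scale reference, neither of which comes from this thread, and the names `VOLTS` and `decode` are hypothetical. In a C extension the table would hold preallocated PyFloat objects instead.

```python
# Hypothetical 8-bit A/D converter: raw codes 0..255 map to volts.
FULL_SCALE_VOLTS = 5.0  # assumed reference voltage
LEVELS = 256

# Build the 256 possible readings once; every sample reuses these objects.
VOLTS = tuple(code * FULL_SCALE_VOLTS / (LEVELS - 1) for code in range(LEVELS))

def decode(raw_samples):
    """Map raw 8-bit codes to shared float objects instead of creating new ones."""
    return [VOLTS[code] for code in raw_samples]

readings = decode(bytes([0, 255, 0, 128, 255]))
print(readings[0] is readings[2])  # True: same preallocated object
print(readings[1])                 # 5.0
```

Only the code that knows the source data is discrete can build such a table, which is the point about not generalizing the cache inside the interpreter.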

/F



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
Terry Reedy [EMAIL PROTECTED] wrote:

 For true floating point measurements (of temperature, for instance), 
 'integral' measurements (which are an artifact of the scale used (degrees F 
 versus C versus K)) should generally be no more common than other realized 
 measurements.

Not quite, but close enough.  A lot of algorithms use a conversion to
integer, or some of the values are actually counts (e.g. in statistics),
which makes them a bit more likely.  Not enough to get excited about,
in general.

 Thirty years ago, a major stat package written in Fortran (BMDP) required 
 that all data be stored as (Fortran 4-byte) floats for analysis.  So a 
 column of yes/no or male/female data would be stored as 0.0/1.0 or perhaps 
 1.0/2.0.  That skewed the distribution of floats.  But Python and, I hope, 
 Python apps, are more modern than that.

And SPSS and Genstat and others - now even Excel 

 Float caching strikes me as a good subject for cookbook recipes, but not, 
 without real data and a willingness to slightly screw some users, for the 
 default core code.

Yes.  It is trivial (if tedious) to add analysis code - the problem
is finding suitable representative applications.  That was always
my difficulty when I was analysing this sort of thing - and still
is when I need to do it!

 Nick Craig-Wood [EMAIL PROTECTED] wrote:
 
 For my application caching 0.0 is by far the most important. 0.0 has
 ~200,000 references - the next highest reference count is only about 200.

Yes.  All the experience I have ever seen over the past 4 decades
confirms that is the normal case, with the exception of floating-point
representations that have a missing value indicator.

Even in IEEE 754, infinities and NaN are rare unless the application
is up the spout.  There are claims that a lot of important ones have
a lot of NaNs and use them as missing values but, despite repeated
requests, none of the people claiming that have ever provided an
example.  There are some pretty solid grounds for believing that
those claims are not based in fact, but are polemic.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Kristján V. Jónsson
But that is precisely the point.  A non-floating point application tends to use 
floating point values in a predictable way, with a lot of integral values 
floating around and lots of zeroes.  As this constitutes the majority of python 
applications (okay, daring assumption here) it seems to warrant some 
consideration.

In one of my first messages on the subject I promised to report refcounts of 
-1.0, 0.0 and 1.0 for the EVE server.  I didn't, but instead gave you 
the frequency of the values reported.  Well, now I can provide you with 
refcounts for the [-10, 10] range plus the total float count, of a server that 
has just started up:

value   refcount
-10.0        589
 -9.0         56
 -8.0         65
 -7.0         63
 -6.0        243
 -5.0        731
 -4.0        550
 -3.0        246
 -2.0        246
 -1.0       1096
  0.0     195446
  1.0      79382
  2.0       9650
  3.0       6224
  4.0       5223
  5.0      14766
  6.0       2616
  7.0       1303
  8.0       3307
  9.0       1447
 10.0       8102
total:    331351

The total count of floating point numbers allocated at this point is 985794.
Without the reuse, they would be 1317145, so this is a saving of 25%, and of 
5Mb.
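
The arithmetic behind those figures can be checked with a few lines of Python. The refcounts are the ones listed above; the 16 bytes per float object is my assumption for a 32-bit build (two 4-byte header fields plus an 8-byte double), not a number given in the post.

```python
# Refcounts per float value, as read from the table above.
refcounts = {
    -10.0: 589, -9.0: 56, -8.0: 65, -7.0: 63, -6.0: 243,
    -5.0: 731, -4.0: 550, -3.0: 246, -2.0: 246, -1.0: 1096,
    0.0: 195446, 1.0: 79382, 2.0: 9650, 3.0: 6224, 4.0: 5223,
    5.0: 14766, 6.0: 2616, 7.0: 1303, 8.0: 3307, 9.0: 1447,
    10.0: 8102,
}

shared = sum(refcounts.values())
allocated = 985794                       # float objects actually allocated
without_reuse = allocated + shared

print(shared)                            # 331351
print(without_reuse)                     # 1317145
print(f"{shared / without_reuse:.1%}")   # 25.2%
print(f"{refcounts[0.0] / shared:.1%}")  # 59.0%: the share owed to 0.0 alone

BYTES_PER_FLOAT = 16  # assumption: 32-bit build, 4 + 4 + 8 bytes
print(shared * BYTES_PER_FLOAT)          # 5301616 bytes, roughly 5 MB
```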

Kristján 

 -----Original Message-----
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]
 On Behalf Of [EMAIL PROTECTED]
 Sent: 3 October 2006 00:54
 To: Terry Reedy
 Cc: python-dev@python.org
 Subject: Re: [Python-Dev] Caching float(0.0)
 
 
 Terry> Kristján V. Jónsson [EMAIL PROTECTED] wrote:
  Anyway, Skip noted that 50% of all floats are whole numbers between
  -10 and 10 inclusive,
 
 Terry> Please, no.  He said something like this about
 Terry> *non-floating-point applications* (evidence unspecified, that I
 Terry> remember).  But such applications, by definition, usually don't
 Terry> have enough floats for caching (or conversion time) to matter too
 Terry> much.
 
 Correct.  The non-floating-point application I chose was the
 one that was most immediately available, "make test".  Note
 that I have no proof that regrtest.py isn't terribly floating
 point intensive.  I just sort of guessed that it was.
 
 Skip


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
Kristján V. Jónsson [EMAIL PROTECTED] wrote:

 The total count of floating point numbers allocated at this point is 985794.
 Without the reuse, they would be 1317145, so this is a saving of 25%, and
 of 5Mb.

And, if you optimised just 0.0, you would get 60% of that saving at
a small fraction of the cost and considerably greater generality.
It isn't clear whether the effort justifies doing more.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread skip

 The total count of floating point numbers allocated at this point is
 985794.  Without the reuse, they would be 1317145, so this is a
 saving of 25%, and of 5Mb.

Nick> And, if you optimised just 0.0, you would get 60% of that saving
Nick> at a small fraction of the cost and considerably greater
Nick> generality.  It isn't clear whether the effort justifies doing
Nick> more.

Doesn't that presume that optimizing just 0.0 could be done easily?  Suppose
0.0 is generated all over the place in EVE?

Skip


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Martin v. Löwis
Nick Craig-Wood wrote:
 Even if 0.0 is allocated and de-allocated 10,000 times in a row, there
 would be no memory savings by caching its value.
 
 However there would be
 a) less allocator overhead - allocating objects is relatively expensive
 b) better caching of the value
 c) less cache thrashing
 
 I think you'll find that even in the no memory saving case a few
 cycles spent on comparison with 0.0 (or maybe a few other values) will
 speed up programs.

Can you demonstrate that speedup?

It is quite difficult to anticipate the performance impact of a change,
in particular if there is no change in computational complexity. Various
effects tend to balance out each other.

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Martin v. Löwis
Nick Maclaren wrote:
 The total count of floating point numbers allocated at this point is 985794.
 Without the reuse, they would be 1317145, so this is a saving of 25%, and
 of 5Mb.
 
 And, if you optimised just 0.0, you would get 60% of that saving at
 a small fraction of the cost and considerably greater generality.

As Michael Hudson observed, this is difficult to implement, though:
You can't distinguish between -0.0 and +0.0 easily, yet you should.

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
"Martin v. Löwis" [EMAIL PROTECTED] wrote:
 
  The total count of floating point numbers allocated at this point is 
  985794.
  Without the reuse, they would be 1317145, so this is a saving of 25%, and
  of 5Mb.
  
  And, if you optimised just 0.0, you would get 60% of that saving at
  a small fraction of the cost and considerably greater generality.
 
 As Michael Hudson observed, this is difficult to implement, though:
 You can't distinguish between -0.0 and +0.0 easily, yet you should.

That was the point of a previous posting of mine in this thread :-(

You shouldn't, despite what IEEE 754 says, at least if you are
allowing for either portability or numeric validation.

There are a huge number of good reasons why IEEE 754 signed zeroes
fit extremely badly into any normal programming language and are
seriously incompatible with numeric validation, but Python adds more.
Is there any other type where there are two values that are required
to be different, but where both the hash is required to be zero and
both are required to evaluate to False in truth value context?
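
The properties in question are easy to see from the interpreter. A short sketch, run against a current CPython (where `math.copysign` is available, unlike at the time of this thread):

```python
import math

pos, neg = 0.0, -0.0

print(pos == neg)                # True: the two zeroes compare equal
print(hash(pos), hash(neg))      # 0 0
print(bool(pos), bool(neg))      # False False

d = {pos: "plus"}
d[neg] = "minus"                 # equal key and equal hash: same dict slot
print(d)                         # {0.0: 'minus'}

print(math.copysign(1.0, neg))   # -1.0: yet the sign is still there
```

So the two values are interchangeable as dict keys and in truth tests, while still being distinguishable through functions that look at the sign.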


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Martin v. Löwis
Nick Maclaren wrote:
 That was the point of a previous posting of mine in this thread :-(
 
 You shouldn't, despite what IEEE 754 says, at least if you are
 allowing for either portability or numeric validation.
 
 There are a huge number of good reasons why IEEE 754 signed zeroes
 fit extremely badly into any normal programming language and are
 seriously incompatible with numeric validation, but Python adds more.
 Is there any other type where there are two values that are required
 to be different, but where both the hash is required to be zero and
 both are required to evaluate to False in truth value context?

Ah, you are proposing a semantic change, then: -0.0 will become
unrepresentable, right?

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
"Martin v. Löwis" [EMAIL PROTECTED] wrote:

 Ah, you are proposing a semantic change, then: -0.0 will become
 unrepresentable, right?

Well, it is and it isn't.

Python currently supports only some of IEEE 754, and that is more by
accident than design - because that is exactly what C90 implementations
do!  There is code in floatobject.c that assumes IEEE 754, but Python
does NOT attempt to support it in toto (it is not clear if it could),
not least because it uses C90.

And, as far as I know, none of that is in the specification, because
Python is at least in theory portable to systems that use other
arithmetics and there is no current way to distinguish -0.0 from 0.0
except by comparing their representations!  And even THAT depends
entirely on whether the C library distinguishes the cases, as far
as I can see.

So distinguishing -0.0 from 0.0 isn't really in Python's current
semantics at all.  And, for reasons that we could go into, I assert
that it should not be - which is NOT the same as not supporting
branch cuts in cmath.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Martin v. Löwis
Nick Maclaren schrieb:
 So distinguishing -0.0 from 0.0 isn't really in Python's current
 semantics at all.  And, for reasons that we could go into, I assert
 that it should not be - which is NOT the same as not supporting
 branch cuts in cmath.

Are you talking about Python the language specification or
Python the implementation here? It is not a change to the language
specification, as this aspect of the behavior (as you point out)
is unspecified. However, it is certainly a change to the observable
behavior of the Python implementation, and no amount of arguing
can change that.

Regards,
Martin

P.S. For that matter, *any* kind of change to the singleton nature
of certain immutable values is a change in semantics. It's just that
dropping -0.0 is an *additional* change (on top of the change that
"1.0-1.0 is 0.0" would change from False to True).


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nicko van Someren
On 3 Oct 2006, at 15:10, Martin v. Löwis wrote:

 Nick Maclaren schrieb:
 That was the point of a previous posting of mine in this thread :-(

 You shouldn't, despite what IEEE 754 says, at least if you are
 allowing for either portability or numeric validation.

 There are a huge number of good reasons why IEEE 754 signed zeroes
 fit extremely badly into any normal programming language and are
 seriously incompatible with numeric validation, but Python adds more.
 Is there any other type where there are two values that are required
 to be different, but where both the hash is required to be zero and
 both are required to evaluate to False in truth value context?

 Ah, you are proposing a semantic change, then: -0.0 will become
 unrepresentable, right?

It's only a semantic change on platforms that happen to use IEEE
754 float representations, or some other representation that exposes
the sign of zero.  The Python docs have for many years stated with
regard to the float type: "All bets on their precision are off unless
you happen to know the machine you are working with" and that "You
are at the mercy of the underlying machine architecture".  Not all
floating point representations support sign of zero, though in the
modern world it's true that the vast majority do.

It would be instructive to understand how much, if any, python code
would break if we lost -0.0.  I do not believe that there is any
reliable way for python code to tell the difference between all of
the different types of IEEE 754 zeros, and in the special case of -0.0
the best test I can come up with is repr(n)[0]=='-'.  Is there a
compelling case, to do with compatibility or otherwise, for exposing
the sign of a zero?  It seems like a numerical anomaly to me.
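
For illustration, here is a minimal sketch of two ways Python code might
test for a negative zero.  It assumes a platform with IEEE 754 doubles and
a later Python in which math.copysign() exists (it was not available when
this thread was written); the function names are made up for the example.

```python
import math


def is_negative_zero(x: float) -> bool:
    """Return True only for -0.0 (assumes the platform has signed zeros)."""
    # x == 0.0 holds for both zeros; copysign moves x's sign bit onto 1.0.
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def is_negative_zero_repr(x: float) -> bool:
    """The repr-based test from the message above, restricted to zeros."""
    return x == 0.0 and repr(x)[0] == '-'


print(is_negative_zero(-0.0), is_negative_zero(0.0))            # True False
print(is_negative_zero_repr(-0.0), is_negative_zero_repr(0.0))  # True False
```

The repr-based test depends on how the C library formats the value, so the
copysign form is probably the safer of the two where it is available.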

Nicko



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread James Y Knight
On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote:
 As Michael Hudson observed, this is difficult to implement, though:
 You can't distinguish between -0.0 and +0.0 easily, yet you should.

Of course you can. It's absolutely trivial. The only part that's even  
*the least bit* sketchy in this is assuming that a double is 64 bits.  
Practically speaking, that is true on all architectures I know of,  
and if it's not guaranteed, it could easily be a 'configure' time check.

#include <stdio.h>
#include <stdint.h>

typedef union {
 double d;
 uint64_t i;
} rawdouble;

/* Returns 1 only when a has exactly the bit pattern of +0.0. */
int isposzero(double a) {
 rawdouble zero;
 rawdouble aa;
 zero.d = 0.0;
 aa.d = a;
 return aa.i == zero.i;
}

int main() {
 if (sizeof(double) != sizeof(uint64_t))
  return 1;

 printf("%d\n", isposzero(0.0));
 printf("%d\n", isposzero(-0.0));
 return 0;
}
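
The same bit-pattern comparison can be sketched from the Python side with
the struct module.  This is only an illustration, assuming the platform's
double is an IEEE 754 binary64; raw_bits and is_pos_zero are hypothetical
helper names, not part of any proposed patch.

```python
import struct


def raw_bits(x: float) -> int:
    """Return the 64-bit pattern of x as an unsigned integer."""
    # '<d' packs a binary64 value; '<Q' reads the same 8 bytes as a uint64.
    return struct.unpack('<Q', struct.pack('<d', x))[0]


def is_pos_zero(x: float) -> bool:
    """True only when x has exactly the bit pattern of +0.0."""
    return raw_bits(x) == raw_bits(0.0)


print(is_pos_zero(0.0))      # True
print(is_pos_zero(-0.0))     # False
print(hex(raw_bits(-0.0)))   # 0x8000000000000000
```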

James


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Martin v. Löwis
Nicko van Someren schrieb:
 It's only a semantic change on platforms that happen to use IEEE  
 754 float representations, or some other representation that exposes  
 the sign of zero.

Right. Later, you admit that this is the vast majority of modern machines.

 It would be instructive to understand how much, if any, python code
 would break if we lost -0.0.  I do not believe that there is any
 reliable way for python code to tell the difference between all of
 the different types of IEEE 754 zeros, and in the special case of -0.0
 the best test I can come up with is repr(n)[0]=='-'.  Is there a
 compelling case, to do with compatibility or otherwise, for exposing
 the sign of a zero?  It seems like a numerical anomaly to me.

I think it is reasonable to admit that
a) this change is a change in semantics for the majority of the
   machines
b) it is likely that this change won't affect a significant number
   of applications (I'm pretty sure someone will notice, though;
   someone always notices).

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread skip

    Martin> However, it is certainly a change to the observable behavior of
    Martin> the Python implementation, and no amount of arguing can change
    Martin> that.

If C90 doesn't distinguish -0.0 and +0.0, how can Python?  Can you give a
simple example where the difference between the two is apparent to the
Python programmer?

Skip


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread skip

    Martin> b) it is likely that this change won't affect a significant
    Martin>    number of applications (I'm pretty sure someone will notice,
    Martin>    though; someone always notices).

+1 QOTF.

Skip


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Scott David Daniels
James Y Knight wrote:
 On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote:
 As Michael Hudson observed, this is difficult to implement, though:
 You can't distinguish between -0.0 and +0.0 easily, yet you should.
 
 Of course you can. It's absolutely trivial. The only part that's even  
 *the least bit* sketchy in this is assuming that a double is 64 bits.  
 Practically speaking, that is true on all architectures I know of,  
 and if it's not guaranteed, it could easily be a 'configure' time check.
 
 typedef union {
  double d;
  uint64_t i;
 } rawdouble;
 
 int isposzero(double a) {
  rawdouble zero;
  zero.d = 0.0;
  rawdouble aa;
  aa.d = a;
  return aa.i == zero.i;
 }
 
 int main() {
  if (sizeof(double) != sizeof(uint64_t))
  return 1;
 
  printf("%d\n", isposzero(0.0));
  printf("%d\n", isposzero(-0.0));
 
 }
 

And you should be able to cache the single positive zero
with something vaguely like:
 PyObject *
 PyFloat_FromDouble(double fval)
 {   ...
     if (fval == 0.0 && raw_match(fval, cached)) {
         Py_INCREF(cached);
         return cached;
     }
     ...

-- 
-- Scott David Daniels
[EMAIL PROTECTED]



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Martin v. Löwis
[EMAIL PROTECTED] schrieb:
 If C90 doesn't distinguish -0.0 and +0.0, how can Python?  Can you give a
 simple example where the difference between the two is apparent to the
 Python programmer?

Sure:

py> x=-0.0
py> y=0.0
py> x,y
(-0.0, 0.0)
py> hash(x),hash(y)
(0, 0)
py> x==y
True
py> str(x)==str(y)
False
py> str(x),str(y)
('-0.0', '0.0')
py> float(str(x)),float(str(y))
(-0.0, 0.0)

Imagine an application that reads floats from a text file, manipulates
some of them, and then writes back the complete list of floats. Further
assume that somehow, -0.0 got into the file. Currently, the sign
round-trips; under the proposed change, it would stop doing so.
Of course, there likely wouldn't be any real change to value, as
the sign of 0 is likely of no significance to the application.
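
A small sketch of the round-trip scenario, assuming an IEEE 754 platform.
The second function only simulates what an implementation without -0.0
might do; both function names are invented for the example.

```python
def roundtrip(text: str) -> str:
    """Parse whitespace-separated floats and write them back out."""
    values = [float(tok) for tok in text.split()]
    return ' '.join(repr(v) for v in values)


def roundtrip_without_negative_zero(text: str) -> str:
    """Same, but every zero is replaced by +0.0 (simulating the change)."""
    values = [float(tok) for tok in text.split()]
    values = [0.0 if v == 0.0 else v for v in values]
    return ' '.join(repr(v) for v in values)


original = '-0.0 0.0 1.5'
print(roundtrip(original))                        # -0.0 0.0 1.5
print(roundtrip_without_negative_zero(original))  # 0.0 0.0 1.5
```

The values compare equal before and after, so only a textual diff of the
file would reveal the difference.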

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Nick Maclaren
"Martin v. Löwis" [EMAIL PROTECTED] wrote:

 py> x=-0.0
 py> y=0.0
 py> x,y

Nobody is denying that SOME C90 implementations distinguish them,
but it is no part of the standard - indeed, a C90 implementation is
permitted to use ANY criterion for deciding when to display -0.0 and
0.0.  C99 is ambiguous to the point of internal inconsistency, except
when __STDC_IEC_559__ is set to 1, though the intent is clear.

And my reading of Python's code is that it relies on C's handling
of such values.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread James Y Knight

On Oct 3, 2006, at 2:26 PM, Nick Maclaren wrote:

 "Martin v. Löwis" [EMAIL PROTECTED] wrote:

 py> x=-0.0
 py> y=0.0
 py> x,y

 Nobody is denying that SOME C90 implementations distinguish them,
 but it is no part of the standard - indeed, a C90 implementation is
 permitted to use ANY criterion for deciding when to display -0.0 and
 0.0.  C99 is ambiguous to the point of internal inconsistency, except
 when __STDC_IEC_559__ is set to 1, though the intent is clear.

 And my reading of Python's code is that it relies on C's handling
 of such values.

This is a really poor argument. Python should be moving *towards*  
proper '754 fp support, not away from it. On the platforms that are  
most important, the C implementations distinguish positive and  
negative 0. That the current python implementation may be defective  
when the underlying C implementation is defective doesn't excuse a  
change to intentionally break python on the common platforms.

IEEE 754 is so widely implemented that IMO it would make sense to
make Python's floating point specify it, and simply declare floating
point operations on non-IEEE 754 machines as "use at own risk, may
not conform to python language standard". (Or if someone wants to use
a software fp library for such machines, that's fine too.)

James



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Martin v. Löwis
Nick Maclaren schrieb:
 py> x=-0.0
 py> y=0.0
 py> x,y
 
 Nobody is denying that SOME C90 implementations distinguish them,
 but it is no part of the standard - indeed, a C90 implementation is
 permitted to use ANY criterion for deciding when to display -0.0 and
 0.0.  C99 is ambiguous to the point of internal inconsistency, except
 when __STDC_IEC_559__ is set to 1, though the intent is clear.
 
 And my reading of Python's code is that it relies on C's handling
 of such values.

So what is your conclusion? That applications will not break?

People don't care that their code may break on a different platform,
if they aren't using these platforms.
They care if it breaks on their platform just because they use a
new Python version.

(Of course, they sometimes also complain that Python behaves
differently on different platforms, and cannot really accept
the explanation that the language didn't guarantee the same
behavior on all systems. This explanation doesn't help them:
they still need to modify the application).

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Gregory P. Smith
  It would be instructive to understand how much, if any, python code
  would break if we lost -0.0.  I do not believe that there is any
  reliable way for python code to tell the difference between all of
  the different types of IEEE 754 zeros, and in the special case of -0.0
  the best test I can come up with is repr(n)[0]=='-'.  Is there a
  compelling case, to do with compatibility or otherwise, for exposing
  the sign of a zero?  It seems like a numerical anomaly to me.
 
 I think it is reasonable to admit that
 a) this change is a change in semantics for the majority of the
    machines
 b) it is likely that this change won't affect a significant number
    of applications (I'm pretty sure someone will notice, though;
    someone always notices).

If you're really going to bother doing this rather than just adding a
note in the docs about testing for and reusing the most common float
values to save memory when instantiating them from external input:

Just do a binary comparison of the float with predefined + and - 0.0
float values or any other special values that you wish to catch rather
than a floating point comparison. 

-g



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Alastair Houghton
On 3 Oct 2006, at 17:47, James Y Knight wrote:

 On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote:
 As Michael Hudson observed, this is difficult to implement, though:
 You can't distinguish between -0.0 and +0.0 easily, yet you should.

 Of course you can. It's absolutely trivial. The only part that's even
 *the least bit* sketchy in this is assuming that a double is 64 bits.
 Practically speaking, that is true on all architectures I know of,

How about doing 1.0 / x, where x is the number you want to test?  On  
systems with sane semantics, it should result in an infinity, the  
sign of which should depend on the sign of the zero.  While I'm sure  
there are any number of places where it will break, on those  
platforms it seems to me that you're unlikely to care about the  
difference between +0.0 and -0.0 anyway, since it's hard to otherwise  
distinguish them.

e.g.

   double value_to_test;

   ...

   if (value_to_test == 0.0) {
     double my_inf = 1.0 / value_to_test;

     if (my_inf < 0.0) {
       /* We have a -ve zero */
     } else if (my_inf > 0.0) {
       /* We have a +ve zero */
     } else {
       /* This platform might not support infinities (though we might
          get a signal or something rather than getting here in that
          case...) */
     }
   }

(I should add that presently I've only tried it on a PowerPC, because  
it's late and that's what's in front of me.  It seems to work OK here.)

Kind regards,

Alastair

--
http://alastairs-place.net



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Josiah Carlson

Alastair Houghton [EMAIL PROTECTED] wrote:
 On 3 Oct 2006, at 17:47, James Y Knight wrote:
 
  On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote:
  As Michael Hudson observed, this is difficult to implement, though:
  You can't distinguish between -0.0 and +0.0 easily, yet you should.
 
  Of course you can. It's absolutely trivial. The only part that's even
  *the least bit* sketchy in this is assuming that a double is 64 bits.
  Practically speaking, that is true on all architectures I know of,
 
 How about doing 1.0 / x, where x is the number you want to test?  On  
 systems with sane semantics, it should result in an infinity, the  
 sign of which should depend on the sign of the zero.  While I'm sure  
 there are any number of places where it will break, on those  
 platforms it seems to me that you're unlikely to care about the  
 difference between +0.0 and -0.0 anyway, since it's hard to otherwise  
 distinguish them.

There is, of course, the option of examining their representations in
memory (I described the general technique in another posting on this
thread).  From what I understand of IEEE 754 FP doubles, -0.0 and +0.0
have different representations, and if we look at the underlying
representation (perhaps by a *((uint64*)(&float_input))), we can
easily distinguish all values we want to cache...

We can observe it directly, for example on x86:
>>> import struct
>>> struct.pack('d', -0.0)
'\x00\x00\x00\x00\x00\x00\x00\x80'
>>> struct.pack('d', 0.0)
'\x00\x00\x00\x00\x00\x00\x00\x00'


And as I stated before, we can switch on those values.  Alternatively,
if we can't switch on the 64 bit values directly...

uint32* p = (uint32*)(&double_input);
if (!p[0]) { /* p[1] on big-endian platforms */
    switch (p[1]) { /* p[0] on big-endian platforms */
        ...
    }
}


 - Josiah



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Steve Holden
Josiah Carlson wrote:
[yet more on this topic]

If the brainpower already expended on this issue were proportional to 
its significance then we'd be reading about it on CNN news.

This thread has disappeared down a rat-hole, never to re-emerge with 
anything of significant benefit to users. C'mon, guys, implement a patch 
or leave it alone :-)

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Guido van Rossum
On 10/3/06, Steve Holden [EMAIL PROTECTED] wrote:
 If the brainpower already expended on this issue were proportional to
 its significance then we'd be reading about it on CNN news.

 This thread has disappeared down a rat-hole, never to re-emerge with
 anything of significant benefit to users. C'mon, guys, implement a patch
 or leave it alone :-)

Hear, hear.

My proposal: only cache positive 0.0. My prediction: biggest bang for
the buck, nobody's code will break. On platforms that don't
distinguish between +/- 0.0, of course this would cache all zeros. On
platforms that do distinguish them, -0.0 is left alone, which is just
fine.
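
The proposal can be modelled in a few lines of Python.  This is only a
sketch of the policy, not of the C implementation: FloatBox is a
hypothetical stand-in for a heap-allocated float object, box() stands in
for PyFloat_FromDouble(), and math.copysign() is assumed to be available.

```python
import math


class FloatBox:
    """Hypothetical stand-in for a heap-allocated float object."""
    __slots__ = ('value',)

    def __init__(self, value: float) -> None:
        self.value = value


_POS_ZERO = FloatBox(0.0)  # the single shared +0.0 object


def box(value: float) -> FloatBox:
    """Return the shared object for +0.0 and a fresh object otherwise."""
    if value == 0.0 and math.copysign(1.0, value) > 0.0:
        return _POS_ZERO
    return FloatBox(value)  # -0.0 and all non-zero values are left alone


print(box(0.0) is box(0.0))                  # True: +0.0 is shared
print(box(-0.0) is box(-0.0))                # False: -0.0 is not cached
print(math.copysign(1.0, box(-0.0).value))   # -1.0: the sign survives
```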

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Tim Peters
[EMAIL PROTECTED]
 If C90 doesn't distinguish -0.0 and +0.0, how can Python?

With liberal applications of piss & vinegar ;-)

 Can you give a simple example where the difference between the two is apparent
 to the Python programmer?

Perhaps surprisingly, many (well, comparatively many, compared to none)
people have noticed that the platform atan2 cares a lot:

>>> from math import atan2 as a
>>> z = 0.0  # positive zero
>>> m = -z   # minus zero
>>> a(z, z)   # the result here is actually +0.0
0.0
>>> a(z, m)
3.1415926535897931
>>> a(m, z)    # the result here is actually -0.0
0.0
>>> a(m, m)
-3.1415926535897931

It works like that even on Windows, and these are the results C99's
754-happy appendix mandates for atan2 applied to signed zeroes.  I've
even seen a /complaint/ on c.l.py that atan2 doesn't do the same when

z = 0.0

is replaced by

z = 0

That is, at least one person thought it was a bug that integer
zeroes didn't deliver the same behaviors.

Do people actually rely on this?  I know I don't, but given that more
than just 2 people have remarked on it seeming to like it, I expect
that changing this would break /some/ code out there.

BTW, on /some/ platforms all those examples trigger EDOM from the
platform libm instead -- which is also fine by C99, for
implementations ignoring C99's optional 754-happy appendix.
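
The four atan2 cases above can be checked with a short script.  This is a
sketch that assumes a Python whose math.atan2 follows the C99 Annex F
results for signed zeros and in which math.copysign() exists; on the
EDOM-raising platforms mentioned above it would not apply.

```python
import math

z = 0.0   # positive zero
m = -z    # minus zero

# Each row is (y, x); atan2 takes y first.
for y, x in [(z, z), (z, m), (m, z), (m, m)]:
    result = math.atan2(y, x)
    # copysign exposes the sign of a zero result that == cannot see.
    print(repr(y), repr(x), repr(result), math.copysign(1.0, result))
```

Only the sign of x decides between a zero and a result of magnitude pi,
and the sign of y decides the sign of the result.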


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Tim Peters
[EMAIL PROTECTED]
 Can you give a simple example where the difference between the two is apparent
 to the Python programmer?

BTW, I don't recall the details and don't care enough to reconstruct
them, but when Python's front end was first changed to recognize
negative literals, it treated +0.0 and -0.0 the same, and we did get
bug reports as a result.

A bit more detail, because it's necessary to understand that even
minimally.  Python's grammar doesn't have negative numeric literals;
e.g., according to the grammar,

-1
and
-1.1

are applications of the unary minus operator to the positive numeric
literals 1 and 1.1.  And for years Python generated code accordingly:
LOAD_CONST followed by the unary minus opcode.

Someone (Fred, I think) introduced a front-end optimization to
collapse that to plain LOAD_CONST, doing the negation at compile time.

The code object contains a vector of compile-time constants, and the
optimized code initially didn't distinguish between +0.0 and -0.0.  As
a result, if the first float 0.0 in a code block looked positive,
/all/ float zeroes in the code block were in effect treated as
positive; and similarly if the first float zero was -0.0, all float
zeroes were in effect treated as negative.

That did break code.  IIRC, it was fixed by special-casing the snot
out of -0.0, leaving that single case as a LOAD_CONST followed by
UNARY_NEGATIVE.
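
The merging effect is easy to reproduce with any table keyed on equality.
The sketch below is not the compiler's actual constant table; it uses a
plain dict, and sign_aware_key is a hypothetical helper showing one way a
key could keep the two zeros apart (math.copysign is assumed available).

```python
import math


def sign_aware_key(value: float):
    """Key that keeps 0.0 and -0.0 apart by including the sign bit."""
    return (type(value), value, math.copysign(1.0, value))


constants = (0.0, -0.0)

# Keyed on the value alone: 0.0 == -0.0 and both hash to 0, so they collide
# and whichever zero was seen first wins.
naive = {}
for c in constants:
    naive.setdefault(c, c)

# Keyed on value plus sign: the two zeros get separate slots.
careful = {}
for c in constants:
    careful.setdefault(sign_aware_key(c), c)

print(len(naive), len(careful))   # 1 2
print(repr(naive[-0.0]))          # 0.0: the lookup for -0.0 finds +0.0
```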


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Fred L. Drake, Jr.
On Wednesday 04 October 2006 00:53, Tim Peters wrote:
  Someone (Fred, I think) introduced a front-end optimization to
  collapse that to plain LOAD_CONST, doing the negation at compile time.

I did the original change to make negative integers use just LOAD_CONST, but I 
don't think I changed what was generated for float literals.  That could be 
my memory going bad, though.

The code changed several times as people with more numeric-fu than myself 
fixed all sorts of border cases.  I've tried really hard to stay away from 
the code generator since then.  :-)


  -Fred

-- 
Fred L. Drake, Jr.   fdrake at acm.org


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Tim Peters
[Tim]
 Someone (Fred, I think) introduced a front-end optimization to
 collapse that to plain LOAD_CONST, doing the negation at compile time.

 I did the original change to make negative integers use just LOAD_CONST, but I
 don't think I changed what was generated for float literals.  That could be
 my memory going bad, though.

It is ;-)  Here under Python 2.2.3:

>>> from dis import dis
>>> def f(): return 0.0 + -0.0 + 1.0 + -1.0
...
>>> dis(f)
          0 SET_LINENO               1

          3 SET_LINENO               1
          6 LOAD_CONST               1 (0.0)
          9 LOAD_CONST               1 (0.0)
         12 UNARY_NEGATIVE
         13 BINARY_ADD
         14 LOAD_CONST               2 (1.0)
         17 BINARY_ADD
         18 LOAD_CONST               3 (-1.0)
         21 BINARY_ADD
         22 RETURN_VALUE
         23 LOAD_CONST               0 (None)
         26 RETURN_VALUE

Note there that 0.0, 1.0, and -1.0 were all treated as literals,
but that -0.0 still triggered a UNARY_NEGATIVE opcode.  That was
after the fix.

You don't remember this as well as I do since I probably had to fix
it, /and/ I ate enormous quantities of chopped, pressed, smoked,
preservative-laden bag o' ham at the time.  You really need to do both
to remember floating-point trivia.  Indeed, since I gave up my bag o'
ham habit, I hardly ever jump into threads about fp trivia anymore.
Mostly it's because I'm too weak from not eating anything, though --
how about lunch tomorrow?

 The code changed several times as people with more numeric-fu than myself
 fixed all sorts of border cases.  I've tried really hard to stay away from
 the code generator since then.  :-)

Successfully, too!  It's admirable.


Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Josiah Carlson

Steve Holden [EMAIL PROTECTED] wrote:
 Josiah Carlson wrote:
 [yet more on this topic]
 
 If the brainpower already expended on this issue were proportional to 
 its significance then we'd be reading about it on CNN news.

Goodness, I wasn't aware that pointer manipulation took that much
brainpower.  I presume you mean what others have spent time thinking
about with regards to this topic.


 This thread has disappeared down a rat-hole, never to re-emerge with 
 anything of significant benefit to users. C'mon, guys, implement a patch 
 or leave it alone :-)

Heh.  So be it.  The following is untested (I lack a build system for
the Python trunk).  It adds a new global cache for floats, a new 'fill
the global cache' function, and an updated PyFloat_FromDouble() function.

All in all, it took about 10 minutes to generate, and understands the
difference between fp +0.0 and -0.0 (assuming sane IEEE 754 fp double
behavior on non-x86 platforms).

 - Josiah


/* This should go into floatobject.c */


/* Table of 22 cached float objects: 0.0 .. 10.0 at [0..10] and
   -10.0 .. -0.0 at [11..21]. */
static PyFloatObject **cached_list = NULL;

/* Sentinel meaning "the cache is being filled, don't consult it". */
#define CACHE_FILLING ((PyFloatObject **) 1)

static PyFloatObject **
fill_cached_list(void)
{
    PyFloatObject **p;
    int i;
    cached_list = CACHE_FILLING;
    p = (PyFloatObject **) PyMem_MALLOC(sizeof(PyFloatObject *)*22);
    if (p == NULL) {
        cached_list = NULL;
        return (PyFloatObject **) PyErr_NoMemory();
    }
    for (i=0;i<=10;i++) {
        p[i] = (PyFloatObject*) PyFloat_FromDouble((double) i);
        p[21-i] = (PyFloatObject*) PyFloat_FromDouble(-((double) i));
    }
    cached_list = NULL;
    return p;
}

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    /* View the double as two 32-bit words (assumes a 64-bit IEEE 754
       double and a 32-bit int). */
    register int *p = (int *)(&fval);
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
    }

    /* Every cached value has an all-zero low word. */
#ifdef LITTLE_ENDIAN
    if (!p[0])
#else
    if (!p[1])
#endif
    {
        if (cached_list == NULL) {
            if ((cached_list = fill_cached_list()) == NULL)
                return NULL;
        }
        if ((cached_list != CACHE_FILLING) && (cached_list != NULL)) {
            /* Switch on the high word (sign, exponent, top of mantissa). */
#ifdef LITTLE_ENDIAN
            switch (p[1])
#else
            switch (p[0])
#endif
            {
            case 0:  Py_INCREF(cached_list[0]); return (PyObject *) cached_list[0];
            case 1072693248:  Py_INCREF(cached_list[1]); return (PyObject *) cached_list[1];
            case 1073741824:  Py_INCREF(cached_list[2]); return (PyObject *) cached_list[2];
            case 1074266112:  Py_INCREF(cached_list[3]); return (PyObject *) cached_list[3];
            case 1074790400:  Py_INCREF(cached_list[4]); return (PyObject *) cached_list[4];
            case 1075052544:  Py_INCREF(cached_list[5]); return (PyObject *) cached_list[5];
            case 1075314688:  Py_INCREF(cached_list[6]); return (PyObject *) cached_list[6];
            case 1075576832:  Py_INCREF(cached_list[7]); return (PyObject *) cached_list[7];
            case 1075838976:  Py_INCREF(cached_list[8]); return (PyObject *) cached_list[8];
            case 1075970048:  Py_INCREF(cached_list[9]); return (PyObject *) cached_list[9];
            case 1076101120:  Py_INCREF(cached_list[10]); return (PyObject *) cached_list[10];
            case -1071382528:  Py_INCREF(cached_list[11]); return (PyObject *) cached_list[11];
            case -1071513600:  Py_INCREF(cached_list[12]); return (PyObject *) cached_list[12];
            case -1071644672:  Py_INCREF(cached_list[13]); return (PyObject *) cached_list[13];
            case -1071906816:  Py_INCREF(cached_list[14]); return (PyObject *) cached_list[14];
            case -1072168960:  Py_INCREF(cached_list[15]); return (PyObject *) cached_list[15];
            case -1072431104:  Py_INCREF(cached_list[16]); return (PyObject *) cached_list[16];
            case -1072693248:  Py_INCREF(cached_list[17]); return (PyObject *) cached_list[17];
            case -1073217536:  Py_INCREF(cached_list[18]); return (PyObject *) cached_list[18];
            case -1073741824:  Py_INCREF(cached_list[19]); return (PyObject *) cached_list[19];
            case -1074790400:  Py_INCREF(cached_list[20]); return (PyObject *) cached_list[20];
            case -2147483648:  Py_INCREF(cached_list[21]); return (PyObject *) cached_list[21];
            default: break;
            }
        }
    }

    /* Inline PyObject_New */
    op = free_list;
    free_list = (PyFloatObject *)op->ob_type;
    PyObject_INIT(op, &PyFloat_Type);
    op->ob_fval = fval;
    return (PyObject *) op;
}
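
The constants in the switch are the upper 32 bits of each cached double,
read as a signed 32-bit integer.  A quick way to regenerate them, as a
sketch assuming IEEE 754 binary64 doubles (high_word and low_word are
names invented here):

```python
import struct


def high_word(x: float) -> int:
    """Upper 32 bits of the double, as a signed 32-bit integer."""
    # Big-endian packing puts the sign and exponent bytes first.
    return struct.unpack('>i', struct.pack('>d', x)[:4])[0]


def low_word(x: float) -> int:
    """Lower 32 bits of the double, as an unsigned 32-bit integer."""
    return struct.unpack('>I', struct.pack('>d', x)[4:])[0]


for i in range(11):
    f = float(i)
    # -f is -0.0 when i == 0, which is the last case label in the switch.
    print(i, high_word(f), high_word(-f), low_word(f))
```

For every integral value from -10 to 10 the low word is zero, which is why
the patch can test that word first and then switch on the high word alone.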



Re: [Python-Dev] Caching float(0.0)

2006-10-03 Thread Martin v. Löwis
Alastair Houghton schrieb:
 On 3 Oct 2006, at 17:47, James Y Knight wrote:
 
 On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote:
 As Michael Hudson observed, this is difficult to implement, though:
 You can't distinguish between -0.0 and +0.0 easily, yet you should.

 Of course you can. It's absolutely trivial. The only part that's even
 *the least bit* sketchy in this is assuming that a double is 64 bits.
 Practically speaking, that is true on all architectures I know of,
 
 How about doing 1.0 / x, where x is the number you want to test?  

This is a bad idea. It may cause a trap, leading to program termination.

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Nick Craig-Wood
On Sun, Oct 01, 2006 at 02:01:51PM -0400, Jean-Paul Calderone wrote:
 Each line in an interactive session is compiled separately, like modules
 are compiled separately.  With the current implementation, literals in a
 single compilation unit have a chance to be cached like this.  Literals
 in different compilation units, even for the same value, don't.

That makes sense - thanks for the explanation!

-- 
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Martin v. Löwis
Nick Coghlan schrieb:
 Right. Although I do wonder what kind of software people write to run
 into this problem. As Guido points out, the numbers must be the result
 from some computation, or created by an extension module by different
 means. If people have many *simultaneous* copies of 0.0, I would expect
 there is something else really wrong with the data structures or
 algorithms they use.
 
 I suspect the problem would typically stem from floating point values
 that are read in from a human-readable file rather than being the result
 of a 'calculation' as such:

That's how you can end up with 100 different copies of 0.0. But
apparently, people are creating millions of them, and keep them in
memory simultaneously. Unless the text file *only* consists of floating
point numbers, I would expect they have bigger problems than that.

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Martin v. Löwis
Kristján V. Jónsson schrieb:
 Well, a lot of extension code, like ours use PyFloat_FromDouble(foo);
 This can be from vectors and stuff.

Hmm. If you get a lot of 0.0 values from vectors and stuff, I would
expect that memory usage is already high.

In any case, a module that creates a lot of copies of 0.0 that way
could do its own caching, right?

 Very often these are values from a database.  Integral float values
 are very common in such case and it didn't occur to me that they
 weren't being reused, at least for small values.

Sure - but why are people keeping them in memory all the time?
Also, isn't it a mis-design of the database if you have many float
values in it that represent natural numbers? Shouldn't you use
a more appropriate data type, then?

 Also, a lot of arithmetic involving floats is expected to end in
 integers, like computing some index from a float value.  Integers get
 promoted to floats when touched by them, as you know.

Again, sounds like a programming error to me.

 Anyway, I now precreate integral values from -10 to 10 with great
 effect.  The cost is minimal, the benefit great.

In an extension module, the knowledge about the application domain
is larger, so it may be reasonable to do the caching there. I would
still expect that in the typical application where this is an issue,
there is some kind of larger design bug.

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Martin v. Löwis
Kristján V. Jónsson schrieb:
 I can't see how this situation is any different from the re-use of
 low ints.  There is no fundamental law that says that ints below 100
 are more common than other, yet experience shows that  this is so,
 and so they are reused.

There are two important differences:
1. it is possible to determine whether the value is special in
   constant time, and also fetch the singleton value in constant
   time for ints; the same isn't possible for floats.
2. it may be that there is a loss of precision in reusing an existing
   value (although I'm not certain that this could really happen).
   For example, could it be that two values compare equal under
   ==, yet are different values? I know this can't happen for
   integers, so I feel much more comfortable with that cache.

 Rather than to view this as a programming error, why not simply
 accept that this is a recurring pattern and adjust python to be more
 efficient when faced by it?  Surely a lot of karma lies that way?

I'm worried about the penalty that this causes in terms of run-time
cost. Also, how do you choose what values to cache?

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Kristján V . Jónsson
I see, you are thinking of the general fractional case.
My point was that whole numbers seem to pop up often, and reusing those is easy.
I tracked the actual floating point numbers in use, and the majority of heavy
usage comes from integral values.  It would indeed be strange if some fractional
number were heavily used, but it can be argued that integral ones are special in
many ways.
Anyway, Skip noted that 50% of all floats are whole numbers between -10 and 10
inclusive, and this is the code that I employ in our python build today:

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    int ival;
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
        /* CCP addition, cache common values */
        if (!f_reuse[0]) {
            int i;
            for (i = 0; i < 21; i++)
                f_reuse[i] = PyFloat_FromDouble((double)(i-10));
        }
    }
    /* CCP addition, check for recycling */
    ival = (int)fval;
    if ((double)ival == fval && ival >= -10 && ival <= 10) {
        ival += 10;
        if (f_reuse[ival]) {
            Py_INCREF(f_reuse[ival]);
            return f_reuse[ival];
        }
    }
...


Cheers,

Kristján

 -Original Message-
 From: Martin v. Löwis [mailto:[EMAIL PROTECTED] 
 Sent: 2. október 2006 14:37
 To: Kristján V. Jónsson
 Cc: Bob Ippolito; python-dev@python.org
 Subject: Re: [Python-Dev] Caching float(0.0)
 
 Kristján V. Jónsson schrieb:
  I can't see how this situation is any different from the 
 re-use of low 
  ints.  There is no fundamental law that says that ints 
 below 100 are 
  more common than other, yet experience shows that  this is 
 so, and so 
  they are reused.
 
 There are two important differences:
 1. it is possible to determine whether the value is special in
constant time, and also fetch the singleton value in constant
time for ints; the same isn't possible for floats.
 2. it may be that there is a loss of precision in reusing an existing
value (although I'm not certain that this could really happen).
For example, could it be that two values compare successful in
==, yet are different values? I know this can't happen for
integers, so I feel much more comfortable with that cache.
 
  Rather than to view this as a programming error, why not 
 simply accept 
  that this is a recurring pattern and adjust python to be more 
  efficient when faced by it?  Surely a lot of karma lies that way?
 
 I'm worried about the penalty that this causes in terms of 
 run-time cost. Also, how do you chose what values to cache?
 
 Regards,
 Martin
 


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Michael Hudson
Martin v. Löwis [EMAIL PROTECTED] writes:

 Kristján V. Jónsson schrieb:
 I can't see how this situation is any different from the re-use of
 low ints.  There is no fundamental law that says that ints below 100
 are more common than other, yet experience shows that  this is so,
 and so they are reused.

 There are two important differences:
 1. it is possible to determine whether the value is special in
constant time, and also fetch the singleton value in constant
time for ints; the same isn't possible for floats.

I don't think you mean constant time here do you?  I think most of
the code posted so far has been constant time, at least in terms of
instruction count, though some might indeed be fairly slow on some
processors -- conversion from double to integer on the PowerPC
involves a trip off to memory for example.  Even so, everything should
be fairly efficient compared to allocation, even with PyMalloc.

 2. it may be that there is a loss of precision in reusing an existing
value (although I'm not certain that this could really happen).
For example, could it be that two values compare successful in
==, yet are different values? I know this can't happen for
integers, so I feel much more comfortable with that cache.

I think the only case is that the two zeros compare equal, which is
unfortunate given that it's the most compelling value to cache...

I don't know a reliable and fast way to distinguish +0.0 and -0.0.
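
At the Python level (as opposed to the C level under discussion here) the two
zeros can be told apart with the standard library.  The sketch below is only
an illustration; `is_negative_zero` and `same_bits` are hypothetical helper
names, and it says nothing about how fast an equivalent C test would be.

```python
import math
import struct


def is_negative_zero(x):
    """Return True only for -0.0; +0.0 and every nonzero value give False."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def same_bits(a, b):
    """Compare two floats by their 8-byte IEEE 754 encoding instead of ==."""
    return struct.pack("<d", a) == struct.pack("<d", b)


print(0.0 == -0.0)             # True: == cannot tell the zeros apart
print(is_negative_zero(-0.0))  # True
print(is_negative_zero(0.0))   # False
print(same_bits(0.0, -0.0))    # False
```

A cache keyed on == alone would hand back +0.0 for -0.0, which is why the
distinction matters for the most tempting value to cache.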

Cheers,
mwh

-- 
  The bottom tier is what a certain class of wanker would call
  business objects ...  -- Greg Ward, 9 Dec 1999


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Martin v. Löwis
Michael Hudson schrieb:
 1. it is possible to determine whether the value is special in
constant time, and also fetch the singleton value in constant
time for ints; the same isn't possible for floats.
 
 I don't think you mean constant time here do you?  

Right; I really wondered whether the code was dependent or independent
of the number of special-case numbers.

 I think most of
 the code posted so far has been constant time, at least in terms of
 instruction count, though some might indeed be fairly slow on some
 processors -- conversion from double to integer on the PowerPC
 involves a trip off to memory for example.

Kristian's code testing only for integers in a range would be of
that kind. Code that tests for a list of literals determined
at compile time typically needs time linear with the number of
special-cased constants (of course, as that there is a fixed
number of constants, this is O(1)).

 2. it may be that there is a loss of precision in reusing an existing
value (although I'm not certain that this could really happen).
For example, could it be that two values compare successful in
==, yet are different values? I know this can't happen for
integers, so I feel much more comfortable with that cache.
 
 I think the only case is that the two zeros compare equal, which is
 unfortunate given that it's the most compelling value to cache...

Thanks for pointing that out. I can believe this is the only case
in IEEE-754; I also wonder whether alternative implementations
could cause problems (although I don't really worry too much
about VMS).

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Aahz
On Mon, Oct 02, 2006, Martin v. Löwis wrote:
 Michael Hudson schrieb:

 I think most of
 the code posted so far has been constant time, at least in terms of
 instruction count, though some might indeed be fairly slow on some
 processors -- conversion from double to integer on the PowerPC
 involves a trip off to memory for example.
 
 Kristian's code testing only for integers in a range would be of
 that kind. Code that tests for a list of literals determined
 at compile time typically needs time linear with the number of
 special-cased constants (of course, as that there is a fixed
 number of constants, this is O(1)).

What if we do this work only on float()?
-- 
Aahz ([EMAIL PROTECTED])   * http://www.pythoncraft.com/

LL YR VWL R BLNG T S  -- www.nancybuttons.com


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Tim Hochberg
[EMAIL PROTECTED] wrote:
 Steve By these statistics I think the answer to the original question
 Steve is clearly no in the general case.
 
 As someone else (Guido?) pointed out, the literal case isn't all that
 interesting.  I modified floatobject.c to track a few interesting
 floating point values:
 
[...code...]
 
 So for a largely non-floating point application, a fair number of floats
 are allocated, a bit over 25% of them are -1.0, 0.0 or +1.0, and nearly 50%
 of them are whole numbers between -10.0 and 10.0, inclusive.
 
 Seems like it at least deserves a serious look.  It would be nice to have
 the numeric crowd contribute to this subject as well.

As a representative of the numeric crowd, I'll say that I've never 
noticed this to be a problem. I suspect that it's a non issue since we 
generally store our numbers in arrays, not big piles of Python floats, 
so there's no opportunity for identical floats to pile up.

-tim



Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Delaney, Timothy (Tim)
[EMAIL PROTECTED] wrote:

 Steve By these statistics I think the answer to the original
 question Steve is clearly no in the general case.
 
 As someone else (Guido?) pointed out, the literal case isn't all that
 interesting.  I modified floatobject.c to track a few interesting
 floating point values:
 
 static unsigned int nfloats[5] = {
 0, /* -1.0 */
 0, /*  0.0 */
 0, /* +1.0 */
 0, /* everything else */
 0, /* whole numbers from -10.0 ... 10.0 */
 };
 
 PyObject *
 PyFloat_FromDouble(double fval)
 {
 register PyFloatObject *op;
 if (free_list == NULL) {
 if ((free_list = fill_free_list()) == NULL)
 return NULL;
 }
 
 if (fval == 0.0) nfloats[1]++;
 else if (fval == 1.0) nfloats[2]++;
 else if (fval == -1.0) nfloats[0]++;
 else nfloats[3]++;
 
 if (fval >= -10.0 && fval <= 10.0 && (int)fval == fval) {
 nfloats[4]++;
 }

This doesn't actually give us a very useful indication of potential
memory savings. What I think would be more useful is tracking the
maximum simultaneous count of each value i.e. what the maximum refcount
would have been if they were shared.
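
The difference between counting allocations and counting simultaneous copies
can be sketched in a few lines of Python.  This is only a model of the
bookkeeping, with a hypothetical `LiveCounter` class; the real measurement
would have to hook both allocation and deallocation in floatobject.c.

```python
from collections import Counter


class LiveCounter:
    """Track, per value, total allocations, live copies and the peak."""

    def __init__(self):
        self.allocated = Counter()
        self.live = Counter()
        self.peak = Counter()

    def allocate(self, value):
        self.allocated[value] += 1
        self.live[value] += 1
        # The high-water mark is what a shared object's refcount would reach.
        self.peak[value] = max(self.peak[value], self.live[value])

    def release(self, value):
        self.live[value] -= 1


tracker = LiveCounter()
for _ in range(3):
    tracker.allocate(0.0)
tracker.release(0.0)
tracker.release(0.0)
tracker.allocate(0.0)
print(tracker.allocated[0.0], tracker.peak[0.0], tracker.live[0.0])
```

Here four allocations never have more than three copies alive at once, so the
allocation count overstates the memory that sharing could save.  Note that the
counters are keyed by ==, so 0.0 and -0.0 land in the same bucket.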

Tim Delaney


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread Terry Reedy

Kristján V. Jónsson [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
Anyway, Skip noted that 50% of all floats are whole numbers between -10 
and 10 inclusive,

Please, no.  He said something like this about *non-floating-point 
applications* (evidence unspecified, that I remember).  But such 
applications, by definition, usually don't have enough floats for caching 
(or conversion time) to matter too much.

For true floating point measurements (of temperature, for instance), 
'integral' measurements (which are an artifact of the scale used (degrees F 
versus C versus K)) should generally be no more common than other realized 
measurements.

Thirty years ago, a major stat package written in Fortran (BMDP) required 
that all data be stored as (Fortran 4-byte) floats for analysis.  So a 
column of yes/no or male/female data would be stored as 0.0/1.0 or perhaps 
1.0/2.0.  That skewed the distribution of floats.  But Python and, I hope, 
Python apps, are more modern than that.

and this is the code that I employ in our python build today:

[snip]

For the analysis of typical floating point data, this is all pointless and 
a complete waste of time.  After a billion conversions or so, I expect the 
extra time might add up to something noticeable.

 From: Martin v. Löwis [mailto:[EMAIL PROTECTED]
 I'm worried about the penalty that this causes in terms of
 run-time cost.

Me too.

 Also, how do you chose what values to cache?

At one time (don't know about today), it was mandatory in some Fortran 
circles to name the small float constants used in a particular program with 
the equivalent of C #defines.  In Python,
zero = 0.0, half = 0.5, one = 1.0, twopi = 6.28..., eee = 2.7..., phi = 
.618..., etc. (Note that naming is not restricted to integral or otherwise 
'nice' values.)

The purpose then was to allow easy conversion from float to double to 
extended double.  And in some cases, it also made the code clearer.  With 
Python, the same procedure would guarantee only one copy (caching) of the 
same floats for constructed data structures.
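
A small sketch of that procedure in Python, under the assumption that the
constants live at module level and that results equal to a named constant are
swapped for it when the data structure is built.  `ZERO` and `share_zeros`
are illustrative names only.

```python
# Named once at module level; every use below refers to this one object.
ZERO = 0.0


def share_zeros(values):
    """Return a list in which every value equal to 0.0 is the ZERO object."""
    # Caveat: -0.0 == 0.0, so a negative zero is replaced by +0.0 here.
    return [ZERO if v == 0.0 else v for v in values]


computed = [x - x for x in (1.5, 2.5, 3.5)] + [7.25]
shared = share_zeros(computed)
print(all(v is ZERO for v in shared[:3]), shared[3])
```

The list still has four slots, but the three zero entries now point at a
single float object instead of three separately computed ones.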

Float caching strikes me as a good subject for cookbook recipes, but not, 
without real data and a willingness to slightly screw some users, for the 
default core code.

Terry Jan Reedy





Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread skip

Tim This doesn't actually give us a very useful indication of potential
Tim memory savings. What I think would be more useful is tracking the
Tim maximum simultaneous count of each value i.e. what the maximum
Tim refcount would have been if they were shared.

Most definitely.  I just posted what I came up with in about two minutes.
I'll add some code to track the high water mark as well and report back.

Skip


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread skip

Terry Kristján V. Jónsson [EMAIL PROTECTED] wrote:
 Anyway, Skip noted that 50% of all floats are whole numbers between
 -10 and 10 inclusive,

Terry Please, no.  He said something like this about
Terry *non-floating-point applications* (evidence unspecified, that I
Terry remember).  But such applications, by definition, usually don't
Terry have enough floats for caching (or conversion time) to matter too
Terry much.

Correct.  The non-floating-point application I chose was the one that was
most immediately available, make test.  Note that I have no proof that
regrtest.py isn't terribly floating point intensive.  I just sort of guessed
that it was.

Skip


Re: [Python-Dev] Caching float(0.0)

2006-10-02 Thread skip

skip Most definitely.  I just posted what I came up with in about two
skip minutes.  I'll add some code to track the high water mark as well
skip and report back.

Using the smallest change I could get away with, I came up with these
allocation figures (same as before):

-1.0: 29048
 0.0: 524340
+1.0: 91560
rest: 1753479
whole numbers -10.0 to 10.0: 1151543

and these max ref counts:

-1.0: 16
 0.0: 136
+1.0: 161
rest: 1
whole numbers -10.0 to 10.0: 161

When I have a couple more minutes I'll just implement a cache for whole
numbers between -10.0 and 10.0 and test that whole range of values right.

Skip


Re: [Python-Dev] Caching float(0.0)

2006-10-01 Thread Nick Craig-Wood
On Fri, Sep 29, 2006 at 12:03:03PM -0700, Guido van Rossum wrote:
 I see some confusion in this thread.
 
 If a *LITERAL* 0.0 (or any other float literal) is used, you only get
 one object, no matter how many times it is used.

For some reason that doesn't happen in the interpreter which has been
confusing the issue slightly...

$ python2.5
Python 2.5c1 (r25c1:51305, Aug 19 2006, 18:23:29)
[GCC 4.1.2 20060814 (prerelease) (Debian 4.1.1-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=0.0
>>> b=0.0
>>> id(a), id(b)
(134737756, 134737772)


$ python2.5 -c 'a=0.0; b=0.0; print id(a),id(b)'
134737796 134737796

 But if the result of a *COMPUTATION* returns 0.0, you get a new object
 for each such result. If you have 70 MB worth of zeros, that's clearly
 computation results, not literals.

In my application I'm receiving all the zeros from a server over TCP
as ASCII and these are being float()ed in python.

-- 
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick


Re: [Python-Dev] Caching float(0.0)

2006-10-01 Thread Nick Craig-Wood
On Sat, Sep 30, 2006 at 03:21:50PM -0700, Bob Ippolito wrote:
 On 9/30/06, Terry Reedy [EMAIL PROTECTED] wrote:
  Nick Coghlan [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
   I suspect the problem would typically stem from floating point
   values that are read in from a human-readable file rather than
   being the result of a 'calculation' as such:

Over a TCP socket in ASCII format for my application

  For such situations, one could create a translation dict for both common
  float values and for non-numeric missing value indicators.  For instance,
  flotran = {'*': None, '1.0':1.0, '2.0':2.0, '4.0':4.0}
  The details, of course, depend on the specific case.
 
 But of course you have to know that common float values are never
 cached and that it may cause you problems. Some users may expect them
 to be because common strings and integers are cached.

I have to say I was surprised to find out how many copies of 0.0 there
were in my code and I guess I was subconsciously expecting the
immutable 0.0s to be cached even though I know consciously I've never
seen anything but int and str mentioned in the docs.

-- 
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick


Re: [Python-Dev] Caching float(0.0)

2006-10-01 Thread Terry Reedy

Nick Craig-Wood [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 On Fri, Sep 29, 2006 at 12:03:03PM -0700, Guido van Rossum wrote:
 I see some confusion in this thread.

 If a *LITERAL* 0.0 (or any other float literal) is used, you only get
 one object, no matter how many times it is used.

 For some reason that doesn't happen in the interpreter which has been
 confusing the issue slightly...

 $ python2.5
 >>> a=0.0
 >>> b=0.0
 >>> id(a), id(b)
 (134737756, 134737772)

Guido said *a* literal (emphasis shifted), reused as in a loop or function 
recalled, while you used *a* literal, then *another* literal, without 
reuse.  Try a=b=0.0 instead.

Terry Jan Reedy





Re: [Python-Dev] Caching float(0.0)

2006-10-01 Thread Jean-Paul Calderone
On Sun, 1 Oct 2006 13:54:31 -0400, Terry Reedy [EMAIL PROTECTED] wrote:

Nick Craig-Wood [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 On Fri, Sep 29, 2006 at 12:03:03PM -0700, Guido van Rossum wrote:
 I see some confusion in this thread.

 If a *LITERAL* 0.0 (or any other float literal) is used, you only get
 one object, no matter how many times it is used.

 For some reason that doesn't happen in the interpreter which has been
 confusing the issue slightly...

 $ python2.5
 >>> a=0.0
 >>> b=0.0
 >>> id(a), id(b)
 (134737756, 134737772)

Guido said *a* literal (emphasis shifted), reused as in a loop or function
recalled, while you used *a* literal, then *another* literal, without
reuse.  Try a=b=0.0 instead.

Actually this just has to do with, um, compilation units, for lack of a
better term:

  [EMAIL PROTECTED]:~$ python
  Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
  [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> a = 0.0
  >>> b = 0.0
  >>> print a is b
  False
  >>> ^D
  [EMAIL PROTECTED]:~$ cat > test.py
  a = 0.0
  b = 0.0
  print a is b
  ^D
  [EMAIL PROTECTED]:~$ python test.py
  True
  [EMAIL PROTECTED]:~$ cat > test_a.py
  a = 0.0
  ^D
  [EMAIL PROTECTED]:~$ cat > test_b.py
  b = 0.0
  ^D
  [EMAIL PROTECTED]:~$ cat > test.py
  from test_a import a
  from test_b import b
  print a is b
  ^D
  [EMAIL PROTECTED]:~$ python test.py
  False
  [EMAIL PROTECTED]:~$ python
  Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
  [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> a = 0.0; b = 0.0
  >>> print a is b
  True
  >>>
  [EMAIL PROTECTED]:~$

Each line in an interactive session is compiled separately, like modules
are compiled separately.  With the current implementation, literals in a
single compilation unit have a chance to be cached like this.  Literals
in different compilation units, even for the same value, don't.
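
The same effect can be reproduced in one script by compiling the sources
explicitly.  This is a sketch written in Python 3 syntax rather than the 2.4
syntax of the session above; `run` is a hypothetical helper, and whether two
separate units share a constant is an implementation detail, so the second
result is printed without any claim about its value.

```python
def run(source):
    """Compile source as one unit, execute it, and return its namespace."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace


# Two literals in one compilation unit: the compiler merges equal constants.
one_unit = run("a = 0.0\nb = 0.0\n")
print(one_unit["a"] is one_unit["b"])   # True on CPython

# The same literals in two separate units: no sharing is promised.
first = run("a = 0.0\n")
second = run("b = 0.0\n")
print(first["a"] is second["b"])        # implementation-dependent
```

Each interactive line goes through its own compile step, which is why the
line-by-line session and the single-line `a = 0.0; b = 0.0` session differ.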

Jean-Paul


Re: [Python-Dev] Caching float(0.0)

2006-09-30 Thread Steve Holden
Jason Orendorff wrote:
 On 9/29/06, Fredrik Lundh [EMAIL PROTECTED] wrote:
 
(I just checked the program I'm working on, and my analysis tells me
that the most common floating point value in that program is 121.216,
which occurs 32 times.  from what I can tell, 0.0 isn't used at all.)
 
 
 *bemused look*  Fredrik, can you share the reason why this number
 occurs 32 times in this program?  I don't mean to imply anything by
 that; it just sounds like it might be a fun story.  :)
 
 Anyway, this kind of static analysis is probably more entertaining
 than relevant.  For your enjoyment, the most-used float literals in
 python25\Lib, omitting test directories, are:
 
 1e-006: 5 hits
 4.0: 6 hits
 0.05: 7 hits
 6.0: 8 hits
 0.5: 13 hits
 2.0: 25 hits
 0.0: 36 hits
 1.0: 62 hits
 
 There are two hits each for -1.0 and -0.5.
 
 In my own Python code, I don't even have enough float literals to bother with.
 
By these statistics I think the answer to the original question is 
clearly no in the general case.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: [Python-Dev] Caching float(0.0)

2006-09-30 Thread Martin v. Löwis
Bob Ippolito schrieb:
 My guess is that people do have this problem, they just don't know
 where that memory has gone. I know I don't count objects unless I have
 a process that's leaking memory or it grows so big that I notice (by
 swapping or chance).

Right. Although I do wonder what kind of software people write to run
into this problem. As Guido points out, the numbers must be the result
from some computation, or created by an extension module by different
means. If people have many *simultaneous* copies of 0.0, I would expect
there is something else really wrong with the data structures or
algorithms they use.

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-09-30 Thread Nick Coghlan
Martin v. Löwis wrote:
 Bob Ippolito schrieb:
 My guess is that people do have this problem, they just don't know
 where that memory has gone. I know I don't count objects unless I have
 a process that's leaking memory or it grows so big that I notice (by
 swapping or chance).
 
 Right. Although I do wonder what kind of software people write to run
 into this problem. As Guido points out, the numbers must be the result
 from some computation, or created by an extension module by different
 means. If people have many *simultaneous* copies of 0.0, I would expect
 there is something else really wrong with the data structures or
 algorithms they use.

I suspect the problem would typically stem from floating point values that are 
read in from a human-readable file rather than being the result of a 
'calculation' as such:

>>> float('1') is float('1')
False
>>> float('0') is float('0')
False

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Caching float(0.0)

2006-09-30 Thread Kristján V . Jónsson
Well, a lot of extension code, like ours, uses PyFloat_FromDouble(foo);  This
can be from vectors and stuff.  Very often these are values from a database.
Integral float values are very common in such cases and it didn't occur to me
that they weren't being reused, at least for small values.

Also, a lot of arithmetic involving floats is expected to end in integers, like 
computing some index from a float value.  Integers get promoted to floats when 
touched by them, as you know.

Anyway, I now precreate integral values from -10 to 10 with great effect.  The 
cost is minimal, the benefit great.

Cheers,
Kristján

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin v. Löwis
Sent: 30. september 2006 08:48
To: Bob Ippolito
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Caching float(0.0)

Bob Ippolito schrieb:
 My guess is that people do have this problem, they just don't know
 where that memory has gone. I know I don't count objects unless I have
 a process that's leaking memory or it grows so big that I notice (by
 swapping or chance).

Right. Although I do wonder what kind of software people write to run
into this problem. As Guido points out, the numbers must be the result
from some computation, or created by an extension module by different
means. If people have many *simultaneous* copies of 0.0, I would expect
there is something else really wrong with the data structures or
algorithms they use.

Regards,
Martin


Re: [Python-Dev] Caching float(0.0)

2006-09-30 Thread Terry Reedy

Nick Coghlan [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
I suspect the problem would typically stem from floating point values that 
are
read in from a human-readable file rather than being the result of a
'calculation' as such:

For such situations, one could create a translation dict for both common 
float values and for non-numeric missing value indicators.  For instance,
flotran = {'*': None, '1.0':1.0, '2.0':2.0, '4.0':4.0}
The details, of course, depend on the specific case.
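
A minimal sketch of how such a dict might be used when parsing fields.  The
table follows the flotran example above, with a '0.0' entry added for
illustration; `FLOTRAN` and `parse_field` are hypothetical names, and real
input would need whatever spellings ('0', '0.00', ...) the source emits.

```python
# Translation table: common spellings map to one shared object each,
# and '*' stands for a missing value.
FLOTRAN = {"*": None, "0.0": 0.0, "1.0": 1.0, "2.0": 2.0, "4.0": 4.0}


def parse_field(token):
    """Look the token up in the table; fall back to float() otherwise."""
    try:
        return FLOTRAN[token]
    except KeyError:
        return float(token)


row = [parse_field(tok) for tok in "0.0 3.5 * 0.0 1.0".split()]
print(row)
print(row[0] is row[3])
```

Every '0.0' in the input now resolves to the same float object, while
uncommon values such as 3.5 still get a fresh one from float().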

tjr






Re: [Python-Dev] Caching float(0.0)

2006-09-30 Thread Bob Ippolito
On 9/30/06, Terry Reedy [EMAIL PROTECTED] wrote:

 Nick Coghlan [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]
 I suspect the problem would typically stem from floating point values that
 are
 read in from a human-readable file rather than being the result of a
 'calculation' as such:

 For such situations, one could create a translation dict for both common
 float values and for non-numeric missing value indicators.  For instance,
 flotran = {'*': None, '1.0':1.0, '2.0':2.0, '4.0':4.0}
 The details, of course, depend on the specific case.

But of course you have to know that common float values are never
cached and that it may cause you problems. Some users may expect them
to be because common strings and integers are cached.

-bob


Re: [Python-Dev] Caching float(0.0)

2006-09-30 Thread skip

    Steve> By these statistics I think the answer to the original question
    Steve> is clearly no in the general case.

As someone else (Guido?) pointed out, the literal case isn't all that
interesting.  I modified floatobject.c to track a few interesting
floating point values:

static unsigned int nfloats[5] = {
    0, /* -1.0 */
    0, /*  0.0 */
    0, /* +1.0 */
    0, /* everything else */
    0, /* whole numbers from -10.0 ... 10.0 */
};

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
    }

    /* Tally the value being allocated. */
    if (fval == 0.0) nfloats[1]++;
    else if (fval == 1.0) nfloats[2]++;
    else if (fval == -1.0) nfloats[0]++;
    else nfloats[3]++;

    /* Range check first, so the int cast only sees small values. */
    if (fval >= -10.0 && fval <= 10.0 && (int)fval == fval) {
        nfloats[4]++;
    }

    /* Inline PyObject_New */
    op = free_list;
    free_list = (PyFloatObject *)op->ob_type;
    PyObject_INIT(op, &PyFloat_Type);
    op->ob_fval = fval;
    return (PyObject *) op;
}

static void
_count_float_allocations(void)
{
    /* The counters are unsigned, hence %u. */
    fprintf(stderr, "-1.0: %u\n", nfloats[0]);
    fprintf(stderr, " 0.0: %u\n", nfloats[1]);
    fprintf(stderr, "+1.0: %u\n", nfloats[2]);
    fprintf(stderr, "rest: %u\n", nfloats[3]);
    fprintf(stderr, "whole numbers -10.0 to 10.0: %u\n", nfloats[4]);
}

then called atexit(_count_float_allocations) in _PyFloat_Init and ran make
test.  The output was:

...
./python.exe -E -tt ../Lib/test/regrtest.py -l 
...
-1.0: 29048
 0.0: 524241
+1.0: 91561
rest: 1749807
whole numbers -10.0 to 10.0: 1151442

So for a largely non-floating point application, a fair number of floats
are allocated, a bit over 25% of them are -1.0, 0.0 or +1.0, and nearly 50%
of them are whole numbers between -10.0 and 10.0, inclusive.

Seems like it at least deserves a serious look.  It would be nice to have
the numeric crowd contribute to this subject as well.

Skip
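
The percentages quoted above can be rechecked from the raw counts. The
snippet below is only a worked version of that arithmetic; the variable
names are made up here, and the numbers are copied from the regrtest
output in the message.

```python
# Counts copied from the regrtest output quoted in the message.
# These four buckets are mutually exclusive and cover every allocation.
counts = {"-1.0": 29048, "0.0": 524241, "+1.0": 91561, "rest": 1749807}

# Whole numbers in [-10.0, 10.0]; this counter overlaps the buckets above.
small_whole = 1151442

total = sum(counts.values())
unit_share = (counts["-1.0"] + counts["0.0"] + counts["+1.0"]) / total
whole_share = small_whole / total

print(f"total allocations: {total}")
print(f"-1.0, 0.0, +1.0:   {unit_share:.1%}")
print(f"whole -10..10:     {whole_share:.1%}")
```

This gives roughly 27% for the three unit values and roughly 48% for the
small whole numbers, in line with the figures in the message.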



Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Fredrik Lundh
Nick Craig-Wood wrote:

> Is there any reason why float() shouldn't cache the value of 0.0 since
> it is by far and away the most common value?

says who ?

(I just checked the program I'm working on, and my analysis tells me 
that the most common floating point value in that program is 121.216, 
which occurs 32 times.  from what I can tell, 0.0 isn't used at all.)

/F



Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Kristján V . Jónsson
Acting on this excellent advice, I have patched in a reuse for -1.0, 0.0 and 
1.0 for EVE Online.  We use vectors and stuff a lot, and 0.0 is very, very 
common.  I'll report on the refcount of this for you shortly.

K 



Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Kristján V . Jónsson
Well gentlemen, I did gather some stats on the frequency of 
PyFloat_FromDouble().
Out of the first 1000 distinct floats allocated, we get this frequency 
distribution once our server has started up:

stats: std::vector<entry,std::allocator<entry> >, 1000 entries

index   value (v)    count (c)
[0]     0.0          410612
[1]     1.           107838
[2]     0.75000      25487
[3]     5.           22557
[4]     1.           18530
[5]     -1.          14950
[6]     2.           14460
[7]     1500.0       13470
[8]     100.00       11913
[9]     0.5          11497
[10]    3.           9833
[11]    20.000       9019
[12]    0.90002      8954
[13]    10.000       8377
[14]    4.           7890
[15]    0.050003     7732
[16]    1000.0       7456
[17]    0.40002      7427
[18]    -100.00      7071
[19]    5000.0       6851
[20]    100.00       6503
[21]    0.070007     6071

(here I omit the rest).
In addition, my shared 0.0 double has some 20 references at this point.
0.0 is very, very common.  The same can be said about all the integers up to 
5.0, as well as -1.0.
I think I will add a simple cache for these values for Eve, something like:

    int i = (int) fval;
    /* assuming table[0] holds -1.0, so the index is shifted by one */
    if ((double)i == fval && i >= -1 && i < 6) {
        Py_INCREF(table[i + 1]);
        return table[i + 1];
    }



Cheers,

Kristján


Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Jason Orendorff
On 9/29/06, Fredrik Lundh [EMAIL PROTECTED] wrote:
> (I just checked the program I'm working on, and my analysis tells me
> that the most common floating point value in that program is 121.216,
> which occurs 32 times.  from what I can tell, 0.0 isn't used at all.)

*bemused look*  Fredrik, can you share the reason why this number
occurs 32 times in this program?  I don't mean to imply anything by
that; it just sounds like it might be a fun story.  :)

Anyway, this kind of static analysis is probably more entertaining
than relevant.  For your enjoyment, the most-used float literals in
python25\Lib, omitting test directories, are:

1e-006: 5 hits
4.0: 6 hits
0.05: 7 hits
6.0: 8 hits
0.5: 13 hits
2.0: 25 hits
0.0: 36 hits
1.0: 62 hits

There are two hits each for -1.0 and -0.5.

In my own Python code, I don't even have enough float literals to bother with.

-j
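
One way such a static count of float literals might be gathered is with
the standard ast module. This is a sketch under that assumption, not the
method used for the numbers above; count_float_literals and the sample
source are made up for illustration.

```python
import ast
from collections import Counter

def count_float_literals(source):
    """Count float literals in Python source text, keyed by value."""
    hits = Counter()
    for node in ast.walk(ast.parse(source)):
        # ast.Constant covers all literals; keep only the floats.
        if isinstance(node, ast.Constant) and isinstance(node.value, float):
            hits[node.value] += 1
    return hits

sample = "x = 1.0\ny = x * 0.5 + 1.0\nz = -1.0\nn = 3\n"
hits = count_float_literals(sample)
for value, n in hits.most_common():
    print(f"{value!r}: {n} hits")
```

In the AST a leading minus sign is a unary operator applied to a positive
constant, so -1.0 is counted under 1.0 here. A tally that lists negative
literals separately, as the one above does, would need to look at the
enclosing ast.UnaryOp node as well.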


Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Nick Maclaren
Jason Orendorff [EMAIL PROTECTED] wrote:

> Anyway, this kind of static analysis is probably more entertaining
> than relevant.  ...

Well, yes.  One can tell that by the piffling little counts being
bandied about!  More seriously, yes, it is Well Known that 0.0 is
the Most Common Floating-Point Number in most numerical codes; a
lot of older (and perhaps modern) sparse matrix algorithms use that
to save space.

In the software floating-point for which I started to draft some
example code, but have had to shelve (no, I haven't forgotten), the
values I predefine are Invalid, Missing, True Zero and Approximate
Zero.  The infinities and infinitesimals (a.k.a. signed zeroes)
could also be included, but are less common and more complicated.
And so could common integers and fractions.

It is generally NOT worth doing a cache lookup for genuinely
numerical code, as the common cases that are not the above rarely
account for enough of the numbers to be worth it.  I did a fair
amount of investigation looking for compressibility at one time,
and that conclusion jumped out at me.

The exact best choice depends entirely on what you are doing.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  [EMAIL PROTECTED]
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Guido van Rossum
I see some confusion in this thread.

If a *LITERAL* 0.0 (or any other float literal) is used, you only get
one object, no matter how many times it is used.

But if the result of a *COMPUTATION* returns 0.0, you get a new object
for each such result. If you have 70 MB worth of zeros, that's clearly
computation results, not literals.

Attempts to remove literal references from source code won't help much.

I'm personally +0 on caching computational results with common float
values such as 0 and small (positive or negative) powers of two, e.g.
0.5, 1.0, 2.0.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
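
The difference between a literal and a computed result can be observed by
counting distinct objects. The sketch below is an illustration only;
literal_zero, computed_zero and N are names made up here, and the counts
reflect CPython implementation details that other implementations need
not share.

```python
def literal_zero():
    return 0.0          # one constant object stored with the function's code

def computed_zero(i):
    return i * 0.0      # the multiplication builds a float object per call

N = 1000
literals = [literal_zero() for _ in range(N)]
computed = [computed_zero(i) for i in range(N)]

# The lists keep every object alive, so no id can be reused while we count.
distinct_literals = len({id(x) for x in literals})
distinct_computed = len({id(x) for x in computed})
print(distinct_literals, distinct_computed)
```

On CPython the literal is expected to yield a single shared object, while
the computed zeros are separate objects, which is where memory goes when a
program holds many equal results.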


Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Greg Ewing
Nick Craig-Wood wrote:

> Is there any reason why float() shouldn't cache the value of 0.0 since
> it is by far and away the most common value?

1.0 might be another candidate for cacheing.

Although the fact that nobody has complained about this
before suggests that it might not be a frequent enough
problem to be worth the effort.

--
Greg


Re: [Python-Dev] Caching float(0.0)

2006-09-29 Thread Bob Ippolito
On 9/29/06, Greg Ewing [EMAIL PROTECTED] wrote:
> Nick Craig-Wood wrote:
>
>> Is there any reason why float() shouldn't cache the value of 0.0 since
>> it is by far and away the most common value?
>
> 1.0 might be another candidate for cacheing.
>
> Although the fact that nobody has complained about this
> before suggests that it might not be a frequent enough
> problem to be worth the effort.

My guess is that people do have this problem, they just don't know
where that memory has gone. I know I don't count objects unless I have
a process that's leaking memory or it grows so big that I notice (by
swapping or chance).

That said, I've never noticed this particular issue, but I deal with
mostly strings. I have had issues with the allocator a few times that
I had to work around, but not this sort of issue.

-bob