Re: [Python-Dev] Caching float(0.0)
On 4 Oct 2006, at 06:34, Martin v. Löwis wrote:

> Alastair Houghton schrieb:
>> On 3 Oct 2006, at 17:47, James Y Knight wrote:
>>> On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote:
>>>> As Michael Hudson observed, this is difficult to implement, though:
>>>> You can't distinguish between -0.0 and +0.0 easily, yet you should.
>>> Of course you can. It's absolutely trivial. The only part that's even
>>> *the least bit* sketchy in this is assuming that a double is 64 bits.
>>> Practically speaking, that is true on all architectures I know of.
>> How about doing 1.0 / x, where x is the number you want to test?
> This is a bad idea. It may cause a trap, leading to program termination.

AFAIK few systems have floating point traps enabled by default (in fact, isn't that what IEEE 754 specifies?), because they often aren't very useful. And in the specific case of the Python interpreter, why would you ever want them turned on? Surely in order to get consistent floating point semantics, they need to be *off* and Python needs to handle any exceptional cases itself; even if they're on, by your argument Python must do that anyway to avoid being terminated. (Not to mention the problem that floating point traps are typically delivered by a signal, the problems with which were discussed extensively in a recent thread on this list.)

And the division test does have two advantages over the other methods proposed:

1. You don't have to write the value to memory; the test can work entirely in the machine's floating point registers.

2. It doesn't rely on the machine using IEEE floating point. (Of course, neither does the binary comparison method, but that involves a trip to memory, and assumes that the machine doesn't have multiple representations for +0.0 or -0.0.)
Even if you're saying that there's a significant chance of a trap (which I don't believe, not on common platforms anyway), the configure script could test whether one actually occurs and fall back to one of the other approaches, or see if it can turn traps off using the C99 fenv.h APIs. (I think I'd agree with you that handling SIGFPE is undesirable, which is perhaps what you were driving at.) Anyway, it's only an idea, and I thought I'd point it out as nobody else had yet.

If 0.0 is going to be cached, then I certainly think -0.0 and +0.0 should be two separate values if they exist on a given machine. I'm less concerned about exactly how that comes about.

Kind regards,

Alastair.
--
http://alastairs-place.net

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
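For concreteness, the distinction being debated can be demonstrated from Python itself with the standard library. A sketch (not from the original thread, and assuming IEEE 754 doubles for the bit-pattern comparison; note the 1.0/x trick is C-level only, since Python-level division by zero raises ZeroDivisionError):

```python
import math
import struct

def is_negative_zero(x):
    # The "binary comparison method": compare raw bit patterns.
    # Assumes the platform stores doubles in IEEE 754 format.
    return struct.pack('<d', x) == struct.pack('<d', -0.0)

# The two zeros compare equal arithmetically...
assert -0.0 == 0.0
# ...but their representations differ:
assert is_negative_zero(-0.0)
assert not is_negative_zero(0.0)
# atan2 is one of the few library functions that exposes the sign:
assert math.atan2(0.0, -0.0) == math.pi
assert math.atan2(0.0, 0.0) == 0.0
```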
Re: [Python-Dev] Caching float(0.0)
On 4 Oct 2006, at 02:38, Josiah Carlson wrote:

> Alastair Houghton [EMAIL PROTECTED] wrote:
>> There is, of course, the option of examining their representations in
>> memory (I described the general technique in another posting on this
>> thread).
> From what I understand of IEEE 754 FP doubles, -0.0 and +0.0 have
> different representations, and if we look at the underlying
> representation (perhaps by a *((uint64 *)&float_input)), we can easily
> distinguish all values we want to cache...

Yes, though a trip via memory isn't necessarily cheap, and you're also assuming that the machine doesn't use an FP representation with multiple +0s or -0s. Perhaps they should be different anyway though, I suppose.

> And as I stated before, we can switch on those values. Alternatively,
> if we can't switch on the 64 bit values directly...
>
>     uint32 *p = (uint32 *)&double_input;
>     if (!p[0]) {        /* p[1] on big-endian platforms */
>         switch (p[1]) { /* p[0] on big-endian platforms */
>             ...
>         }
>     }

That's worse, IMHO, because it assumes more about the representation. If you're going to look directly at the binary, I think all you can reasonably do is a straight binary comparison. I don't think you should poke at the bits without first knowing that the platform uses IEEE floating point.

The reason I suggested 1.0/x is that it's one of the few ways (maybe the only way?) to distinguish -0.0 and +0.0 using arithmetic, which is what the people who care about the difference between the two are going to care about.

Kind regards,

Alastair.
--
http://alastairs-place.net
Re: [Python-Dev] Caching float(0.0)
Alastair Houghton [EMAIL PROTECTED] wrote:

> AFAIK few systems have floating point traps enabled by default (in
> fact, isn't that what IEEE 754 specifies?), because they often aren't
> very useful.

The first two statements are true; the last isn't. They are extremely useful, not least because they are the only practical way to locate numeric errors in most 3GL programs (including C, Fortran etc.)

> And in the specific case of the Python interpreter, why would you ever
> want them turned on? Surely in order to get consistent floating point
> semantics, they need to be *off* and Python needs to handle any
> exceptional cases itself; even if they're on, by your argument Python
> must do that to avoid being terminated.

Grrk. Why are you assuming that turning them off means that the result is what you expect? That isn't always so - sometimes it merely means that you get wrong answers but no indication of that.

> or see if it can turn them off using the C99 fenv.h APIs.

That is a REALLY bad idea. You have no idea how broken that is, and what the impact on Python would be.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: [EMAIL PROTECTED]
Tel.: +44 1223 334761    Fax: +44 1223 334679
Re: [Python-Dev] Caching float(0.0)
James Y Knight [EMAIL PROTECTED] wrote:

> This is a really poor argument. Python should be moving *towards*
> proper '754 fp support, not away from it. On the platforms that are
> most important, the C implementations distinguish positive and
> negative 0. That the current python implementation may be defective
> when the underlying C implementation is defective doesn't excuse a
> change to intentionally break python on the common platforms.

Perhaps you might like to think why only IBM POWERx (and NOT the Cell or most embedded POWERs) is the ONLY mainstream system to have implemented all of IEEE 754 in hardware after 22 years? Or why NO programming language has provided support in those 22 years, and only Java and C have even claimed to? See Kahan's "How Java's Floating-Point Hurts Everyone Everywhere", note that C99 is much WORSE, and then note that Java and C99 are the only languages that have even attempted to include IEEE 754.

You have also misunderstood the issue. The fact that a C implementation doesn't support it does NOT mean that the implementation is defective; quite the contrary. The issue always has been that IEEE 754's basic model is incompatible with the basic models of all programming languages that I am familiar with (which is a lot). And the specific problems with C99 are in the STANDARD, not the IMPLEMENTATIONS.

> IEEE 754 is so widely implemented that IMO it would make sense to make
> Python's floating point specify it, and simply declare floating point
> operations on non-IEEE 754 machines as "use at own risk, may not
> conform to python language standard". (or if someone wants to use a
> software fp library for such machines, that's fine too).

Firstly, see the above. Secondly, Python would need MAJOR semantic changes to conform to IEEE 754R. Thirdly, what would you say to the people who want reliable error detection on floating-point of the form that Python currently provides?
Regards,
Nick Maclaren
Re: [Python-Dev] Caching float(0.0)
On Wed, Oct 04, 2006 at 12:42:04AM -0400, Tim Peters wrote:

> [EMAIL PROTECTED] wrote:
>>> If C90 doesn't distinguish -0.0 and +0.0, how can Python?
>> With liberal applications of piss & vinegar ;-)
>>> Can you give a simple example where the difference between the two
>>> is apparent to the Python programmer?
> Perhaps surprisingly, many (well, comparatively many, compared to
> none) people have noticed that the platform atan2 cares a lot:
>
> >>> from math import atan2 as a
> >>> z = 0.0  # positive zero
> >>> m = -z   # minus zero
> >>> a(z, z)  # the result here is actually +0.0
> 0.0
> >>> a(z, m)
> 3.1415926535897931
> >>> a(m, z)  # the result here is actually -0.0
> 0.0

This actually returns -0.0 under linux...

> >>> a(m, m)
> -3.1415926535897931
>
> It works like that even on Windows, and these are the results C99's
> 754-happy appendix mandates for atan2 applied to signed zeroes. I've
> even seen a /complaint/ on c.l.py that atan2 doesn't do the same when
> z = 0.0 is replaced by z = 0. That is, at least one person thought it
> was a bug that integer zeroes didn't deliver the same behaviors.
>
> Do people actually rely on this? I know I don't, but given that more
> than just 2 people have remarked on it, seeming to like it, I expect
> that changing this would break /some/ code out there.

Probably! It surely isn't a big problem though, is it?
Instead of writing

    if (result == 0.0)
        return cached_float_0;

we just write something like

    if (memcmp(&result, &static_zero, sizeof(double)) == 0)
        return cached_float_0;

E.g. the program below prints (gcc/linux):

    The memcmp() way
    1: 0 == 0.0
    2: -0 != 0.0
    The == way
    3: 0 == 0.0
    4: -0 == 0.0

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        static double zero_value = 0.0;
        double result;

        printf("The memcmp() way\n");

        result = 0.0;
        if (memcmp(&result, &zero_value, sizeof(double)) == 0)
            printf("1: %g == 0.0\n", result);
        else
            printf("1: %g != 0.0\n", result);

        result = -0.0;
        if (memcmp(&result, &zero_value, sizeof(double)) == 0)
            printf("2: %g == 0.0\n", result);
        else
            printf("2: %g != 0.0\n", result);

        printf("The == way\n");

        result = 0.0;
        if (result == 0.0)
            printf("3: %g == 0.0\n", result);
        else
            printf("3: %g != 0.0\n", result);

        result = -0.0;
        if (result == 0.0)
            printf("4: %g == 0.0\n", result);
        else
            printf("4: %g != 0.0\n", result);

        return 0;
    }

--
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick
Re: [Python-Dev] Caching float(0.0)
Hm, doesn't seem to be so for my regular python.

Python 2.3.3 Stackless 3.0 040407 (#51, Apr 7 2004, 19:28:46) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = -0.0
>>> y = 0.0
>>> x, y
(0.0, 0.0)

Maybe it is 2.3.3, or maybe it is stackless from back then.

K

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin v. Löwis
Sent: 3. október 2006 17:56
To: [EMAIL PROTECTED]
Cc: Nick Maclaren; python-dev@python.org
Subject: Re: [Python-Dev] Caching float(0.0)

[EMAIL PROTECTED] schrieb:
> If C90 doesn't distinguish -0.0 and +0.0, how can Python?

Can you give a simple example where the difference between the two is apparent to the Python programmer?

Sure:

py> x=-0.0
py> y=0.0
py> x,y
Re: [Python-Dev] Caching float(0.0)
On Wed, Oct 04, 2006 at 12:42:04AM -0400, Tim Peters wrote:

>>> If C90 doesn't distinguish -0.0 and +0.0, how can Python? Can you
>>> give a simple example where the difference between the two is
>>> apparent to the Python programmer?
> Perhaps surprisingly, many (well, comparatively many, compared to
> none) people have noticed that the platform atan2 cares a lot:

Once upon a time, floating-point was used as an approximation to mathematical real numbers, and anything which was mathematically undefined in real arithmetic was regarded as an error in floating-point. This allowed a reasonable amount of numeric validation, because the main remaining discrepancy was that floating-point has only limited precision and range. Most of the numerical experts that I know of still favour that approach, and it is the one standardised by the ISO LIA-1, LIA-2 and LIA-3 standards for floating-point arithmetic. atan2(0.0, 0.0) should be an error.

But C99 differs. While words do not fail me, they are inappropriate for this mailing list :-(

Regards,
Nick Maclaren
Re: [Python-Dev] Caching float(0.0)
Alastair Houghton schrieb:

> AFAIK few systems have floating point traps enabled by default (in
> fact, isn't that what IEEE 754 specifies?), because they often aren't
> very useful. And in the specific case of the Python interpreter, why
> would you ever want them turned on?

That reasoning is irrelevant. If it breaks a few systems, that already is some systems too many. Python should never crash; and we have no control over the floating point exception handling in any portable manner.

Regards,
Martin
Re: [Python-Dev] Caching float(0.0)
Kristján V. Jónsson schrieb:

> Hm, doesn't seem to be so for my regular python. Maybe it is 2.3.3, or
> maybe it is stackless from back then.

It's because you are using Windows. The way -0.0 gets rendered depends on the platform. As Tim points out, try math.atan2(0.0, -0.0) vs math.atan2(0.0, 0.0).

Regards,
Martin
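Martin's suggestion is easy to check. A quick sketch (math.copysign postdates this thread - it arrived in Python 2.6 - but serves as a third probe):

```python
import math

x = -0.0   # renders as '0.0' on the Windows build quoted above
y = 0.0

# repr() may hide the sign of zero, but atan2 still sees it:
assert math.atan2(y, x) == math.pi    # negative zero on the right
assert math.atan2(y, y) == 0.0        # positive zero
assert math.copysign(1.0, x) == -1.0  # the sign bit really is set
```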
Re: [Python-Dev] Caching float(0.0)
On Oct 4, 2006, at 8:14 PM, Martin v. Löwis wrote:

> If it breaks a few systems, that already is some systems too many.
> Python should never crash; and we have no control over the floating
> point exception handling in any portable manner.

You're quite right, though there is already plenty of platform dependent code in Python for just that purpose (see fpectlmodule.c, for instance). Anyway, all I originally wanted was to point out that using division was one possible way to tell the difference that didn't involve relying on the representation being IEEE compliant. It's true that there are problems with FP exceptions.

Kind regards,

Alastair.
--
http://alastairs-place.net
Re: [Python-Dev] Caching float(0.0)
On Tue, Oct 03, 2006 at 09:47:03AM +1000, Delaney, Timothy (Tim) wrote:

> This doesn't actually give us a very useful indication of potential
> memory savings. What I think would be more useful is tracking the
> maximum simultaneous count of each value i.e. what the maximum
> refcount would have been if they were shared.

It isn't just memory savings we are playing for. Even if 0.0 is allocated and de-allocated 10,000 times in a row, there would be no memory savings from caching its value. However there would be

a) less allocator overhead - allocating objects is relatively expensive
b) better caching of the value
c) less cache thrashing

I think you'll find that even in the no-memory-saving case a few cycles spent on comparison with 0.0 (or maybe a few other values) will speed up programs.
Re: [Python-Dev] Caching float(0.0)
On Mon, Oct 02, 2006 at 07:53:34PM -0500, [EMAIL PROTECTED] wrote:

> Terry> Kristján V. Jónsson wrote: "Anyway, Skip noted that 50% of all
> Terry> floats are whole numbers between -10 and 10 inclusive,"
>
> Terry> Please, no. He said something like this about
> Terry> *non-floating-point applications* (evidence unspecified, that I
> Terry> remember). But such applications, by definition, usually don't
> Terry> have enough floats for caching (or conversion time) to matter
> Terry> too much.
>
> Correct. The non-floating-point application I chose was the one that
> was most immediately available, "make test". Note that I have no proof
> that regrtest.py isn't terribly floating point intensive. I just sort
> of guessed that it was.

For my application caching 0.0 is by far the most important. 0.0 has ~200,000 references - the next highest reference count is only about ~200.
Re: [Python-Dev] Caching float(0.0)
Terry Reedy wrote:

> For true floating point measurements (of temperature, for instance),
> 'integral' measurements (which are an artifact of the scale used
> (degrees F versus C versus K)) should generally be no more common than
> other realized measurements.

a real-life sensor is of course where the 121.216 in my original post to this thread came from. (note that most real-life sensors involve A/D conversion at some point, which means that they provide a limited number of discrete values. but only the code dealing with the source data will be able to make any meaningful assumptions about those values.)

I still think it might make sense to special-case float(0.0) (padding, default values, etc) inside PyFloat_FromDouble, and possibly also float(1.0) (scale factors, unit vectors, normalized max values, etc), but everything else is just generalizing from random observations.

adding a few notes to the C API documentation won't hurt either, I suppose. (e.g. note that each call to PyFloat_FromDouble may create a new floating point object; if you're converting data from some internal format to Python floats, it's often more efficient to map directly to preallocated shared PyFloat objects, instead of mapping first to float or double and then calling PyFloat_FromDouble on that value).

/F
Re: [Python-Dev] Caching float(0.0)
Terry Reedy [EMAIL PROTECTED] wrote:

> For true floating point measurements (of temperature, for instance),
> 'integral' measurements (which are an artifact of the scale used
> (degrees F versus C versus K)) should generally be no more common than
> other realized measurements.

Not quite, but close enough. A lot of algorithms use a conversion to integer, or some of the values are actually counts (e.g. in statistics), which makes them a bit more likely. Not enough to get excited about, in general.

> Thirty years ago, a major stat package written in Fortran (BMDP)
> required that all data be stored as (Fortran 4-byte) floats for
> analysis. So a column of yes/no or male/female data would be stored as
> 0.0/1.0 or perhaps 1.0/2.0. That skewed the distribution of floats.
> But Python and, I hope, Python apps, are more modern than that.

And SPSS and Genstat and others - now even Excel!

> Float caching strikes me as a good subject for cookbook recipes, but
> not, without real data and a willingness to slightly screw some users,
> for the default core code.

Yes. It is trivial (if tedious) to add analysis code - the problem is finding suitable representative applications. That was always my difficulty when I was analysing this sort of thing - and still is when I need to do it!

Nick Craig-Wood [EMAIL PROTECTED] wrote:

> For my application caching 0.0 is by far the most important. 0.0 has
> ~200,000 references - the next highest reference count is only about
> ~200.

Yes. All the experience I have ever seen over the past 4 decades confirms that is the normal case, with the exception of floating-point representations that have a missing value indicator. Even in IEEE 754, infinities and NaNs are rare unless the application is up the spout. There are claims that a lot of important applications have a lot of NaNs and use them as missing values but, despite repeated requests, none of the people claiming that have ever provided an example.
There are some pretty solid grounds for believing that those claims are not based in fact, but are polemic.

Regards,
Nick Maclaren
Re: [Python-Dev] Caching float(0.0)
But that is precisely the point. A non-floating-point application tends to use floating point values in a predictable way, with a lot of integral values floating around and lots of zeroes. As this constitutes the majority of python applications (okay, daring assumption here) it seems to warrant some consideration.

In one of my first messages on the subject I promised to report refcounts of -1.0, 0.0 and 1.0 for the EVE server. I didn't, but instead gave you the frequency of the values reported. Well, now I can provide you with refcounts for the [-10.0, 10.0] range plus the total float count, of a server that has just started up:

    -10.0      589
     -9.0       56
     -8.0       65
     -7.0       63
     -6.0      243
     -5.0      731
     -4.0      550
     -3.0      246
     -2.0      246
     -1.0     1096
      0.0   195446
      1.0    79382
      2.0     9650
      3.0     6224
      4.0     5223
      5.0    14766
      6.0     2616
      7.0     1303
      8.0     3307
      9.0     1447
     10.0     8102

    total:  331351

The total count of floating point numbers allocated at this point is 985794. Without the reuse, they would be 1317145, so this is a saving of 25%, and of 5Mb.

Kristján

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 3. október 2006 00:54
To: Terry Reedy
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Caching float(0.0)

Terry> Kristján V. Jónsson wrote: "Anyway, Skip noted that 50% of all
Terry> floats are whole numbers between -10 and 10 inclusive,"

Terry> Please, no. He said something like this about
Terry> *non-floating-point applications* (evidence unspecified, that I
Terry> remember). But such applications, by definition, usually don't
Terry> have enough floats for caching (or conversion time) to matter
Terry> too much.

Correct. The non-floating-point application I chose was the one that was most immediately available, "make test". Note that I have no proof that regrtest.py isn't terribly floating point intensive. I just sort of guessed that it was.
Skip
Re: [Python-Dev] Caching float(0.0)
Kristján V. Jónsson [EMAIL PROTECTED] wrote:

> The total count of floating point numbers allocated at this point is
> 985794. Without the reuse, they would be 1317145, so this is a saving
> of 25%, and of 5Mb.

And, if you optimised just 0.0, you would get 60% of that saving at a small fraction of the cost and considerably greater generality. It isn't clear whether the effort justifies doing more.

Regards,
Nick Maclaren
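The figures can be cross-checked. A quick sketch of the arithmetic, using only the numbers quoted in the messages above:

```python
allocated  = 985794    # floats allocated with reuse
would_be   = 1317145   # floats that would exist without reuse
total_refs = 331351    # references to values in [-10.0, 10.0]
zero_refs  = 195446    # references to 0.0 alone

# The saving is exactly the cached references:
assert would_be - allocated == total_refs

saving = (would_be - allocated) / would_be
print(round(100 * saving))                  # 25 (%), as Kristján says
print(round(100 * zero_refs / total_refs))  # 59 (%), Nick's "60%"
```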
Re: [Python-Dev] Caching float(0.0)
> The total count of floating point numbers allocated at this point is
> 985794. Without the reuse, they would be 1317145, so this is a saving
> of 25%, and of 5Mb.

Nick> And, if you optimised just 0.0, you would get 60% of that saving
Nick> at a small fraction of the cost and considerably greater
Nick> generality. It isn't clear whether the effort justifies doing
Nick> more.

Doesn't that presume that optimizing just 0.0 could be done easily? Suppose 0.0 is generated all over the place in EVE?

Skip
Re: [Python-Dev] Caching float(0.0)
Nick Craig-Wood schrieb:

> Even if 0.0 is allocated and de-allocated 10,000 times in a row, there
> would be no memory savings by caching its value. However there would
> be a) less allocator overhead b) better caching of the value c) less
> cache thrashing
>
> I think you'll find that even in the no memory saving case a few
> cycles spent on comparison with 0.0 (or maybe a few other values) will
> speed up programs.

Can you demonstrate that speedup? It is quite difficult to anticipate the performance impact of a change, in particular if there is no change in computational complexity. Various effects tend to balance out each other.

Regards,
Martin
Re: [Python-Dev] Caching float(0.0)
Nick Maclaren schrieb:

>> The total count of floating point numbers allocated at this point is
>> 985794. Without the reuse, they would be 1317145, so this is a saving
>> of 25%, and of 5Mb.
>
> And, if you optimised just 0.0, you would get 60% of that saving at a
> small fraction of the cost and considerably greater generality.

As Michael Hudson observed, this is difficult to implement, though: You can't distinguish between -0.0 and +0.0 easily, yet you should.

Regards,
Martin
Re: [Python-Dev] Caching float(0.0)
"Martin v. Löwis" [EMAIL PROTECTED] wrote:

>>> The total count of floating point numbers allocated at this point
>>> is 985794. Without the reuse, they would be 1317145, so this is a
>>> saving of 25%, and of 5Mb.
>>
>> And, if you optimised just 0.0, you would get 60% of that saving at a
>> small fraction of the cost and considerably greater generality.
>
> As Michael Hudson observed, this is difficult to implement, though:
> You can't distinguish between -0.0 and +0.0 easily, yet you should.

That was the point of a previous posting of mine in this thread :-( You shouldn't, despite what IEEE 754 says, at least if you are allowing for either portability or numeric validation. There are a huge number of good reasons why IEEE 754 signed zeroes fit extremely badly into any normal programming language and are seriously incompatible with numeric validation, but Python adds more. Is there any other type where there are two values that are required to be different, but where both the hash is required to be zero and both are required to evaluate to False in truth value context?

Regards,
Nick Maclaren
Re: [Python-Dev] Caching float(0.0)
Nick Maclaren schrieb:

> That was the point of a previous posting of mine in this thread :-(
> You shouldn't, despite what IEEE 754 says, at least if you are
> allowing for either portability or numeric validation. There are a
> huge number of good reasons why IEEE 754 signed zeroes fit extremely
> badly into any normal programming language and are seriously
> incompatible with numeric validation, but Python adds more. Is there
> any other type where there are two values that are required to be
> different, but where both the hash is required to be zero and both are
> required to evaluate to False in truth value context?

Ah, you are proposing a semantic change, then: -0.0 will become unrepresentable, right?

Regards,
Martin
Re: [Python-Dev] Caching float(0.0)
"Martin v. Löwis" [EMAIL PROTECTED] wrote:

> Ah, you are proposing a semantic change, then: -0.0 will become
> unrepresentable, right?

Well, it is and it isn't. Python currently supports only some of IEEE 754, and that is more by accident than design - because that is exactly what C90 implementations do! There is code in floatobject.c that assumes IEEE 754, but Python does NOT attempt to support it in toto (it is not clear if it could), not least because it uses C90. And, as far as I know, none of that is in the specification, because Python is at least in theory portable to systems that use other arithmetics, and there is no current way to distinguish -0.0 from 0.0 except by comparing their representations! And even THAT depends entirely on whether the C library distinguishes the cases, as far as I can see.

So distinguishing -0.0 from 0.0 isn't really in Python's current semantics at all. And, for reasons that we could go into, I assert that it should not be - which is NOT the same as not supporting branch cuts in cmath.

Regards,
Nick Maclaren
Re: [Python-Dev] Caching float(0.0)
Nick Maclaren schrieb:

> So distinguishing -0.0 from 0.0 isn't really in Python's current
> semantics at all. And, for reasons that we could go into, I assert
> that it should not be - which is NOT the same as not supporting branch
> cuts in cmath.

Are you talking about Python the language specification or Python the implementation here? It is not a change to the language specification, as this aspect of the behavior (as you point out) is unspecified. However, it is certainly a change to the observable behavior of the Python implementation, and no amount of arguing can change that.

Regards,
Martin

P.S. For that matter, *any* kind of change to the singleton nature of certain immutable values is a change in semantics. It's just that dropping -0.0 is an *additional* change (on top of the change that "(1.0-1.0) is 0.0" would change from False to True).
Re: [Python-Dev] Caching float(0.0)
On 3 Oct 2006, at 15:10, Martin v. Löwis wrote:

> Nick Maclaren schrieb:
>> That was the point of a previous posting of mine in this thread :-(
>> You shouldn't, despite what IEEE 754 says, at least if you are
>> allowing for either portability or numeric validation. There are a
>> huge number of good reasons why IEEE 754 signed zeroes fit extremely
>> badly into any normal programming language and are seriously
>> incompatible with numeric validation, but Python adds more.
> Ah, you are proposing a semantic change, then: -0.0 will become
> unrepresentable, right?

It's only a semantic change on platforms that happen to use IEEE 754 float representations, or some other representation that exposes the sign of zero. The Python docs have for many years stated with regard to the float type that "all bets on their precision are off unless you happen to know the machine you are working with" and that "you are at the mercy of the underlying machine architecture". Not all floating point representations support a signed zero, though in the modern world it's true that the vast majority do.

It would be instructive to understand how much, if any, python code would break if we lost -0.0. I do not believe that there is any reliable way for python code to tell the difference between all of the different types of IEEE 754 zeros, and in the special case of -0.0 the best test I can come up with is repr(n)[0]=='-'. Is there a compelling case, to do with compatibility or otherwise, for exposing the sign of a zero? It seems like a numerical anomaly to me.

Nicko
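Nicko's repr-based test, plus an atan2-based alternative drawn from earlier in the thread, as a sketch; note that repr only shows the sign where the platform's C library prints it (it did not on the Windows build quoted earlier), whereas atan2 follows C99's 754 appendix on IEEE systems:

```python
import math

def repr_negative_zero(n):
    # Nicko's best-effort test: rely on repr() exposing the sign.
    return n == 0.0 and repr(n)[0] == '-'

def atan2_negative_zero(n):
    # Alternative: atan2(-0.0, -1.0) is -pi while atan2(+0.0, -1.0) is
    # +pi, so the sign of the result reveals the sign of the zero.
    return n == 0.0 and math.atan2(n, -1.0) < 0.0

assert atan2_negative_zero(-0.0)
assert not atan2_negative_zero(0.0)
assert not atan2_negative_zero(-5.0)
```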
Re: [Python-Dev] Caching float(0.0)
On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote: As Michael Hudson observed, this is difficult to implement, though: You can't distinguish between -0.0 and +0.0 easily, yet you should. Of course you can. It's absolutely trivial. The only part that's even *the least bit* sketchy in this is assuming that a double is 64 bits. Practically speaking, that is true on all architectures I know of, and if it's not guaranteed, it could easily be a 'configure' time check.

#include <stdio.h>
#include <stdint.h>

typedef union {
    double d;
    uint64_t i;
} rawdouble;

int isposzero(double a) {
    rawdouble zero;
    zero.d = 0.0;
    rawdouble aa;
    aa.d = a;
    return aa.i == zero.i;
}

int main(void) {
    if (sizeof(double) != sizeof(uint64_t))
        return 1;
    printf("%d\n", isposzero(0.0));
    printf("%d\n", isposzero(-0.0));
    return 0;
}

James
Re: [Python-Dev] Caching float(0.0)
Nicko van Someren schrieb: It's only a semantic change on platforms that happen to use IEEE 754 float representations, or some other representation that exposes the sign of zero. Right. Later, you admit that this is the vast majority of modern machines. It would be instructive to understand how much, if any, Python code would break if we lost -0.0. I do not believe that there is any reliable way for Python code to tell the difference between all of the different types of IEEE 754 zeros, and in the special case of -0.0 the best test I can come up with is repr(n)[0]=='-'. Is there a compelling case, to do with compatibility or otherwise, for exposing the sign of a zero? It seems like a numerical anomaly to me. I think it is reasonable to admit that a) this change is a change in semantics for the majority of the machines b) it is likely that this change won't affect a significant number of applications (I'm pretty sure someone will notice, though; someone always notices). Regards, Martin
Re: [Python-Dev] Caching float(0.0)
Martin> However, it is certainly a change to the observable behavior of
Martin> the Python implementation, and no amount of arguing can change
Martin> that.

If C90 doesn't distinguish -0.0 and +0.0, how can Python? Can you give a simple example where the difference between the two is apparent to the Python programmer?

Skip
Re: [Python-Dev] Caching float(0.0)
Martin> b) it is likely that this change won't affect a significant
Martin>    number of applications (I'm pretty sure someone will notice,
Martin>    though; someone always notices).

+1 QOTF.

Skip
Re: [Python-Dev] Caching float(0.0)
James Y Knight wrote: On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote: As Michael Hudson observed, this is difficult to implement, though: You can't distinguish between -0.0 and +0.0 easily, yet you should. Of course you can. It's absolutely trivial. The only part that's even *the least bit* sketchy in this is assuming that a double is 64 bits. Practically speaking, that is true on all architectures I know of, and if it's not guaranteed, it could easily be a 'configure' time check.

typedef union {
    double d;
    uint64_t i;
} rawdouble;

int isposzero(double a) {
    rawdouble zero;
    zero.d = 0.0;
    rawdouble aa;
    aa.d = a;
    return aa.i == zero.i;
}

int main(void) {
    if (sizeof(double) != sizeof(uint64_t))
        return 1;
    printf("%d\n", isposzero(0.0));
    printf("%d\n", isposzero(-0.0));
    return 0;
}

And you should be able to cache the single positive zero with something vaguely like:

PyObject *
PyFloat_FromDouble(double fval)
{
    ...
    if (fval == 0.0 && raw_match(fval, cached)) {
        Py_INCREF(cached);
        return cached;
    }
    ...

-- Scott David Daniels [EMAIL PROTECTED]
Re: [Python-Dev] Caching float(0.0)
[EMAIL PROTECTED] schrieb: If C90 doesn't distinguish -0.0 and +0.0, how can Python? Can you give a simple example where the difference between the two is apparent to the Python programmer? Sure:

>>> x=-0.0
>>> y=0.0
>>> x,y
(-0.0, 0.0)
>>> hash(x),hash(y)
(0, 0)
>>> x==y
True
>>> str(x)==str(y)
False
>>> str(x),str(y)
('-0.0', '0.0')
>>> float(str(x)),float(str(y))
(-0.0, 0.0)

Imagine an application that reads floats from a text file, manipulates some of them, and then writes back the complete list of floats. Further assume that somehow, -0.0 got into the file. Currently, the sign round-trips; under the proposed change, it would stop doing so. Of course, there likely wouldn't be any real change to the value, as the sign of 0 is likely of no significance to the application. Regards, Martin
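Martin's round-trip scenario is easy to sketch: a "file" of floats read with float() and written back with str() currently preserves the sign of any -0.0, and would stop doing so if float construction returned a cached +0.0 for it. A minimal illustration (pure Python, no real file I/O, assuming repr/str expose the sign as they do on IEEE 754 platforms):

```python
# values as they might appear in a text file, one per line
text = "1.5\n-0.0\n42.0"

values = [float(line) for line in text.split("\n")]  # "read" the file
written = "\n".join(str(v) for v in values)          # "write" it back

assert written == text  # the sign of -0.0 round-trips today
```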
Re: [Python-Dev] Caching float(0.0)
"Martin v. Löwis" [EMAIL PROTECTED] wrote:

>>> x=-0.0
>>> y=0.0
>>> x,y

Nobody is denying that SOME C90 implementations distinguish them, but it is no part of the standard - indeed, a C90 implementation is permitted to use ANY criterion for deciding when to display -0.0 and 0.0. C99 is ambiguous to the point of internal inconsistency, except when __STDC_IEC_559__ is set to 1, though the intent is clear. And my reading of Python's code is that it relies on C's handling of such values. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: [EMAIL PROTECTED] Tel.: +44 1223 334761  Fax: +44 1223 334679
Re: [Python-Dev] Caching float(0.0)
On Oct 3, 2006, at 2:26 PM, Nick Maclaren wrote: "Martin v. Löwis" [EMAIL PROTECTED] wrote:

>>> x=-0.0
>>> y=0.0
>>> x,y

Nobody is denying that SOME C90 implementations distinguish them, but it is no part of the standard - indeed, a C90 implementation is permitted to use ANY criterion for deciding when to display -0.0 and 0.0. C99 is ambiguous to the point of internal inconsistency, except when __STDC_IEC_559__ is set to 1, though the intent is clear. And my reading of Python's code is that it relies on C's handling of such values. This is a really poor argument. Python should be moving *towards* proper '754 fp support, not away from it. On the platforms that are most important, the C implementations distinguish positive and negative 0. That the current python implementation may be defective when the underlying C implementation is defective doesn't excuse a change to intentionally break python on the common platforms. IEEE 754 is so widely implemented that IMO it would make sense to make Python's floating point specify it, and simply declare floating point operations on non-IEEE 754 machines as use at own risk, may not conform to python language standard. (or if someone wants to use a software fp library for such machines, that's fine too). James
Re: [Python-Dev] Caching float(0.0)
Nick Maclaren schrieb:

>>> x=-0.0
>>> y=0.0
>>> x,y

Nobody is denying that SOME C90 implementations distinguish them, but it is no part of the standard - indeed, a C90 implementation is permitted to use ANY criterion for deciding when to display -0.0 and 0.0. C99 is ambiguous to the point of internal inconsistency, except when __STDC_IEC_559__ is set to 1, though the intent is clear. And my reading of Python's code is that it relies on C's handling of such values. So what is your conclusion? That applications will not break? People don't care that their code may break on a different platform, if they aren't using these platforms. They care if it breaks on their platform just because they use a new Python version. (Of course, they sometimes also complain that Python behaves differently on different platforms, and cannot really accept the explanation that the language didn't guarantee the same behavior on all systems. This explanation doesn't help them: they still need to modify the application). Regards, Martin
Re: [Python-Dev] Caching float(0.0)
It would be instructive to understand how much, if any, Python code would break if we lost -0.0. I do not believe that there is any reliable way for Python code to tell the difference between all of the different types of IEEE 754 zeros, and in the special case of -0.0 the best test I can come up with is repr(n)[0]=='-'. Is there a compelling case, to do with compatibility or otherwise, for exposing the sign of a zero? It seems like a numerical anomaly to me. I think it is reasonable to admit that a) this change is a change in semantics for the majority of the machines b) it is likely that this change won't affect a significant number of applications (I'm pretty sure someone will notice, though; someone always notices). If you're really going to bother doing this rather than just adding a note in the docs about testing for and reusing the most common float values to save memory when instantiating them from external input: Just do a binary comparison of the float with predefined + and - 0.0 float values, or any other special values that you wish to catch, rather than a floating point comparison. -g
Re: [Python-Dev] Caching float(0.0)
On 3 Oct 2006, at 17:47, James Y Knight wrote: On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote: As Michael Hudson observed, this is difficult to implement, though: You can't distinguish between -0.0 and +0.0 easily, yet you should. Of course you can. It's absolutely trivial. The only part that's even *the least bit* sketchy in this is assuming that a double is 64 bits. Practically speaking, that is true on all architectures I know of, How about doing 1.0 / x, where x is the number you want to test? On systems with sane semantics, it should result in an infinity, the sign of which should depend on the sign of the zero. While I'm sure there are any number of places where it will break, on those platforms it seems to me that you're unlikely to care about the difference between +0.0 and -0.0 anyway, since it's hard to otherwise distinguish them. e.g.

double value_to_test;
...
if (value_to_test == 0.0) {
    double my_inf = 1.0 / value_to_test;
    if (my_inf < 0.0) {
        /* We have a -ve zero */
    } else if (my_inf > 0.0) {
        /* We have a +ve zero */
    } else {
        /* This platform might not support infinities (though we might
           get a signal or something rather than getting here in that
           case...) */
    }
}

(I should add that presently I've only tried it on a PowerPC, because it's late and that's what's in front of me. It seems to work OK here.) Kind regards, Alastair -- http://alastairs-place.net
Re: [Python-Dev] Caching float(0.0)
Alastair Houghton [EMAIL PROTECTED] wrote: On 3 Oct 2006, at 17:47, James Y Knight wrote: On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote: As Michael Hudson observed, this is difficult to implement, though: You can't distinguish between -0.0 and +0.0 easily, yet you should. Of course you can. It's absolutely trivial. The only part that's even *the least bit* sketchy in this is assuming that a double is 64 bits. Practically speaking, that is true on all architectures I know of, How about doing 1.0 / x, where x is the number you want to test? On systems with sane semantics, it should result in an infinity, the sign of which should depend on the sign of the zero. While I'm sure there are any number of places where it will break, on those platforms it seems to me that you're unlikely to care about the difference between +0.0 and -0.0 anyway, since it's hard to otherwise distinguish them. There is, of course, the option of examining their representations in memory (I described the general technique in another posting on this thread). From what I understand of IEEE 754 FP doubles, -0.0 and +0.0 have different representations, and if we look at the underlying representation (perhaps by a *((uint64_t *)&float_input)), we can easily distinguish all values we want to cache... We can observe it directly, for example on x86:

>>> import struct
>>> struct.pack('d', -0.0)
'\x00\x00\x00\x00\x00\x00\x00\x80'
>>> struct.pack('d', 0.0)
'\x00\x00\x00\x00\x00\x00\x00\x00'

And as I stated before, we can switch on those values. Alternatively, if we can't switch on the 64 bit values directly...

uint32_t *p = (uint32_t *)&double_input;
if (!p[0]) {            /* p[1] on big-endian platforms */
    switch (p[1]) {     /* p[0] on big-endian platforms */
    ...
    }
}

- Josiah
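The same bit-pattern comparison is easy to express in pure Python with struct, which sidesteps the +0.0 == -0.0 problem entirely. A sketch (isposzero is a hypothetical helper name, mirroring the C union trick earlier in the thread):

```python
import struct

def isposzero(a):
    # Compare raw IEEE 754 bit patterns instead of using ==,
    # so -0.0 (sign bit set) is NOT treated as +0.0.
    return struct.pack('<d', a) == struct.pack('<d', 0.0)

assert isposzero(0.0)
assert not isposzero(-0.0)  # == would call them equal; the bits differ
assert not isposzero(1.0)
```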
Re: [Python-Dev] Caching float(0.0)
Josiah Carlson wrote: [yet more on this topic] If the brainpower already expended on this issue were proportional to its significance then we'd be reading about it on CNN news. This thread has disappeared down a rat-hole, never to re-emerge with anything of significant benefit to users. C'mon, guys, implement a patch or leave it alone :-) regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden
Re: [Python-Dev] Caching float(0.0)
On 10/3/06, Steve Holden [EMAIL PROTECTED] wrote: If the brainpower already expended on this issue were proportional to its significance then we'd be reading about it on CNN news. This thread has disappeared down a rat-hole, never to re-emerge with anything of significant benefit to users. C'mon, guys, implement a patch or leave it alone :-) Hear, hear. My proposal: only cache positive 0.0. My prediction: biggest bang for the buck, nobody's code will break. On platforms that don't distinguish between +/- 0.0, of course this would cache all zeros. On platforms that do distinguish them, -0.0 is left alone, which is just fine. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Caching float(0.0)
[EMAIL PROTECTED] If C90 doesn't distinguish -0.0 and +0.0, how can Python? With liberal applications of piss & vinegar ;-) Can you give a simple example where the difference between the two is apparent to the Python programmer? Perhaps surprisingly, many (well, comparatively many, compared to none) people have noticed that the platform atan2 cares a lot:

>>> from math import atan2 as a
>>> z = 0.0  # positive zero
>>> m = -z   # minus zero
>>> a(z, z)  # the result here is actually +0.0
0.0
>>> a(z, m)
3.1415926535897931
>>> a(m, z)  # the result here is actually -0.0
0.0
>>> a(m, m)
-3.1415926535897931

It works like that even on Windows, and these are the results C99's 754-happy appendix mandates for atan2 applied to signed zeroes. I've even seen a /complaint/ on c.l.py that atan2 doesn't do the same when z = 0.0 is replaced by z = 0. That is, at least one person thought it was a bug that integer zeroes didn't deliver the same behaviors. Do people actually rely on this? I know I don't, but given that more than just 2 people have remarked on it, seeming to like it, I expect that changing this would break /some/ code out there. BTW, on /some/ platforms all those examples trigger EDOM from the platform libm instead -- which is also fine by C99, for implementations ignoring C99's optional 754-happy appendix.
Re: [Python-Dev] Caching float(0.0)
[EMAIL PROTECTED] Can you give a simple example where the difference between the two is apparent to the Python programmer? BTW, I don't recall the details and don't care enough to reconstruct them, but when Python's front end was first changed to recognize negative literals, it treated +0.0 and -0.0 the same, and we did get bug reports as a result. A bit more detail, because it's necessary to understand that even minimally. Python's grammar doesn't have negative numeric literals; e.g., according to the grammar, -1 and -1.1 are applications of the unary minus operator to the positive numeric literals 1 and 1.1. And for years Python generated code accordingly: LOAD_CONST followed by the unary minus opcode. Someone (Fred, I think) introduced a front-end optimization to collapse that to plain LOAD_CONST, doing the negation at compile time. The code object contains a vector of compile-time constants, and the optimized code initially didn't distinguish between +0.0 and -0.0. As a result, if the first float 0.0 in a code block looked positive, /all/ float zeroes in the code block were in effect treated as positive; and similarly if the first float zero was -0.0, all float zeroes were in effect treated as negative. That did break code. IIRC, it was fixed by special-casing the snot out of -0.0, leaving that single case as a LOAD_CONST followed by UNARY_NEGATIVE.
Re: [Python-Dev] Caching float(0.0)
On Wednesday 04 October 2006 00:53, Tim Peters wrote: Someone (Fred, I think) introduced a front-end optimization to collapse that to plain LOAD_CONST, doing the negation at compile time. I did the original change to make negative integers use just LOAD_CONST, but I don't think I changed what was generated for float literals. That could be my memory going bad, though. The code changed several times as people with more numeric-fu than myself fixed all sorts of border cases. I've tried really hard to stay away from the code generator since then. :-) -Fred -- Fred L. Drake, Jr. fdrake at acm.org
Re: [Python-Dev] Caching float(0.0)
[Tim] Someone (Fred, I think) introduced a front-end optimization to collapse that to plain LOAD_CONST, doing the negation at compile time. I did the original change to make negative integers use just LOAD_CONST, but I don't think I changed what was generated for float literals. That could be my memory going bad, though. It is ;-) Here under Python 2.2.3:

>>> from dis import dis
>>> def f():
...     return 0.0 + -0.0 + 1.0 + -1.0
...
>>> dis(f)
          0 SET_LINENO               1
          3 SET_LINENO               1
          6 LOAD_CONST               1 (0.0)
          9 LOAD_CONST               1 (0.0)
         12 UNARY_NEGATIVE
         13 BINARY_ADD
         14 LOAD_CONST               2 (1.0)
         17 BINARY_ADD
         18 LOAD_CONST               3 (-1.0)
         21 BINARY_ADD
         22 RETURN_VALUE
         23 LOAD_CONST               0 (None)
         26 RETURN_VALUE

Note there that 0.0, 1.0, and -1.0 were all treated as literals, but that -0.0 still triggered a UNARY_NEGATIVE opcode. That was after the fix. You don't remember this as well as I do since I probably had to fix it, /and/ I ate enormous quantities of chopped, pressed, smoked, preservative-laden bag o' ham at the time. You really need to do both to remember floating-point trivia. Indeed, since I gave up my bag o' ham habit, I hardly ever jump into threads about fp trivia anymore. Mostly it's because I'm too weak from not eating anything, though -- how about lunch tomorrow? The code changed several times as people with more numeric-fu that myself fixed all sorts of border cases. I've tried really hard to stay away from the code generator since then. :-) Successfully, too! It's admirable.
Re: [Python-Dev] Caching float(0.0)
Steve Holden [EMAIL PROTECTED] wrote: Josiah Carlson wrote: [yet more on this topic] If the brainpower already expended on this issue were proportional to its significance then we'd be reading about it on CNN news. Goodness, I wasn't aware that pointer manipulation took that much brainpower. I presume you mean what others have spent time thinking about with regards to this topic. This thread has disappeared down a rat-hole, never to re-emerge with anything of significant benefit to users. C'mon, guys, implement a patch or leave it alone :-) Heh. So be it. The following is untested (I lack a build system for the Python trunk). It adds a new global cache for floats, a new 'fill the global cache' function, and an updated PyFloat_FromDouble() function. All in all, it took about 10 minutes to generate, and understands the difference between fp +0.0 and -0.0 (assuming sane IEEE 754 fp double behavior on non-x86 platforms). - Josiah

/* This should go into floatobject.c */

static PyObject **cached_list = NULL;

static PyObject **
fill_cached_list(void)
{
    PyObject **p;
    int i;
    /* Sentinel: mark the cache as being filled, so the recursive
       PyFloat_FromDouble() calls below bypass it. */
    cached_list = (PyObject **) 1;
    p = (PyObject **) PyMem_MALLOC(sizeof(PyObject *) * 22);
    if (p == NULL) {
        cached_list = NULL;
        return (PyObject **) PyErr_NoMemory();
    }
    for (i = 0; i <= 10; i++) {
        p[i] = PyFloat_FromDouble((double) i);
        p[21 - i] = PyFloat_FromDouble(-((double) i));
    }
    cached_list = NULL;
    return p;
}

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    int *fvali = (int *) &fval;   /* raw view of the double's bits */
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
    }
#ifdef LITTLE_ENDIAN
    if (!fvali[0])   /* low word all zero: candidate for the cache */
#else
    if (!fvali[1])
#endif
    {
        if (cached_list == NULL) {
            if ((cached_list = fill_cached_list()) == NULL)
                return NULL;
        }
        if (cached_list != (PyObject **) 1) {
#ifdef LITTLE_ENDIAN
            switch (fvali[1])
#else
            switch (fvali[0])
#endif
            {
            case 0: Py_INCREF(cached_list[0]); return cached_list[0];
            case 1072693248: Py_INCREF(cached_list[1]); return cached_list[1];
            case 1073741824: Py_INCREF(cached_list[2]); return cached_list[2];
            case 1074266112: Py_INCREF(cached_list[3]); return cached_list[3];
            case 1074790400: Py_INCREF(cached_list[4]); return cached_list[4];
            case 1075052544: Py_INCREF(cached_list[5]); return cached_list[5];
            case 1075314688: Py_INCREF(cached_list[6]); return cached_list[6];
            case 1075576832: Py_INCREF(cached_list[7]); return cached_list[7];
            case 1075838976: Py_INCREF(cached_list[8]); return cached_list[8];
            case 1075970048: Py_INCREF(cached_list[9]); return cached_list[9];
            case 1076101120: Py_INCREF(cached_list[10]); return cached_list[10];
            case -1071382528: Py_INCREF(cached_list[11]); return cached_list[11];
            case -1071513600: Py_INCREF(cached_list[12]); return cached_list[12];
            case -1071644672: Py_INCREF(cached_list[13]); return cached_list[13];
            case -1071906816: Py_INCREF(cached_list[14]); return cached_list[14];
            case -1072168960: Py_INCREF(cached_list[15]); return cached_list[15];
            case -1072431104: Py_INCREF(cached_list[16]); return cached_list[16];
            case -1072693248: Py_INCREF(cached_list[17]); return cached_list[17];
            case -1073217536: Py_INCREF(cached_list[18]); return cached_list[18];
            case -1073741824: Py_INCREF(cached_list[19]); return cached_list[19];
            case -1074790400: Py_INCREF(cached_list[20]); return cached_list[20];
            case -2147483648: Py_INCREF(cached_list[21]); return cached_list[21];
            default: break;
            }
        }
    }
    /* Inline PyObject_New */
    op = free_list;
    free_list = (PyFloatObject *) op->ob_type;
    PyObject_INIT(op, &PyFloat_Type);
    op->ob_fval = fval;
    return (PyObject *) op;
}
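The same idea can be prototyped at the Python level: key a cache on the raw bit pattern so that +0.0 and -0.0 occupy distinct slots. A dict keyed on the float itself would collapse them, since -0.0 == 0.0 and both hash to 0. interned_float and its cache are hypothetical names for this sketch, not part of any proposed API:

```python
import struct

_cache = {}

def interned_float(value):
    # Key on the IEEE 754 bit pattern, not the float value itself:
    # -0.0 == 0.0, but their bit patterns differ, so each gets its
    # own cache slot and the sign of zero survives interning.
    key = struct.pack('<d', value)
    obj = _cache.get(key)
    if obj is None:
        obj = _cache[key] = float(value)
    return obj
```

Note that float(value) preserves the sign of a negative zero, whereas something like value + 0.0 would not: under IEEE 754 round-to-nearest, -0.0 + 0.0 yields +0.0.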
Re: [Python-Dev] Caching float(0.0)
Alastair Houghton schrieb: On 3 Oct 2006, at 17:47, James Y Knight wrote: On Oct 3, 2006, at 8:30 AM, Martin v. Löwis wrote: As Michael Hudson observed, this is difficult to implement, though: You can't distinguish between -0.0 and +0.0 easily, yet you should. Of course you can. It's absolutely trivial. The only part that's even *the least bit* sketchy in this is assuming that a double is 64 bits. Practically speaking, that is true on all architectures I know of, How about doing 1.0 / x, where x is the number you want to test? This is a bad idea. It may cause a trap, leading to program termination. Regards, Martin
Re: [Python-Dev] Caching float(0.0)
On Sun, Oct 01, 2006 at 02:01:51PM -0400, Jean-Paul Calderone wrote: Each line in an interactive session is compiled separately, like modules are compiled separately. With the current implementation, literals in a single compilation unit have a chance to be cached like this. Literals in different compilation units, even for the same value, don't. That makes sense - thanks for the explanation! -- Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick
Re: [Python-Dev] Caching float(0.0)
Nick Coghlan schrieb: Right. Although I do wonder what kind of software people write to run into this problem. As Guido points out, the numbers must be the result from some computation, or created by an extension module by different means. If people have many *simultaneous* copies of 0.0, I would expect there is something else really wrong with the data structures or algorithms they use. I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such: That's how you can end up with 100 different copies of 0.0. But apparently, people are creating millions of them, and keeping them in memory simultaneously. Unless the text file *only* consists of floating point numbers, I would expect they have bigger problems than that. Regards, Martin
Re: [Python-Dev] Caching float(0.0)
Kristján V. Jónsson schrieb: Well, a lot of extension code, like ours, uses PyFloat_FromDouble(foo); This can be from vectors and stuff. Hmm. If you get a lot of 0.0 values from vectors and stuff, I would expect that memory usage is already high. In any case, a module that creates a lot of copies of 0.0 that way could do its own caching, right? Very often these are values from a database. Integral float values are very common in such cases, and it didn't occur to me that they weren't being reused, at least for small values. Sure - but why are people keeping them in memory all the time? Also, isn't it a mis-design of the database if you have many float values in it that represent natural numbers? Shouldn't you use a more appropriate data type, then? Also, a lot of arithmetic involving floats is expected to end in integers, like computing some index from a float value. Integers get promoted to floats when touched by them, as you know. Again, sounds like a programming error to me. Anyway, I now precreate integral values from -10 to 10 with great effect. The cost is minimal, the benefit great. In an extension module, the knowledge about the application domain is larger, so it may be reasonable to do the caching there. I would still expect that in the typical application where this is an issue, there is some kind of larger design bug. Regards, Martin
Re: [Python-Dev] Caching float(0.0)
Kristján V. Jónsson schrieb: I can't see how this situation is any different from the re-use of low ints. There is no fundamental law that says that ints below 100 are more common than others, yet experience shows that this is so, and so they are reused. There are two important differences: 1. it is possible to determine whether the value is special in constant time, and also fetch the singleton value in constant time for ints; the same isn't possible for floats. 2. it may be that there is a loss of precision in reusing an existing value (although I'm not certain that this could really happen). For example, could it be that two values compare successfully in ==, yet are different values? I know this can't happen for integers, so I feel much more comfortable with that cache. Rather than viewing this as a programming error, why not simply accept that this is a recurring pattern and adjust Python to be more efficient when faced by it? Surely a lot of karma lies that way? I'm worried about the penalty that this causes in terms of run-time cost. Also, how do you choose what values to cache? Regards, Martin
Re: [Python-Dev] Caching float(0.0)
I see, you are thinking of the general fractional case. My point was that whole numbers seem to pop up often and to reuse those is easy. I did a test of tracking actual floating point numbers and the majority of heavy usage comes from integral values. It would indeed be strange if some fractional number were heavily used, but it can be argued that integral ones are special in many ways. Anyway, Skip noted that 50% of all floats are whole numbers between -10 and 10 inclusive, and this is the code that I employ in our python build today:

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    int ival;
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
        /* CCP addition, cache common values */
        if (!f_reuse[0]) {
            int i;
            for (i = 0; i < 21; i++)
                f_reuse[i] = PyFloat_FromDouble((double)(i - 10));
        }
    }
    /* CCP addition, check for recycling */
    ival = (int)fval;
    if ((double)ival == fval && ival >= -10 && ival <= 10) {
        ival += 10;
        if (f_reuse[ival]) {
            Py_INCREF(f_reuse[ival]);
            return f_reuse[ival];
        }
    }
    ...

Cheers, Kristján

-Original Message- From: Martin v. Löwis [mailto:[EMAIL PROTECTED]] Sent: 2. október 2006 14:37 To: Kristján V. Jónsson Cc: Bob Ippolito; python-dev@python.org Subject: Re: [Python-Dev] Caching float(0.0)

Kristján V. Jónsson schrieb: I can't see how this situation is any different from the re-use of low ints. There is no fundamental law that says that ints below 100 are more common than others, yet experience shows that this is so, and so they are reused. There are two important differences: 1. it is possible to determine whether the value is special in constant time, and also fetch the singleton value in constant time for ints; the same isn't possible for floats. 2. it may be that there is a loss of precision in reusing an existing value (although I'm not certain that this could really happen). For example, could it be that two values compare successfully in ==, yet are different values?
I know this can't happen for integers, so I feel much more comfortable with that cache. Rather than to view this as a programming error, why not simply accept that this is a recurring pattern and adjust python to be more efficient when faced by it? Surely a lot of karma lies that way? I'm worried about the penalty that this causes in terms of run-time cost. Also, how do you chose what values to cache? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
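A minimal Python model of the reuse scheme above (hypothetical names; the real patch lives in C inside PyFloat_FromDouble) makes the identity behaviour easy to see, including the signed-zero hazard raised elsewhere in this thread:

```python
# Hypothetical sketch of the CCP cache: whole-number floats in
# [-10, 10] are allocated once and the same object is reused after.
_f_reuse = {}

def float_from_double(fval):
    ival = int(fval)
    if float(ival) == fval and -10 <= ival <= 10:
        if ival not in _f_reuse:
            _f_reuse[ival] = fval   # first request populates the slot
        return _f_reuse[ival]       # later requests reuse that object
    return fval                     # everything else is left alone

# Caveat: int(-0.0) == 0, so a computed -0.0 would silently come back
# as the cached +0.0 -- exactly the sign-of-zero problem in the thread.
```

Two separately computed 3.0 results then come back as one object, which is the whole point of the cache.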
Re: [Python-Dev] Caching float(0.0)
Martin v. Löwis [EMAIL PROTECTED] writes: Kristján V. Jónsson schrieb: I can't see how this situation is any different from the re-use of low ints. There is no fundamental law that says that ints below 100 are more common than other, yet experience shows that this is so, and so they are reused. There are two important differences: 1. it is possible to determine whether the value is special in constant time, and also fetch the singleton value in constant time for ints; the same isn't possible for floats. I don't think you mean constant time here do you? I think most of the code posted so far has been constant time, at least in terms of instruction count, though some might indeed be fairly slow on some processors -- conversion from double to integer on the PowerPC involves a trip off to memory for example. Even so, everything should be fairly efficient compared to allocation, even with PyMalloc. 2. it may be that there is a loss of precision in reusing an existing value (although I'm not certain that this could really happen). For example, could it be that two values compare successful in ==, yet are different values? I know this can't happen for integers, so I feel much more comfortable with that cache. I think the only case is that the two zeros compare equal, which is unfortunate given that it's the most compelling value to cache... I don't know a reliable and fast way to distinguish +0.0 and -0.0. Cheers, mwh -- The bottom tier is what a certain class of wanker would call business objects ... -- Greg Ward, 9 Dec 1999 ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Michael Hudson schrieb: 1. it is possible to determine whether the value is special in constant time, and also fetch the singleton value in constant time for ints; the same isn't possible for floats. I don't think you mean constant time here do you? Right; I really wondered whether the code was dependent or independent of the number of special-case numbers. I think most of the code posted so far has been constant time, at least in terms of instruction count, though some might indeed be fairly slow on some processors -- conversion from double to integer on the PowerPC involves a trip off to memory for example. Kristian's code testing only for integers in a range would be of that kind. Code that tests for a list of literals determined at compile time typically needs time linear with the number of special-cased constants (of course, as there is a fixed number of constants, this is O(1)). 2. it may be that there is a loss of precision in reusing an existing value (although I'm not certain that this could really happen). For example, could it be that two values compare successfully in ==, yet are different values? I know this can't happen for integers, so I feel much more comfortable with that cache. I think the only case is that the two zeros compare equal, which is unfortunate given that it's the most compelling value to cache... Thanks for pointing that out. I can believe this is the only case in IEEE-754; I also wonder whether alternative implementations could cause problems (although I don't really worry too much about VMS). Regards, Martin
Re: [Python-Dev] Caching float(0.0)
On Mon, Oct 02, 2006, Martin v. Löwis wrote: Michael Hudson schrieb: I think most of the code posted so far has been constant time, at least in terms of instruction count, though some might indeed be fairly slow on some processors -- conversion from double to integer on the PowerPC involves a trip off to memory for example. Kristian's code testing only for integers in a range would be of that kind. Code that tests for a list of literals determined at compile time typically needs time linear with the number of special-cased constants (of course, as there is a fixed number of constants, this is O(1)). What if we do this work only on float()? -- Aahz ([EMAIL PROTECTED]) * http://www.pythoncraft.com/ LL YR VWL R BLNG T S -- www.nancybuttons.com
Re: [Python-Dev] Caching float(0.0)
[EMAIL PROTECTED] wrote: Steve By these statistics I think the answer to the original question Steve is clearly no in the general case. As someone else (Guido?) pointed out, the literal case isn't all that interesting. I modified floatobject.c to track a few interesting floating point values: [...code...] So for a largely non-floating point application, a fair number of floats are allocated, a bit over 25% of them are -1.0, 0.0 or +1.0, and nearly 50% of them are whole numbers between -10.0 and 10.0, inclusive. Seems like it at least deserves a serious look. It would be nice to have the numeric crowd contribute to this subject as well. As a representative of the numeric crowd, I'll say that I've never noticed this to be a problem. I suspect that it's a non-issue since we generally store our numbers in arrays, not big piles of Python floats, so there's no opportunity for identical floats to pile up. -tim
Re: [Python-Dev] Caching float(0.0)
[EMAIL PROTECTED] wrote: Steve By these statistics I think the answer to the original question Steve is clearly no in the general case. As someone else (Guido?) pointed out, the literal case isn't all that interesting. I modified floatobject.c to track a few interesting floating point values:

static unsigned int nfloats[5] = {
    0, /* -1.0 */
    0, /* 0.0 */
    0, /* +1.0 */
    0, /* everything else */
    0, /* whole numbers from -10.0 ... 10.0 */
};

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
    }
    if (fval == 0.0) nfloats[1]++;
    else if (fval == 1.0) nfloats[2]++;
    else if (fval == -1.0) nfloats[0]++;
    else nfloats[3]++;
    if (fval >= -10.0 && fval <= 10.0 && (int)fval == fval) {
        nfloats[4]++;
    }

This doesn't actually give us a very useful indication of potential memory savings. What I think would be more useful is tracking the maximum simultaneous count of each value i.e. what the maximum refcount would have been if they were shared. Tim Delaney
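Tracking that maximum simultaneous count is a small change; a sketch of the bookkeeping in Python (a hypothetical helper, not Skip's actual instrumentation):

```python
class HighWaterCounter:
    """Track the peak simultaneous count of each tracked value."""
    def __init__(self):
        self.live = {}   # value -> currently live count
        self.peak = {}   # value -> highest live count ever seen

    def alloc(self, value):
        n = self.live.get(value, 0) + 1
        self.live[value] = n
        if n > self.peak.get(value, 0):
            self.peak[value] = n   # new high-water mark

    def free(self, value):
        self.live[value] -= 1
```

The peak, not the raw allocation count, is what bounds how much memory sharing could actually save.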
Re: [Python-Dev] Caching float(0.0)
Kristján V. Jónsson [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Anyway, Skip noted that 50% of all floats are whole numbers between -10 and 10 inclusive, Please, no. He said something like this about *non-floating-point applications* (evidence unspecified, that I remember). But such applications, by definition, usually don't have enough floats for caching (or conversion time) to matter too much. For true floating point measurements (of temperature, for instance), 'integral' measurements (which are an artifact of the scale used (degrees F versus C versus K)) should generally be no more common than other realized measurements. Thirty years ago, a major stat package written in Fortran (BMDP) required that all data be stored as (Fortran 4-byte) floats for analysis. So a column of yes/no or male/female data would be stored as 0.0/1.0 or perhaps 1.0/2.0. That skewed the distribution of floats. But Python and, I hope, Python apps, are more modern than that. and this is the code that I employ in our python build today: [snip] For the analysis of typical floating point data, this is all pointless and a complete waste of time. After a billion conversions or so, I expect the extra time might add up to something noticeable. From: Martin v. Löwis [mailto:[EMAIL PROTECTED] I'm worried about the penalty that this causes in terms of run-time cost. Me too. Also, how do you choose what values to cache? At one time (don't know about today), it was mandatory in some Fortran circles to name the small float constants used in a particular program with the equivalent of C #defines. In Python, zero = 0.0, half = 0.5, one = 1.0, twopi = 6.29..., eee = 2.7..., phi = .617..., etc. (Note that naming is not restricted to integral or otherwise 'nice' values.) The purpose then was to allow easy conversion from float to double to extended double. And in some cases, it also made the code clearer. 
With Python, the same procedure would guarantee only one copy (caching) of the same floats for constructed data structures. Float caching strikes me as a good subject for cookbook recipes, but not, without real data and a willingness to slightly screw some users, for the default core code. Terry Jan Reedy
Re: [Python-Dev] Caching float(0.0)
Tim This doesn't actually give us a very useful indication of potential Tim memory savings. What I think would be more useful is tracking the Tim maximum simultaneous count of each value i.e. what the maximum Tim refcount would have been if they were shared. Most definitely. I just posted what I came up with in about two minutes. I'll add some code to track the high water mark as well and report back. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Terry Kristján V. Jónsson [EMAIL PROTECTED] wrote: Anyway, Skip noted that 50% of all floats are whole numbers between -10 and 10 inclusive, Terry Please, no. He said something like this about Terry *non-floating-point applications* (evidence unspecified, that I Terry remember). But such applications, by definition, usually don't Terry have enough floats for caching (or conversion time) to matter too Terry much. Correct. The non-floating-point application I chose was the one that was most immediately available, make test. Note that I have no proof that regrtest.py isn't terribly floating point intensive. I just sort of guessed that it was. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
skip Most definitely. I just posted what I came up with in about two skip minutes. I'll add some code to track the high water mark as well skip and report back. Using the smallest change I could get away with, I came up with these allocation figures (same as before):

-1.0: 29048
0.0: 524340
+1.0: 91560
rest: 1753479
whole numbers -10.0 to 10.0: 1151543

and these max ref counts:

-1.0: 16
0.0: 136
+1.0: 161
rest: 1
whole numbers -10.0 to 10.0: 161

When I have a couple more minutes I'll just implement a cache for whole numbers between -10.0 and 10.0 and test that whole range of values right. Skip
Re: [Python-Dev] Caching float(0.0)
On Fri, Sep 29, 2006 at 12:03:03PM -0700, Guido van Rossum wrote: I see some confusion in this thread. If a *LITERAL* 0.0 (or any other float literal) is used, you only get one object, no matter how many times it is used. For some reason that doesn't happen in the interpreter which has been confusing the issue slightly...

$ python2.5
Python 2.5c1 (r25c1:51305, Aug 19 2006, 18:23:29)
[GCC 4.1.2 20060814 (prerelease) (Debian 4.1.1-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=0.0
>>> b=0.0
>>> id(a), id(b)
(134737756, 134737772)

$ python2.5 -c 'a=0.0; b=0.0; print id(a),id(b)'
134737796 134737796

But if the result of a *COMPUTATION* returns 0.0, you get a new object for each such result. If you have 70 MB worth of zeros, that's clearly computation results, not literals. In my application I'm receiving all the zeros from a server over TCP as ASCII and these are being float()ed in python. -- Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick
Re: [Python-Dev] Caching float(0.0)
On Sat, Sep 30, 2006 at 03:21:50PM -0700, Bob Ippolito wrote: On 9/30/06, Terry Reedy [EMAIL PROTECTED] wrote: Nick Coghlan [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such: Over a TCP socket in ASCII format for my application For such situations, one could create a translation dict for both common float values and for non-numeric missing value indicators. For instance, flotran = {'*': None, '1.0':1.0, '2.0':2.0, '4.0':4.0} The details, of course, depend on the specific case. But of course you have to know that common float values are never cached and that it may cause you problems. Some users may expect them to be because common strings and integers are cached. I have to say I was surprised to find out how many copies of 0.0 there were in my code and I guess I was subconsciously expecting the immutable 0.0s to be cached even though I know consciously I've never seen anything but int and str mentioned in the docs. -- Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Nick Craig-Wood [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Fri, Sep 29, 2006 at 12:03:03PM -0700, Guido van Rossum wrote: I see some confusion in this thread. If a *LITERAL* 0.0 (or any other float literal) is used, you only get one object, no matter how many times it is used. For some reason that doesn't happen in the interpreter which has been confusing the issue slightly... $ python2.5 a=0.0 b=0.0 id(a), id(b) (134737756, 134737772) Guido said *a* literal (emphasis shifted), reused as in a loop or function recalled, while you used *a* literal, then *another* literal, without reuse. Try a=b=0.0 instead. Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
On Sun, 1 Oct 2006 13:54:31 -0400, Terry Reedy [EMAIL PROTECTED] wrote: Nick Craig-Wood [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Fri, Sep 29, 2006 at 12:03:03PM -0700, Guido van Rossum wrote: I see some confusion in this thread. If a *LITERAL* 0.0 (or any other float literal) is used, you only get one object, no matter how many times it is used. For some reason that doesn't happen in the interpreter which has been confusing the issue slightly... $ python2.5 a=0.0 b=0.0 id(a), id(b) (134737756, 134737772) Guido said *a* literal (emphasis shifted), reused as in a loop or function recalled, while you used *a* literal, then *another* literal, without reuse. Try a=b=0.0 instead. Actually this just has to do with, um, compilation units, for lack of a better term:

[EMAIL PROTECTED]:~$ python
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 0.0
>>> b = 0.0
>>> print a is b
False
>>> ^D
[EMAIL PROTECTED]:~$ cat > test.py
a = 0.0
b = 0.0
print a is b
^D
[EMAIL PROTECTED]:~$ python test.py
True
[EMAIL PROTECTED]:~$ cat > test_a.py
a = 0.0
^D
[EMAIL PROTECTED]:~$ cat > test_b.py
b = 0.0
^D
[EMAIL PROTECTED]:~$ cat > test.py
from test_a import a
from test_b import b
print a is b
^D
[EMAIL PROTECTED]:~$ python test.py
False
[EMAIL PROTECTED]:~$ python
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 0.0; b = 0.0
>>> print a is b
True
>>> ^D
[EMAIL PROTECTED]:~$

Each line in an interactive session is compiled separately, like modules are compiled separately. With the current implementation, literals in a single compilation unit have a chance to be cached like this. Literals in different compilation units, even for the same value, don't. 
Jean-Paul
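The per-compilation-unit sharing can also be seen without the shell, by inspecting the constants table directly (checked against CPython behaviour; other implementations may differ):

```python
# Two 0.0 literals compiled together are folded into one constant
# in the code object's constants table ...
code = compile("a = 0.0\nb = 0.0", "<unit>", "exec")
shared = [c for c in code.co_consts if isinstance(c, float)]

# ... while two separately compiled units each get their own object.
ns_a, ns_b = {}, {}
exec(compile("a = 0.0", "<unit_a>", "exec"), ns_a)
exec(compile("b = 0.0", "<unit_b>", "exec"), ns_b)
```

Equal values, distinct objects across units: exactly the behaviour Jean-Paul demonstrates above.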
Re: [Python-Dev] Caching float(0.0)
Jason Orendorff wrote: On 9/29/06, Fredrik Lundh [EMAIL PROTECTED] wrote: (I just checked the program I'm working on, and my analysis tells me that the most common floating point value in that program is 121.216, which occurs 32 times. from what I can tell, 0.0 isn't used at all.) *bemused look* Fredrik, can you share the reason why this number occurs 32 times in this program? I don't mean to imply anything by that; it just sounds like it might be a fun story. :) Anyway, this kind of static analysis is probably more entertaining than relevant. For your enjoyment, the most-used float literals in python25\Lib, omitting test directories, are: 1e-006: 5 hits 4.0: 6 hits 0.05: 7 hits 6.0: 8 hits 0.5: 13 hits 2.0: 25 hits 0.0: 36 hits 1.0: 62 hits There are two hits each for -1.0 and -0.5. In my own Python code, I don't even have enough float literals to bother with. By these statistics I think the answer to the original question is clearly no in the general case. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Bob Ippolito schrieb: My guess is that people do have this problem, they just don't know where that memory has gone. I know I don't count objects unless I have a process that's leaking memory or it grows so big that I notice (by swapping or chance). Right. Although I do wonder what kind of software people write to run into this problem. As Guido points out, the numbers must be the result from some computation, or created by an extension module by different means. If people have many *simultaneous* copies of 0.0, I would expect there is something else really wrong with the data structures or algorithms they use. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Martin v. Löwis wrote: Bob Ippolito schrieb: My guess is that people do have this problem, they just don't know where that memory has gone. I know I don't count objects unless I have a process that's leaking memory or it grows so big that I notice (by swapping or chance). Right. Although I do wonder what kind of software people write to run into this problem. As Guido points out, the numbers must be the result from some computation, or created by an extension module by different means. If people have many *simultaneous* copies of 0.0, I would expect there is something else really wrong with the data structures or algorithms they use. I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such:

>>> float('1') is float('1')
False
>>> float('0') is float('0')
False

Cheers, Nick. -- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org
Re: [Python-Dev] Caching float(0.0)
Well, a lot of extension code, like ours, uses PyFloat_FromDouble(foo); This can be from vectors and stuff. Very often these are values from a database. Integral float values are very common in such cases and it didn't occur to me that they weren't being reused, at least for small values. Also, a lot of arithmetic involving floats is expected to end in integers, like computing some index from a float value. Integers get promoted to floats when touched by them, as you know. Anyway, I now precreate integral values from -10 to 10 with great effect. The cost is minimal, the benefit great. Cheers, Kristján -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin v. Löwis Sent: 30. september 2006 08:48 To: Bob Ippolito Cc: python-dev@python.org Subject: Re: [Python-Dev] Caching float(0.0) [snip]
Re: [Python-Dev] Caching float(0.0)
Nick Coghlan [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such: For such situations, one could create a translation dict for both common float values and for non-numeric missing value indicators. For instance, flotran = {'*': None, '1.0':1.0, '2.0':2.0, '4.0':4.0} The details, of course, depend on the specific case. tjr ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
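A sketch of such a translation table, keyed on the incoming text so that repeated values collapse to one object (names are made up for illustration):

```python
# Intern floats parsed from text: the first object created for a given
# token is stored and handed back for every later occurrence.
_interned = {}

def parse_value(token):
    if token == '*':                  # non-numeric missing-value marker
        return None
    return _interned.setdefault(token, float(token))

row = [parse_value(t) for t in "1.0 2.0 * 1.0 1.0".split()]
```

Keying on the token rather than the float also keeps '0.0' and '-0.0' apart, sidestepping the signed-zero problem raised elsewhere in the thread.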
Re: [Python-Dev] Caching float(0.0)
On 9/30/06, Terry Reedy [EMAIL PROTECTED] wrote: Nick Coghlan [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such: For such situations, one could create a translation dict for both common float values and for non-numeric missing value indicators. For instance, flotran = {'*': None, '1.0':1.0, '2.0':2.0, '4.0':4.0} The details, of course, depend on the specific case. But of course you have to know that common float values are never cached and that it may cause you problems. Some users may expect them to be because common strings and integers are cached. -bob ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Steve By these statistics I think the answer to the original question Steve is clearly no in the general case. As someone else (Guido?) pointed out, the literal case isn't all that interesting. I modified floatobject.c to track a few interesting floating point values:

static unsigned int nfloats[5] = {
    0, /* -1.0 */
    0, /* 0.0 */
    0, /* +1.0 */
    0, /* everything else */
    0, /* whole numbers from -10.0 ... 10.0 */
};

PyObject *
PyFloat_FromDouble(double fval)
{
    register PyFloatObject *op;
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL)
            return NULL;
    }

    if (fval == 0.0) nfloats[1]++;
    else if (fval == 1.0) nfloats[2]++;
    else if (fval == -1.0) nfloats[0]++;
    else nfloats[3]++;

    if (fval >= -10.0 && fval <= 10.0 && (int)fval == fval) {
        nfloats[4]++;
    }

    /* Inline PyObject_New */
    op = free_list;
    free_list = (PyFloatObject *)op->ob_type;
    PyObject_INIT(op, &PyFloat_Type);
    op->ob_fval = fval;
    return (PyObject *) op;
}

static void
_count_float_allocations(void)
{
    fprintf(stderr, "-1.0: %d\n", nfloats[0]);
    fprintf(stderr, "0.0: %d\n", nfloats[1]);
    fprintf(stderr, "+1.0: %d\n", nfloats[2]);
    fprintf(stderr, "rest: %d\n", nfloats[3]);
    fprintf(stderr, "whole numbers -10.0 to 10.0: %d\n", nfloats[4]);
}

then called atexit(_count_float_allocations) in _PyFloat_Init and ran make test. The output was:

... ./python.exe -E -tt ../Lib/test/regrtest.py -l ...
-1.0: 29048
0.0: 524241
+1.0: 91561
rest: 1749807
whole numbers -10.0 to 10.0: 1151442

So for a largely non-floating point application, a fair number of floats are allocated, a bit over 25% of them are -1.0, 0.0 or +1.0, and nearly 50% of them are whole numbers between -10.0 and 10.0, inclusive. Seems like it at least deserves a serious look. It would be nice to have the numeric crowd contribute to this subject as well. Skip
Re: [Python-Dev] Caching float(0.0)
Nick Craig-Wood wrote: Is there any reason why float() shouldn't cache the value of 0.0 since it is by far and away the most common value? says who ? (I just checked the program I'm working on, and my analysis tells me that the most common floating point value in that program is 121.216, which occurs 32 times. from what I can tell, 0.0 isn't used at all.) /F ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Acting on this excellent advice, I have patched in a reuse for -1.0, 0.0 and 1.0 for EVE Online. We use vectors and stuff a lot, and 0.0 is very, very common. I'll report on the refcount of this for you shortly. K -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fredrik Lundh Sent: 29. september 2006 15:11 To: python-dev@python.org Subject: Re: [Python-Dev] Caching float(0.0) Nick Craig-Wood wrote: Is there any reason why float() shouldn't cache the value of 0.0 since it is by far and away the most common value? says who ? (I just checked the program I'm working on, and my analysis tells me that the most common floating point value in that program is 121.216, which occurs 32 times. from what I can tell, 0.0 isn't used at all.) /F ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/kristjan%40c cpgames.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Well gentlemen, I did gather some stats on the frequency of PyFloat_FromDouble(). Out of the first 1000 different floats allocated, we get this frequency distribution once our server has started up (debugger view of a std::vector<entry, std::allocator<entry>>):

[0]  v=0.0        c=410612
[1]  v=1.         c=107838
[2]  v=0.75000    c=25487
[3]  v=5.         c=22557
[4]  v=1.         c=18530
[5]  v=-1.        c=14950
[6]  v=2.         c=14460
[7]  v=1500.0     c=13470
[8]  v=100.00     c=11913
[9]  v=0.5        c=11497
[10] v=3.         c=9833
[11] v=20.000     c=9019
[12] v=0.90002    c=8954
[13] v=10.000     c=8377
[14] v=4.         c=7890
[15] v=0.050003   c=7732
[16] v=1000.0     c=7456
[17] v=0.40002    c=7427
[18] v=-100.00    c=7071
[19] v=5000.0     c=6851
[20] v=100.00     c=6503
[21] v=0.070007   c=6071

(here I omit the rest). In addition, my shared 0.0 double has some 20 references at this point. 0.0 is very, very common. The same can be said about all the integers up to 5.0 as well as -1.0. I think I will add a simple cache for these values for Eve, something like:

int i = (int)fval;
if ((double)i == fval && i >= -1 && i < 6) {
    Py_INCREF(table[i]);
    return table[i];
}

Cheers, Kristján -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kristján V. Jónsson Sent: 29. september 2006 15:18 To: Fredrik Lundh; python-dev@python.org Subject: Re: [Python-Dev] Caching float(0.0) [snip]
Re: [Python-Dev] Caching float(0.0)
On 9/29/06, Fredrik Lundh [EMAIL PROTECTED] wrote: (I just checked the program I'm working on, and my analysis tells me that the most common floating point value in that program is 121.216, which occurs 32 times. from what I can tell, 0.0 isn't used at all.) *bemused look* Fredrik, can you share the reason why this number occurs 32 times in this program? I don't mean to imply anything by that; it just sounds like it might be a fun story. :) Anyway, this kind of static analysis is probably more entertaining than relevant. For your enjoyment, the most-used float literals in python25\Lib, omitting test directories, are: 1e-006: 5 hits 4.0: 6 hits 0.05: 7 hits 6.0: 8 hits 0.5: 13 hits 2.0: 25 hits 0.0: 36 hits 1.0: 62 hits There are two hits each for -1.0 and -0.5. In my own Python code, I don't even have enough float literals to bother with. -j ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Caching float(0.0)
Jason Orendorff [EMAIL PROTECTED] wrote: Anyway, this kind of static analysis is probably more entertaining than relevant. ... Well, yes. One can tell that by the piffling little counts being bandied about! More seriously, yes, it is Well Known that 0.0 is the Most Common Floating-Point Number in most numerical codes; a lot of older (and perhaps modern) sparse matrix algorithms use that to save space. In the software floating-point for which I have started to draft some example code but have had to shelve it (no, I haven't forgotten), the values I predefine are Invalid, Missing, True Zero and Approximate Zero. The infinities and infinitesimals (a.k.a. signed zeroes) could also be included, but are less common and more complicated. And so could common integers and fractions. It is generally NOT worth doing a cache lookup in genuinely numerical code, as the common cases other than the above rarely account for enough of the numbers to be worth it. I did a fair amount of investigation looking for compressibility at one time, and that conclusion jumped out at me. The exact best choice depends entirely on what you are doing. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: [EMAIL PROTECTED] Tel.: +44 1223 334761 Fax: +44 1223 334679
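[Editor's note: the sparse-matrix trick Nick alludes to, storing only the nonzero entries so that the overwhelmingly common 0.0 costs nothing, can be sketched in a few lines. The class name and dict-of-index representation are illustrative, not from any particular library.]

```python
class SparseVector:
    """Minimal sparse vector: zeros are implicit, so the most common
    value 0.0 consumes no storage at all."""

    def __init__(self, length):
        self.length = length
        self.data = {}  # index -> nonzero value

    def __getitem__(self, i):
        # Any index without an explicit entry reads as 0.0.
        return self.data.get(i, 0.0)

    def __setitem__(self, i, value):
        if value == 0.0:
            # Storing a zero simply deletes the entry, if any.
            self.data.pop(i, None)
        else:
            self.data[i] = value

v = SparseVector(1_000_000)
v[3] = 2.5
v[999] = -1.0
# Only the two nonzero entries are actually stored.
```

Real sparse-matrix formats (CSR, COO, etc.) are more elaborate, but the space saving comes from exactly this observation.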
Re: [Python-Dev] Caching float(0.0)
I see some confusion in this thread. If a *LITERAL* 0.0 (or any other float literal) is used, you only get one object, no matter how many times it is used. But if the result of a *COMPUTATION* returns 0.0, you get a new object for each such result. If you have 70 MB worth of zeros, those are clearly computation results, not literals. Attempts to remove literal references from source code won't help much. I'm personally +0 on caching computational results with common float values such as 0 and small (positive or negative) powers of two, e.g. 0.5, 1.0, 2.0. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
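[Editor's note: the literal-vs-computation distinction, and a cache of the kind proposed here, can be demonstrated in a few lines. The object-identity behaviour shown is a CPython implementation detail, and the cached_float helper is a sketch of the idea, not the actual patch; note how keying on math.copysign keeps +0.0 and -0.0 distinct even though they compare equal, which is exactly the wrinkle discussed earlier in the thread.]

```python
import math

# In CPython, a literal 0.0 inside one code object is a single shared
# constant, but every *computed* 0.0 is a freshly allocated float.
def computed_zeros(n):
    x = 1.0
    return [x - x for _ in range(n)]

zeros = computed_zeros(3)
assert all(z == 0.0 for z in zeros)
assert len({id(z) for z in zeros}) == 3  # three distinct objects (CPython)

# Sketch of a cache for common computed values.  The key carries the
# sign bit via math.copysign, so +0.0 and -0.0 (which compare equal and
# hash equal) map to distinct cached objects.
_common = [0.0, math.copysign(0.0, -1.0), 0.5, 1.0, 2.0]
_cache = {(v, math.copysign(1.0, v)): v for v in _common}

def cached_float(x):
    v = float(x)
    return _cache.get((v, math.copysign(1.0, v)), v)
```

With this, every computed zero funnels to one shared object, while cached_float(-0.0) still returns a genuine negative zero rather than being folded into +0.0.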
Re: [Python-Dev] Caching float(0.0)
Nick Craig-Wood wrote: Is there any reason why float() shouldn't cache the value of 0.0 since it is by far and away the most common value? 1.0 might be another candidate for caching. Although the fact that nobody has complained about this before suggests that it might not be a frequent enough problem to be worth the effort. -- Greg
Re: [Python-Dev] Caching float(0.0)
On 9/29/06, Greg Ewing [EMAIL PROTECTED] wrote: Nick Craig-Wood wrote: Is there any reason why float() shouldn't cache the value of 0.0 since it is by far and away the most common value? 1.0 might be another candidate for caching. Although the fact that nobody has complained about this before suggests that it might not be a frequent enough problem to be worth the effort. My guess is that people do have this problem, they just don't know where that memory has gone. I know I don't count objects unless I have a process that's leaking memory or it grows so big that I notice (by swapping or by chance). That said, I've never noticed this particular issue, but I deal mostly with strings. I have had issues with the allocator a few times that I had to work around, but not this sort of issue. -bob