Re: [Numpy-discussion] Bug in numpy.mean() revisited
On Thu, 2012-07-26 at 22:15 -0600, Charles R Harris wrote:
> I would support accumulating in 64 bits but, IIRC, the function will
> need to be rewritten so that it works by adding 32 bit floats to the
> accumulator to save space. There are also more stable methods that
> could be investigated. There is a nice little project there for
> someone to cut their teeth on.

A (very) quick read around suggests that using an interim mean gives a more robust algorithm. The problem is that these techniques are either multi-pass or inherently slower (due to, say, a division in the loop). Higher precision would not suffer the same potential slowdown and would solve most cases of this problem.

Henry
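For what it's worth, a minimal sketch of the kind of one-pass "interim mean" method being described (assuming this refers to a running-mean update such as Welford's algorithm; interim_mean is a hypothetical helper, not an existing numpy function):

    import numpy as np

    def interim_mean(a):
        # One-pass running-mean update: the accumulator tracks the mean
        # itself, so it stays the same magnitude as the data values instead
        # of growing with the running sum.  The cost is one division per
        # element; the pure-Python loop here only illustrates the update
        # rule and is not meant to be fast.
        mean = np.float32(0.0)
        for i, x in enumerate(a.ravel(), start=1):
            mean += (x - mean) / np.float32(i)
        return mean

    a = np.zeros((2, 512 * 512), dtype=np.float32)
    a[0, :] = 1.0
    a[1, :] = 0.1

    print(a.mean())         # reported as 0.546875 in the thread (naive float32 sum)
    print(interim_mean(a))  # stays close to the true mean of 0.55 even in float32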
Re: [Numpy-discussion] Bug in numpy.mean() revisited
On Fri, Jul 27, 2012 at 5:15 AM, Charles R Harris charlesr.har...@gmail.com wrote:
> I would support accumulating in 64 bits but, IIRC, the function will
> need to be rewritten so that it works by adding 32 bit floats to the
> accumulator to save space. There are also more stable methods that
> could be investigated. There is a nice little project there for
> someone to cut their teeth on.

So the obvious solution here would be to make the ufunc reduce loop smart enough that

    x = np.zeros(2 ** 30, dtype=np.float32)
    np.sum(x, dtype=np.float64)

does not upcast 'x' to float64 as a whole. This shouldn't be too terrible to implement -- iterate over the float32 array, and only upcast each inner-loop buffer as you go, instead of upcasting the whole thing. In fact, nditer might do this already? Then using a wide accumulator by default would just take a few lines of code in numpy.core._methods._mean to select the proper dtype and downcast the result.

-n
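A user-level illustration of that buffered-upcast idea (this is not numpy's actual inner loop; buffered_sum and the chunk size are made up for the sketch): iterate the float32 data in small chunks and upcast only the chunk currently being accumulated, so memory overhead stays at one chunk rather than a full float64 copy.

    import numpy as np

    def buffered_sum(a, chunk=8192, dtype=np.float64):
        # Accumulate in `dtype` while only ever holding one upcast chunk of
        # the float32 input at a time.
        acc = dtype(0)
        flat = a.ravel()
        for start in range(0, flat.size, chunk):
            acc += flat[start:start + chunk].sum(dtype=dtype)
        return acc

    x = np.full(2 ** 20, 0.1, dtype=np.float32)
    print(x.sum(dtype=np.float32))  # naive same-dtype accumulation drifts on
                                    # numpy of this era
    print(buffered_sum(x))          # close to 0.1 * 2**20 = 104857.6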
[Numpy-discussion] Bug in numpy.mean() revisited
There was a thread in January discussing the non-obvious behavior of numpy.mean() for large arrays of float32 values [1]. This issue is nicely discussed at the end of the numpy.mean() documentation [2] with an example:

    >>> a = np.zeros((2, 512*512), dtype=np.float32)
    >>> a[0, :] = 1.0
    >>> a[1, :] = 0.1
    >>> np.mean(a)
    0.546875

From the docs and previous discussion it seems there is no technical difficulty in choosing a different (higher precision) type for the accumulator using the dtype arg, and in fact this is done automatically for int values.

My question is whether there would be any support for doing something more than documenting this behavior. I suspect very few people ever make it below the fold of the np.mean() documentation. Taking the mean of large arrays of float32 values is a *very* common use case, and giving the wrong answer with default inputs is really disturbing. I recently had to rebuild a complex science data archive because of corrupted mean values.

Possible ideas to stimulate discussion:

1. Always use float64 to accumulate float types that are 64 bits or less. Are there serious performance impacts to automatically using float64 to accumulate float32 arrays? I appreciate this would likely introduce unwanted regressions (sometimes suddenly getting the right answer is a bad thing), so could this be considered for numpy 2.0?

2. Might there be a way to emit a warning if the number of values and the max accumulated value [3] are such that the estimated fractional error is above some tolerance? (A rough sketch of such a check follows at the end of this message.) I'm not even sure if this is a good idea, or if there will be howls from the community as their codes start warning about inaccurate mean values. Better idea along this line??

Cheers,
Tom

[1]: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059960.html
[2]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
[3]: Using the max accumulated value during accumulation instead of the final accumulated value seems like the right thing for estimating precision loss. But this would affect performance, so maybe just using the final value would catch many cases.
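As a rough illustration of idea 2 (a hypothetical wrapper, not an existing numpy API; it uses a crude n*eps bound instead of tracking the max accumulated value, so treat the numbers as order-of-magnitude only):

    import numpy as np
    import warnings

    def mean_with_check(a, tol=1e-4):
        # Each of the n additions in a naive same-dtype accumulation can
        # lose on the order of eps of the running total, so the worst-case
        # relative error grows roughly like n * eps.
        eps = np.finfo(a.dtype).eps if np.issubdtype(a.dtype, np.floating) else 0.0
        est_rel_err = a.size * eps
        if est_rel_err > tol:
            warnings.warn("mean of %d %s values may lose ~%.0e relative "
                          "precision; consider dtype=np.float64"
                          % (a.size, a.dtype, est_rel_err))
        return a.mean()

    a = np.full((1000, 1000), 3045.0, dtype=np.float32)
    m = mean_with_check(a)   # warns: roughly 1e-1 relative precision at risk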
Re: [Numpy-discussion] Bug in numpy.mean() revisited
On Thu, Jul 26, 2012 at 9:26 PM, Tom Aldcroft aldcr...@head.cfa.harvard.edu wrote:
> There was a thread in January discussing the non-obvious behavior of
> numpy.mean() for large arrays of float32 values [1]. [...]
>
> Possible ideas to stimulate discussion:
>
> 1. Always use float64 to accumulate float types that are 64 bits or
> less. [...]
>
> 2. Might there be a way to emit a warning if the number of values and
> the max accumulated value [3] are such that the estimated fractional
> error is above some tolerance? [...]

I would support accumulating in 64 bits but, IIRC, the function will need to be rewritten so that it works by adding 32 bit floats to the accumulator to save space. There are also more stable methods that could be investigated. There is a nice little project there for someone to cut their teeth on.

Chuck
[Numpy-discussion] bug in numpy.mean() ?
I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be:

2-D data file here: http://dl.dropbox.com/u/139035/data.npy

Then:

    In [3]: data.mean()
    Out[3]: 3067.024383998
    In [4]: data.max()
    Out[4]: 3052.4343
    In [5]: data.shape
    Out[5]: (1000, 1000)
    In [6]: data.min()
    Out[6]: 3040.498
    In [7]: data.dtype
    Out[7]: dtype('float32')

A mean value calculated per loop over the data gives me 3045.747251076416.

I first thought I still misunderstood how data.mean() works, per axis and so on, but did the same with a flattened version with the same results. Am I really so tired that I can't see what I am doing wrong here?

For completeness, the data was read by an osgeo.gdal dataset method called ReadAsArray(). My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD.

Best regards,
Michael
Re: [Numpy-discussion] bug in numpy.mean() ?
On 01/24/2012 12:33 PM, K.-Michael Aye wrote:
> I know I know, that's pretty outrageous to even suggest, but please
> bear with me, I am stumped as you may be:
> 2-D data file here: http://dl.dropbox.com/u/139035/data.npy
> [...]

You have a million 32-bit floating point numbers that are in the thousands. Thus you are exceeding the 32-bit float precision and, if you can, you need to increase the precision of the accumulator in np.mean() or change the input dtype:

    >>> a.mean(dtype=np.float32)  # default and lacks precision
    3067.024383998
    >>> a.mean(dtype=np.float64)
    3045.747251076416
    >>> a.mean(dtype=np.float128)
    3045.7472510764160156
    >>> b = a.astype(np.float128)
    >>> b.mean()
    3045.7472510764160156

Otherwise you are left to using some alternative approach to calculate the mean.

Bruce
Re: [Numpy-discussion] bug in numpy.mean() ?
I have confirmed this on a 64-bit linux machine running python 2.7.2 with the development version of numpy. It seems to be related to using float32 instead of float64. If the array is first converted to a 64-bit float (via astype), mean gives an answer that agrees with your looped-calculation value: 3045.747250002. With the original 32-bit array, averaging successively on one axis and then on the other gives answers that agree with the 64-bit float answer to the second decimal place.

    In [125]: d = np.load('data.npy')
    In [126]: d.mean()
    Out[126]: 3067.024383998
    In [127]: d64 = d.astype('float64')
    In [128]: d64.mean()
    Out[128]: 3045.747251076416
    In [129]: d.mean(axis=0).mean()
    Out[129]: 3045.748750002
    In [130]: d.mean(axis=1).mean()
    Out[130]: 3045.74448
    In [131]: np.version.full_version
    Out[131]: '2.0.0.dev-55472ca'

On Tue, 2012-01-24 at 12:33 -0600, K.-Michael Aye wrote:
> I know I know, that's pretty outrageous to even suggest, but please
> bear with me, I am stumped as you may be: [...]

--
Kathleen M. Tacina
NASA Glenn Research Center
MS 5-10, 21000 Brookpark Road
Cleveland, OH 44135
Telephone: (216) 433-6660
Fax: (216) 433-5802
Re: [Numpy-discussion] bug in numpy.mean() ?
On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:
> I know I know, that's pretty outrageous to even suggest, but please
> bear with me, I am stumped as you may be:
> 2-D data file here: http://dl.dropbox.com/u/139035/data.npy
> [...]

I get the same result:

    In [1]: import numpy
    In [2]: data = numpy.load('data.npy')
    In [3]: data.mean()
    Out[3]: 3067.024383998
    In [4]: data.max()
    Out[4]: 3052.4343
    In [5]: data.min()
    Out[5]: 3040.498
    In [6]: numpy.version.version
    Out[6]: '2.0.0.dev-433b02a'

This is on OS X 10.7.2 with Python 2.7.1, on an Intel Core i7. Running python as a 32- vs. 64-bit process doesn't make a difference.

The data matrix doesn't look too strange when I view it as an image -- all pretty smooth variation around the (min, max) range. But maybe it's still somehow floating-point pathological?

This is fun too:

    In [12]: data.mean()
    Out[12]: 3067.024383998
    In [13]: (data/3000).mean()*3000
    Out[13]: 3020.807437501
    In [15]: (data/2).mean()*2
    Out[15]: 3067.024383998
    In [16]: (data/200).mean()*200
    Out[16]: 3013.67541

Zach
Re: [Numpy-discussion] bug in numpy.mean() ?
Just what Bruce said. You can run the following to confirm:

    np.mean(data - data.mean())

If for some reason you do not want to convert to float64, you can add the result of the previous line to the bad mean:

    bad_mean = data.mean()
    good_mean = bad_mean + np.mean(data - bad_mean)

Val

On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye kmichael@gmail.com wrote:
> I know I know, that's pretty outrageous to even suggest, but please
> bear with me, I am stumped as you may be:
> 2-D data file here: http://dl.dropbox.com/u/139035/data.npy
> [...]
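To see why the trick works: subtracting an approximate mean leaves residuals that are small compared to the accumulator, so even a float32 sum of the residuals loses little precision. A quick sketch on synthetic data (the real data.npy is not assumed here; on numpy of this era the naive float32 mean drifts, while newer versions with pairwise summation are already much closer):

    import numpy as np

    # Synthetic stand-in for the 1000x1000 float32 image from the thread:
    # values in the low thousands with a little spread.
    rng = np.random.RandomState(0)
    data = (3045 + 3 * rng.standard_normal((1000, 1000))).astype(np.float32)

    bad_mean = data.mean(dtype=np.float32)           # same-dtype accumulation
    good_mean = bad_mean + np.mean(data - bad_mean)  # residuals are O(1), so a
                                                     # float32 sum loses little
    reference = data.mean(dtype=np.float64)

    print(bad_mean, good_mean, reference)
    # good_mean should agree with the float64 reference far better than
    # bad_mean does.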
Re: [Numpy-discussion] bug in numpy.mean() ?
> You have a million 32-bit floating point numbers that are in the
> thousands. Thus you are exceeding the 32-bit float precision and, if
> you can, you need to increase the precision of the accumulator in
> np.mean() or change the input dtype: [...]
>
> Otherwise you are left to using some alternative approach to calculate
> the mean.
>
> Bruce

Interesting -- I knew that float64 accumulators were used with integer arrays, and I had just assumed that 64-bit or higher accumulators would be used with floating-point arrays too, instead of the array's dtype. This is actually quite a bit of a gotcha for floating-point imaging-type tasks -- good to know!

Zach
Re: [Numpy-discussion] bug in numpy.mean() ?
Thank you Bruce and all, I knew I was doing something wrong (should have read the mean method doc more closely). I am of course glad that it's so easily understandable.

But: if the error can get so big, wouldn't it be a better idea for the accumulator to always be of type 'float64' and then convert later to the type of the original array? As one can see in this case, the result would be much closer to the true value.

Michael

On 2012-01-24 19:01:40, Val Kalatsky said:
> Just what Bruce said. You can run the following to confirm:
>     np.mean(data - data.mean())
> If for some reason you do not want to convert to float64, you can add
> the result of the previous line to the bad mean:
>     bad_mean = data.mean()
>     good_mean = bad_mean + np.mean(data - bad_mean)
> Val
> [...]
Re: [Numpy-discussion] bug in numpy.mean() ?
Hi,

Oddly, numpy 1.6 seems to behave in a more consistent manner:

    In []: sys.version
    Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
    In []: np.version.version
    Out[]: '1.6.0'
    In []: d = np.load('data.npy')
    In []: d.dtype
    Out[]: dtype('float32')
    In []: d.mean()
    Out[]: 3045.74718
    In []: d.mean(dtype=np.float32)
    Out[]: 3045.74718
    In []: d.mean(dtype=np.float64)
    Out[]: 3045.747251076416
    In []: (d - d.min()).mean() + d.min()
    Out[]: 3045.7472508750002
    In []: d.mean(axis=0).mean()
    Out[]: 3045.74724
    In []: d.mean(axis=1).mean()
    Out[]: 3045.74724

Or do the results of the calculations depend more on the platform?

My 2 cents,
eat
Re: [Numpy-discussion] bug in numpy.mean() ?
I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version:

    In [22]: a = 4000*np.ones((1024,1024), dtype=np.float32)
    In [23]: a.mean()
    Out[23]: 4034.16357421875
    In [24]: np.version.full_version
    Out[24]: '2.0.0.dev-55472ca'

But a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:

    >>> a = 4000*np.ones((1024,1024), dtype=np.float32)
    >>> a.mean()
    4000.0
    >>> np.version.full_version
    '1.6.1'

On Tue, 2012-01-24 at 17:12 -0600, eat wrote:
> Oddly, numpy 1.6 seems to behave in a more consistent manner: [...]
> Or do the results of the calculations depend more on the platform?

--
Kathleen M. Tacina
NASA Glenn Research Center
MS 5-10, 21000 Brookpark Road
Cleveland, OH 44135
Telephone: (216) 433-6660
Fax: (216) 433-5802
Re: [Numpy-discussion] bug in numpy.mean() ?
Hi,

On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote:
> I found something similar, with a very simple example. On 64-bit linux,
> python 2.7.2, numpy development version:
>
>     In [22]: a = 4000*np.ones((1024,1024), dtype=np.float32)
>     In [23]: a.mean()
>     Out[23]: 4034.16357421875
>     In [24]: np.version.full_version
>     Out[24]: '2.0.0.dev-55472ca'
>
> But a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>
>     >>> a = 4000*np.ones((1024,1024), dtype=np.float32)
>     >>> a.mean()
>     4000.0
>     >>> np.version.full_version
>     '1.6.1'

This indeed looks very nasty, regardless of whether it is a version or platform related problem.

-eat
Re: [Numpy-discussion] bug in numpy.mean() ?
On Tue, Jan 24, 2012 at 7:21 PM, eat e.antero.ta...@gmail.com wrote:
> On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina
> kathleen.m.tac...@nasa.gov wrote:
>> I found something similar, with a very simple example. On 64-bit
>> linux, python 2.7.2, numpy development version:
>>     In [22]: a = 4000*np.ones((1024,1024), dtype=np.float32)
>>     In [23]: a.mean()
>>     Out[23]: 4034.16357421875
>> [...]
>
> This indeed looks very nasty, regardless of whether it is a version or
> platform related problem.

Looks like it is platform specific; same result as eat:

Windows 7, Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32

    >>> a = np.ones((1024,1024), dtype=np.float32)
    >>> a.mean()
    1.0
    >>> (4000*a).dtype
    dtype('float32')
    >>> (4000*a).mean()
    4000.0
    >>> b = np.load("data.npy")
    >>> b.mean()
    3045.74718
    >>> b.shape
    (1000, 1000)
    >>> b.mean(0).mean(0)
    3045.74724
    >>> _.dtype
    dtype('float64')
    >>> b.dtype
    dtype('float32')
    >>> b.mean(dtype=np.float32)
    3045.74718

Josef
Re: [Numpy-discussion] bug in numpy.mean() ?
On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote:
> I found something similar, with a very simple example. On 64-bit linux,
> python 2.7.2, numpy development version:
>
>     In [22]: a = 4000*np.ones((1024,1024), dtype=np.float32)
>     In [23]: a.mean()
>     Out[23]: 4034.16357421875
>     In [24]: np.version.full_version
>     Out[24]: '2.0.0.dev-55472ca'
>
> But a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>
>     >>> a = 4000*np.ones((1024,1024), dtype=np.float32)
>     >>> a.mean()
>     4000.0
>     >>> np.version.full_version
>     '1.6.1'

Yes, the results are platform/compiler dependent. The 32-bit platforms tend to use extended-precision accumulators and the x87 instruction set. The 64-bit platforms tend to use SSE2+. Different precisions, even though you might think they are the same.

Chuck
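Accumulator precision is also why the axis-by-axis means quoted earlier come out much closer: the partial sums stay small relative to the accumulator. A minimal sketch of pairwise (recursive) summation -- an illustration of that idea, not the code numpy uses -- makes the effect visible in pure float32:

    import numpy as np

    def pairwise_sum(x):
        # Recursive pairwise summation: partial sums stay comparable in
        # magnitude, so rounding error grows like O(log n) rather than O(n).
        n = x.size
        if n <= 8:                      # small base case: naive sum is fine
            s = np.float32(0.0)
            for v in x:
                s += v
            return s
        half = n // 2
        return pairwise_sum(x[:half]) + pairwise_sum(x[half:])

    a = np.full(1024 * 1024, 4000.0, dtype=np.float32)

    naive = np.float32(0.0)
    for v in a:                          # naive sequential float32 accumulation
        naive += v

    print(naive / a.size)                # drifts (the thread reports ~4034)
    print(pairwise_sum(a) / a.size)      # 4000.0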
Re: [Numpy-discussion] bug in numpy.mean() ?
On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina
> kathleen.m.tac...@nasa.gov wrote:
>> I found something similar, with a very simple example. On 64-bit
>> linux, python 2.7.2, numpy development version:
>>     In [22]: a = 4000*np.ones((1024,1024), dtype=np.float32)
>>     In [23]: a.mean()
>>     Out[23]: 4034.16357421875
>> [...]
>
> Yes, the results are platform/compiler dependent. The 32-bit platforms
> tend to use extended-precision accumulators and the x87 instruction
> set. The 64-bit platforms tend to use SSE2+. Different precisions, even
> though you might think they are the same.

Just to confirm: same computer as before, but the python 3.2 version is 64-bit, and now I get the Linux result.

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32

    >>> import numpy as np
    >>> np.__version__
    '1.5.1'
    >>> a = 4000*np.ones((1024,1024), dtype=np.float32)
    >>> a.mean()
    4034.16357421875
    >>> a.mean(0).mean(0)
    4000.0
    >>> a.mean(dtype=np.float64)
    4000.0

Josef