Re: [Numpy-discussion] Bug in numpy.mean() revisited

2012-07-27 Thread Henry Gomersall
On Thu, 2012-07-26 at 22:15 -0600, Charles R Harris wrote:
 I would support accumulating in 64 bits but, IIRC, the function will
 need to be rewritten so that it works by adding 32 bit floats to the
 accumulator to save space. There are also more stable methods that
 could also be investigated. There is a nice little project there for
 someone to cut their teeth on.

So a (very) quick read around suggests that using an interim mean gives
a more robust algorithm. The problem is that these techniques are
either multi-pass or inherently slower (due to, say, a division in the
loop).

A higher-precision accumulator would not suffer the same potential
slowdown and would solve most cases of this problem.
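
As a purely illustrative sketch of the single-pass "interim mean" idea
(and of the per-element division that makes it slower than a plain sum):

import numpy as np

def running_mean(x):
    # One-pass update: the accumulator stays on the scale of the data
    # instead of growing with the running sum, which is what makes the
    # approach more robust.  The cost is a division on every element
    # (written in pure Python here only to show the update rule).
    m = 0.0
    for i, v in enumerate(x.ravel(), start=1):
        m += (float(v) - m) / i
    return m

a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1
print(running_mean(a))   # ~0.55, the correct value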

Henry



Re: [Numpy-discussion] Bug in numpy.mean() revisited

2012-07-27 Thread Nathaniel Smith
On Fri, Jul 27, 2012 at 5:15 AM, Charles R Harris
charlesr.har...@gmail.com wrote:
 I would support accumulating in 64 bits but, IIRC, the function will need to
 be rewritten so that it works by adding 32 bit floats to the accumulator to
 save space. There are also more stable methods that could also be
 investigated. There is a nice little project there for someone to cut their
 teeth on.

So the obvious solution here would be to make the ufunc reduce loop
smart enough that
  x = np.zeros(2 ** 30, dtype=np.float32)
  np.sum(x, dtype=np.float64)
does not upcast 'x' to float64 as a whole. This shouldn't be too
terrible to implement -- iterate over the float32 array, and only
upcast each inner-loop buffer as you go, instead of upcasting the
whole thing.

In fact, nditer might do this already?

Then using a wide accumulator by default would just take a few lines
of code in numpy.core._methods._mean to select the proper dtype and
downcast the result.
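
A rough sketch (illustrative only -- not the actual ufunc buffering
machinery or the real _methods._mean code) of block-wise upcasting plus a
downcast of the result:

import numpy as np

def mean_float32_wide_acc(a, blocksize=8192):
    # Walk the float32 data in blocks, upcast only the current block to
    # float64, and keep the running sum in float64, so the whole array is
    # never copied into a float64 temporary.
    flat = a.ravel()
    acc = 0.0
    for start in range(0, flat.size, blocksize):
        acc += flat[start:start + blocksize].astype(np.float64).sum()
    # A _mean-style wrapper could then compute in float64 and hand the
    # result back in the caller's precision.
    return np.float32(acc / flat.size)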

-n


[Numpy-discussion] Bug in numpy.mean() revisited

2012-07-26 Thread Tom Aldcroft
There was a thread in January discussing the non-obvious behavior of
numpy.mean() for large arrays of float32 values [1].  This issue is
nicely discussed at the end of the numpy.mean() documentation [2] with
an example:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.546875

From the docs and previous discussion it seems there is no technical
difficulty in choosing a different (higher precision) type for the
accumulator using the dtype arg, and in fact this is done
automatically for int values.
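
For reference, the workaround the docs describe -- passing a wider
accumulator dtype explicitly -- continues the example above:

>>> np.mean(a, dtype=np.float64)
0.55000000074505806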

My question is whether there would be any support for doing something
more than documenting this behavior.  I suspect very few people ever
make it below the fold for the np.mean() documentation.  Taking the
mean of large arrays of float32 values is a *very* common use case and
giving the wrong answer with default inputs is really disturbing.  I
recently had to rebuild a complex science data archive because of
corrupted mean values.

Possible ideas to stimulate discussion:
1. Always use float64 to accumulate float types that are 64 bits or
less.   Are there serious performance impacts to automatically using
float64 to accumulate float32 arrays?  I appreciate this would likely
introduce unwanted regressions (sometimes suddenly getting the right
answer is a bad thing).  So could this be considered for numpy 2.0?

2. Might there be a way to emit a warning if the number of values and
the max accumulated value [3] are such that the estimated fractional
error is above some tolerance?  I'm not even sure if this is a good
idea or if there will be howls from the community as their codes start
warning about inaccurate mean values.  Better idea along this line??

Cheers,
Tom

[1]: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059960.html
[2]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
[3]: Using the max accumulated value during accumulation instead of
the final accumulated value seems like the right thing for estimating
precision loss.  But this would affect performance so maybe just using
the final value would catch many cases.
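
A rough way to frame the check in idea 2, using the standard worst-case
bound for naive recursive summation (illustrative only; the constant, the
threshold, and the max-accumulated-value refinement from [3] would all need
real thought):

import numpy as np

def mean_relerr_estimate(n, acc_dtype=np.float32):
    # Worst-case relative error of summing n same-sign values one by one
    # is roughly (n - 1) * u, where u = eps / 2 is the unit roundoff of
    # the accumulator dtype.
    return (n - 1) * np.finfo(acc_dtype).eps / 2.0

print(mean_relerr_estimate(1000 * 1000))          # ~0.06, i.e. ~6 percent
print(mean_relerr_estimate(1000 * 1000) > 1e-3)   # True -> would warn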


Re: [Numpy-discussion] Bug in numpy.mean() revisited

2012-07-26 Thread Charles R Harris
On Thu, Jul 26, 2012 at 9:26 PM, Tom Aldcroft aldcr...@head.cfa.harvard.edu
 wrote:

 There was a thread in January discussing the non-obvious behavior of
 numpy.mean() for large arrays of float32 values [1].  This issue is
 nicely discussed at the end of the numpy.mean() documentation [2] with
 an example:

  a = np.zeros((2, 512*512), dtype=np.float32)
  a[0, :] = 1.0
  a[1, :] = 0.1
  np.mean(a)
 0.546875

 From the docs and previous discussion it seems there is no technical
 difficulty in choosing a different (higher precision) type for the
 accumulator using the dtype arg, and in fact this is done
 automatically for int values.

 My question is whether there would be any support for doing something
 more than documenting this behavior.  I suspect very few people ever
 make it below the fold for the np.mean() documentation.  Taking the
 mean of large arrays of float32 values is a *very* common use case and
 giving the wrong answer with default inputs is really disturbing.  I
 recently had to rebuild a complex science data archive because of
 corrupted mean values.

 Possible ideas to stimulate discussion:
 1. Always use float64 to accumulate float types that are 64 bits or
 less.   Are there serious performance impacts to automatically using
 float64 to accumulate float32 arrays?  I appreciate this would likely
 introduce unwanted regressions (sometimes suddenly getting the right
 answer is a bad thing).  So could this be considered for numpy 2.0?

 2. Might there be a way to emit a warning if the number of values and
 the max accumulated value [3] are such that the estimated fractional
 error is above some tolerance?  I'm not even sure if this is a good
 idea or if there will be howls from the community as their codes start
 warning about inaccurate mean values.  Better idea along this line??


I would support accumulating in 64 bits but, IIRC, the function will need
to be rewritten so that it works by adding 32 bit floats to the accumulator
to save space. There are also more stable methods that could also be
investigated. There is a nice little project there for someone to cut their
teeth on.

Chuck


[Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread K . -Michael Aye
I know I know, that's pretty outrageous to even suggest, but please 
bear with me, I am stumped as you may be:

2-D data file here:
http://dl.dropbox.com/u/139035/data.npy

Then:
In [3]: data.mean()
Out[3]: 3067.024383998

In [4]: data.max()
Out[4]: 3052.4343

In [5]: data.shape
Out[5]: (1000, 1000)

In [6]: data.min()
Out[6]: 3040.498

In [7]: data.dtype
Out[7]: dtype('float32')


A mean value calculated per loop over the data gives me 3045.747251076416.
I first thought I still misunderstood how data.mean() works (per axis
and so on), but I did the same with a flattened version, with the same
results.

Am I really so tired that I can't see what I am doing wrong here?
For completeness, the data was read by an osgeo.gdal dataset method called
ReadAsArray().
My numpy.__version__ gives me 1.6.1 and my whole setup is based on 
Enthought's EPD.

Best regards,
Michael





Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Bruce Southey
On 01/24/2012 12:33 PM, K.-Michael Aye wrote:
 I know I know, that's pretty outrageous to even suggest, but please
 bear with me, I am stumped as you may be:

 2-D data file here:
 http://dl.dropbox.com/u/139035/data.npy

 Then:
 In [3]: data.mean()
 Out[3]: 3067.024383998

 In [4]: data.max()
 Out[4]: 3052.4343

 In [5]: data.shape
 Out[5]: (1000, 1000)

 In [6]: data.min()
 Out[6]: 3040.498

 In [7]: data.dtype
 Out[7]: dtype('float32')


 A mean value calculated per loop over the data gives me 3045.747251076416.
 I first thought I still misunderstood how data.mean() works (per axis
 and so on), but I did the same with a flattened version, with the same
 results.

 Am I really so tired that I can't see what I am doing wrong here?
 For completeness, the data was read by an osgeo.gdal dataset method called
 ReadAsArray().
 My numpy.__version__ gives me 1.6.1 and my whole setup is based on
 Enthought's EPD.

 Best regards,
 Michael



You have a million 32-bit floating point numbers that are in the
thousands. Thus you are exceeding 32-bit float precision, and you need to
either increase the precision of the accumulator in np.mean() or
change the input dtype:
>>> a.mean(dtype=np.float32) # default and lacks precision
3067.024383998
>>> a.mean(dtype=np.float64)
3045.747251076416
>>> a.mean(dtype=np.float128)
3045.7472510764160156
>>> b = a.astype(np.float128)
>>> b.mean()
3045.7472510764160156

Otherwise you are left using some alternative approach to calculate
the mean.
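
One such alternative, sketched only for illustration (compensated/Kahan
summation keeps a float32 running sum honest at the cost of a few extra
operations per element; a real version would live in the C inner loop, not
in Python):

import numpy as np

def kahan_mean_float32(a):
    s = np.float32(0.0)   # running sum
    c = np.float32(0.0)   # compensation for lost low-order bits
    for v in a.ravel():
        y = v - c
        t = s + y
        c = (t - s) - y
        s = t
    return s / np.float32(a.size)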

Bruce







Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Kathleen M Tacina
I have confirmed this on a 64-bit linux machine running python 2.7.2
with the development version of numpy.  It seems to be related to using
float32 instead of float64.   If the array is first converted to a
64-bit float (via astype), mean gives an answer that agrees with your
looped-calculation value: 3045.747250002.  With the original 32-bit
array, averaging successively on one axis and then on the other gives
answers that agree with the 64-bit float answer to the second decimal
place.


In [125]: d = np.load('data.npy')

In [126]: d.mean()
Out[126]: 3067.024383998

In [127]: d64 = d.astype('float64')

In [128]: d64.mean()
Out[128]: 3045.747251076416

In [129]: d.mean(axis=0).mean()
Out[129]: 3045.748750002

In [130]: d.mean(axis=1).mean()
Out[130]: 3045.74448

In [131]: np.version.full_version
Out[131]: '2.0.0.dev-55472ca'



--
On Tue, 2012-01-24 at 12:33 -0600, K.-Michael Aye wrote:

 I know I know, that's pretty outrageous to even suggest, but please 
 bear with me, I am stumped as you may be:
 
 2-D data file here:
 http://dl.dropbox.com/u/139035/data.npy
 
 Then:
 In [3]: data.mean()
 Out[3]: 3067.024383998
 
 In [4]: data.max()
 Out[4]: 3052.4343
 
 In [5]: data.shape
 Out[5]: (1000, 1000)
 
 In [6]: data.min()
 Out[6]: 3040.498
 
 In [7]: data.dtype
 Out[7]: dtype('float32')
 
 
 A mean value calculated per loop over the data gives me 3045.747251076416.
 I first thought I still misunderstood how data.mean() works (per axis
 and so on), but I did the same with a flattened version, with the same
 results.

 Am I really so tired that I can't see what I am doing wrong here?
 For completeness, the data was read by an osgeo.gdal dataset method called
 ReadAsArray().
 My numpy.__version__ gives me 1.6.1 and my whole setup is based on 
 Enthought's EPD.
 
 Best regards,
 Michael
 
 
 

-- 
--
Kathleen M. Tacina
NASA Glenn Research Center
MS 5-10
21000 Brookpark Road
Cleveland, OH 44135
Telephone: (216) 433-6660
Fax: (216) 433-5802
--


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus

On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:

 I know I know, that's pretty outrageous to even suggest, but please 
 bear with me, I am stumped as you may be:
 
 2-D data file here:
 http://dl.dropbox.com/u/139035/data.npy
 
 Then:
 In [3]: data.mean()
 Out[3]: 3067.024383998
 
 In [4]: data.max()
 Out[4]: 3052.4343
 
 In [5]: data.shape
 Out[5]: (1000, 1000)
 
 In [6]: data.min()
 Out[6]: 3040.498
 
 In [7]: data.dtype
 Out[7]: dtype('float32')
 
 
 A mean value calculated per loop over the data gives me 3045.747251076416.
 I first thought I still misunderstood how data.mean() works (per axis
 and so on), but I did the same with a flattened version, with the same
 results.

 Am I really so tired that I can't see what I am doing wrong here?
 For completeness, the data was read by an osgeo.gdal dataset method called
 ReadAsArray().
 My numpy.__version__ gives me 1.6.1 and my whole setup is based on 
 Enthought's EPD.


I get the same result:

In [1]: import numpy

In [2]: data = numpy.load('data.npy')

In [3]: data.mean()
Out[3]: 3067.024383998

In [4]: data.max()
Out[4]: 3052.4343

In [5]: data.min()
Out[5]: 3040.498

In [6]: numpy.version.version
Out[6]: '2.0.0.dev-433b02a'

This is on OS X 10.7.2 with Python 2.7.1, on an Intel Core i7. Running Python as a
32- vs. 64-bit process doesn't make a difference.

The data matrix doesn't look too strange when I view it as an image -- all 
pretty smooth variation around the (min, max) range. But maybe it's still 
somehow floating-point pathological?

This is fun too:
In [12]: data.mean()
Out[12]: 3067.024383998

In [13]: (data/3000).mean()*3000
Out[13]: 3020.807437501

In [15]: (data/2).mean()*2
Out[15]: 3067.024383998

In [16]: (data/200).mean()*200
Out[16]: 3013.67541


Zach




Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Val Kalatsky
Just what Bruce said.

You can run the following to confirm:
np.mean(data - data.mean())

If for some reason you do not want to convert to float64 you can add the
result of the previous line to the bad mean:
bad_mean = data.mean()
good_mean = bad_mean + np.mean(data - bad_mean)
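
A quick way to see why this helps (synthetic float32 data here, not the
data.npy from the thread): the residuals data - bad_mean are small and
centred near zero, so their float32 sum loses far less precision than the
sum of the raw values, and adding their mean back cancels most of the error.

import numpy as np

np.random.seed(0)
data = (3045.0 + 3.0 * np.random.randn(1000, 1000)).astype(np.float32)

bad_mean = data.mean()                            # float32 accumulation
good_mean = bad_mean + np.mean(data - bad_mean)   # residual-mean correction
ref = data.astype(np.float64).mean()              # float64 reference
print(bad_mean, good_mean, ref)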

Val

On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye kmichael@gmail.com wrote:

 I know I know, that's pretty outrageous to even suggest, but please
 bear with me, I am stumped as you may be:

 2-D data file here:
 http://dl.dropbox.com/u/139035/data.npy

 Then:
 In [3]: data.mean()
 Out[3]: 3067.024383998

 In [4]: data.max()
 Out[4]: 3052.4343

 In [5]: data.shape
 Out[5]: (1000, 1000)

 In [6]: data.min()
 Out[6]: 3040.498

 In [7]: data.dtype
 Out[7]: dtype('float32')


 A mean value calculated per loop over the data gives me 3045.747251076416.
 I first thought I still misunderstood how data.mean() works (per axis
 and so on), but I did the same with a flattened version, with the same
 results.

 Am I really so tired that I can't see what I am doing wrong here?
 For completeness, the data was read by an osgeo.gdal dataset method called
 ReadAsArray().
 My numpy.__version__ gives me 1.6.1 and my whole setup is based on
 Enthought's EPD.

 Best regards,
 Michael





Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus
 You have a million 32-bit floating point numbers that are in the
 thousands. Thus you are exceeding 32-bit float precision, and you need to
 either increase the precision of the accumulator in np.mean() or
 change the input dtype:
 a.mean(dtype=np.float32) # default and lacks precision
 3067.024383998
 a.mean(dtype=np.float64)
 3045.747251076416
 a.mean(dtype=np.float128)
 3045.7472510764160156
 b=a.astype(np.float128)
 b.mean()
 3045.7472510764160156
 
 Otherwise you are left to using some alternative approach to calculate 
 the mean.
 
 Bruce

Interesting -- I knew that float64 accumulators were used with integer arrays, 
and I had just assumed that 64-bit or higher accumulators would be used with 
floating-point arrays too, instead of the array's dtype. This is actually quite 
a bit of a gotcha for floating-point imaging-type tasks -- good to know!

Zach


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread K . -Michael Aye
Thank you, Bruce and all.
I knew I was doing something wrong (I should have read the mean method
doc more closely). I am of course glad that it is so easily understood.
But if the error can get so big, wouldn't it be a better idea for the
accumulator to always be of type 'float64', converting back to the type
of the original array afterwards?
As one can see in this case, the result would be much closer to the true value.


Michael


On 2012-01-24 19:01:40 +, Val Kalatsky said:



Just what Bruce said. 

You can run the following to confirm:
np.mean(data - data.mean())

If for some reason you do not want to convert to float64 you can add 
the result of the previous line to the bad mean:

bad_mean = data.mean()
good_mean = bad_mean + np.mean(data - bad_mean)

Val

On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye 
kmichael@gmail.com wrote:

I know I know, that's pretty outrageous to even suggest, but please
bear with me, I am stumped as you may be:

2-D data file here:
http://dl.dropbox.com/u/139035/data.npy

Then:
In [3]: data.mean()
Out[3]: 3067.024383998

In [4]: data.max()
Out[4]: 3052.4343

In [5]: data.shape
Out[5]: (1000, 1000)

In [6]: data.min()
Out[6]: 3040.498

In [7]: data.dtype
Out[7]: dtype('float32')


A mean value calculated per loop over the data gives me 3045.747251076416.
I first thought I still misunderstood how data.mean() works (per axis
and so on), but I did the same with a flattened version, with the same
results.

Am I really so tired that I can't see what I am doing wrong here?
For completeness, the data was read by an osgeo.gdal dataset method called
ReadAsArray().
My numpy.__version__ gives me 1.6.1 and my whole setup is based on
Enthought's EPD.

Best regards,
Michael





Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi,

Oddly, numpy 1.6 seems to behave in a more consistent manner:

In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'

In []: d= np.load('data.npy')
In []: d.dtype
Out[]: dtype('float32')

In []: d.mean()
Out[]: 3045.74718
In []: d.mean(dtype= np.float32)
Out[]: 3045.74718
In []: d.mean(dtype= np.float64)
Out[]: 3045.747251076416
In []: (d- d.min()).mean()+ d.min()
Out[]: 3045.7472508750002
In []: d.mean(axis= 0).mean()
Out[]: 3045.74724
In []: d.mean(axis= 1).mean()
Out[]: 3045.74724

Or do the results of the calculations depend more on the platform?


My 2 cents,
eat


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Kathleen M Tacina
I found something similar, with a very simple example.

On 64-bit linux, python 2.7.2, numpy development version:

In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)

In [23]: a.mean()
Out[23]: 4034.16357421875

In [24]: np.version.full_version
Out[24]: '2.0.0.dev-55472ca'


But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
a = np.ones((1024,1024),dtype=np.float32)
a.mean()
4000.0
np.version.full_version
'1.6.1'


On Tue, 2012-01-24 at 17:12 -0600, eat wrote:

 Hi,
 
 
 
 Oddly, numpy 1.6 seems to behave in a more consistent manner:
 
 
 In []: sys.version
 Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
 (Intel)]'
 In []: np.version.version
 Out[]: '1.6.0'
 
 
 In []: d= np.load('data.npy')
 In []: d.dtype
 Out[]: dtype('float32')
 
 
 In []: d.mean()
 Out[]: 3045.74718
 In []: d.mean(dtype= np.float32)
 Out[]: 3045.74718
 In []: d.mean(dtype= np.float64)
 Out[]: 3045.747251076416
 In []: (d- d.min()).mean()+ d.min()
 Out[]: 3045.7472508750002
 In []: d.mean(axis= 0).mean()
 Out[]: 3045.74724
 In []: d.mean(axis= 1).mean()
 Out[]: 3045.74724
 
 
 Or do the results of the calculations depend more on the platform?
 
 
 
 
 My 2 cents,
 eat

-- 
--
Kathleen M. Tacina
NASA Glenn Research Center
MS 5-10
21000 Brookpark Road
Cleveland, OH 44135
Telephone: (216) 433-6660
Fax: (216) 433-5802
--


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi

On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina 
kathleen.m.tac...@nasa.gov wrote:

 I found something similar, with a very simple example.

 On 64-bit linux, python 2.7.2, numpy development version:

 In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)

 In [23]: a.mean()
 Out[23]: 4034.16357421875

 In [24]: np.version.full_version
 Out[24]: '2.0.0.dev-55472ca'


 But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
 a = np.ones((1024,1024),dtype=np.float32)
 a.mean()
 4000.0
 np.version.full_version
 '1.6.1'

This indeed looks very nasty, regardless of whether it is a version or
platform related problem.

-eat




 On Tue, 2012-01-24 at 17:12 -0600, eat wrote:

 Hi,



  Oddly, numpy 1.6 seems to behave in a more consistent manner:



  In []: sys.version

  Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
 (Intel)]'

  In []: np.version.version

  Out[]: '1.6.0'



  In []: d= np.load('data.npy')

  In []: d.dtype

  Out[]: dtype('float32')



  In []: d.mean()

  Out[]: 3045.74718

  In []: d.mean(dtype= np.float32)

  Out[]: 3045.74718

  In []: d.mean(dtype= np.float64)

  Out[]: 3045.747251076416

  In []: (d- d.min()).mean()+ d.min()

  Out[]: 3045.7472508750002

  In []: d.mean(axis= 0).mean()

  Out[]: 3045.74724

  In []: d.mean(axis= 1).mean()

  Out[]: 3045.74724



  Or do the results of the calculations depend more on the platform?





  My 2 cents,

  eat

   --
 --
 Kathleen M. Tacina
 NASA Glenn Research Center
 MS 5-10
 21000 Brookpark Road
 Cleveland, OH 44135
 Telephone: (216) 433-6660
 Fax: (216) 433-5802
 --




Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef . pktd
On Tue, Jan 24, 2012 at 7:21 PM, eat e.antero.ta...@gmail.com wrote:

 Hi

 On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina 
 kathleen.m.tac...@nasa.gov wrote:

 I found something similar, with a very simple example.

 On 64-bit linux, python 2.7.2, numpy development version:

 In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)

 In [23]: a.mean()
 Out[23]: 4034.16357421875

 In [24]: np.version.full_version
 Out[24]: '2.0.0.dev-55472ca'


 But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
 a = np.ones((1024,1024),dtype=np.float32)
 a.mean()
 4000.0
 np.version.full_version
 '1.6.1'

 This indeed looks very nasty, regardless of whether it is a version or
 platform related problem.


Looks like it is platform specific; I get the same result as eat.

Windows 7,
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32

 a = np.ones((1024,1024),dtype=np.float32)
 a.mean()
1.0

 (4000*a).dtype
dtype('float32')
 (4000*a).mean()
4000.0

 b = np.load('data.npy')
 b.mean()
3045.74718
 b.shape
(1000, 1000)
 b.mean(0).mean(0)
3045.74724
 _.dtype
dtype('float64')
 b.dtype
dtype('float32')

 b.mean(dtype=np.float32)
3045.74718

Josef



 -eat




 On Tue, 2012-01-24 at 17:12 -0600, eat wrote:

 Hi,



  Oddly, numpy 1.6 seems to behave in a more consistent manner:



  In []: sys.version

  Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
 (Intel)]'

  In []: np.version.version

  Out[]: '1.6.0'



  In []: d= np.load('data.npy')

  In []: d.dtype

  Out[]: dtype('float32')



  In []: d.mean()

  Out[]: 3045.74718

  In []: d.mean(dtype= np.float32)

  Out[]: 3045.74718

  In []: d.mean(dtype= np.float64)

  Out[]: 3045.747251076416

  In []: (d- d.min()).mean()+ d.min()

  Out[]: 3045.7472508750002

  In []: d.mean(axis= 0).mean()

  Out[]: 3045.74724

  In []: d.mean(axis= 1).mean()

  Out[]: 3045.74724



  Or do the results of the calculations depend more on the platform?





  My 2 cents,

  eat

   --
 --
 Kathleen M. Tacina
 NASA Glenn Research Center
 MS 5-10
 21000 Brookpark Road
 Cleveland, OH 44135
 Telephone: (216) 433-6660
 Fax: (216) 433-5802
 --




Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Charles R Harris
On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina 
kathleen.m.tac...@nasa.gov wrote:

 I found something similar, with a very simple example.

 On 64-bit linux, python 2.7.2, numpy development version:

 In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)

 In [23]: a.mean()
 Out[23]: 4034.16357421875

 In [24]: np.version.full_version
 Out[24]: '2.0.0.dev-55472ca'


 But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
 a = np.ones((1024,1024),dtype=np.float32)
 a.mean()
 4000.0
 np.version.full_version
 '1.6.1'



Yes, the results are platform/compiler dependent. The 32-bit platforms tend
to use extended-precision accumulators and the x87 instruction set, while the
64-bit platforms tend to use SSE2+. The precisions differ, even though you
might think they are the same.
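
In other words (a small check, echoing Josef's numbers further down): the
float32 result moves around between builds, while an explicit wide
accumulator does not:

import numpy as np

a = 4000 * np.ones((1024, 1024), dtype=np.float32)
print(a.mean())                   # float32 accumulation: build/platform dependent
print(a.mean(dtype=np.float64))   # 4000.0 on any platform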

snip

Chuck


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef . pktd
On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina
 kathleen.m.tac...@nasa.gov wrote:

 I found something similar, with a very simple example.

 On 64-bit linux, python 2.7.2, numpy development version:

 In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)

 In [23]: a.mean()
 Out[23]: 4034.16357421875

 In [24]: np.version.full_version
 Out[24]: '2.0.0.dev-55472ca'


 But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
 a = np.ones((1024,1024),dtype=np.float32)
 a.mean()
 4000.0
 np.version.full_version
 '1.6.1'



 Yes, the results are platform/compiler dependent. The 32 bit platforms tend
 to use extended precision accumulators and the x87 instruction set. The 64
 bit platforms tend to use sse2+. Different precisions, even though you might
 think they are the same.

Just to confirm: on the same computer as before, but with the 64-bit Python
3.2 installation, I now get the Linux result:

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit
(AMD64)] on win32

 import numpy as np
 np.__version__
'1.5.1'
 a = 4000*np.ones((1024,1024),dtype=np.float32)
 a.mean()
4034.16357421875
 a.mean(0).mean(0)
4000.0
 a.mean(dtype=np.float64)
4000.0

Josef


 snip

 Chuck
