Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Eelco Hoogendoorn
Perhaps it is a slightly semantic discussion, but all fp calculations have errors, and there are always strategies for making them smaller. We just don't happen to like the error for this case; but rest assured it won't be hard to find new cases of 'blatantly wrong' results, no matter what acc

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Alan G Isaac
On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: > This isn't a bug report, but rather a feature request. I'm not sure this statement is correct. The mean of a float32 array can certainly be computed as a float32. Currently this is not necessarily what happens, not even approximately. That feels a

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Eelco Hoogendoorn
Inaccurate and utterly wrong are subjective. If you want to be sufficiently strict, floating point calculations are almost always 'utterly wrong'. Granted, it would be nice if the docs specified the algorithm used. But numpy does not produce anything different than what a standard C loop or C++

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Jeff Reback
Related recent issue: https://github.com/numpy/numpy/issues/4638 and pandas is now explicitly specifying the accumulator to avoid this problem: https://github.com/pydata/pandas/pull/6954/files pandas also implemented Welford's method for rolling_var in 0.14.0, see here: https://github.com/pydat
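For readers unfamiliar with the method mentioned above, here is a minimal sketch of Welford's single-pass variance. It is the plain, non-rolling textbook form and is not the pandas rolling_var code; the function name `welford_variance` and the sample data are made up for illustration.

```python
def welford_variance(values, ddof=1):
    """Single-pass variance using Welford's update (illustrative sketch)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the already-updated mean
    if count <= ddof:
        return float("nan")
    return m2 / (count - ddof)


# Hypothetical sample data: small spread on top of a large offset.
small = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
shifted = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
population_var = welford_variance(small, ddof=0)
sample_var = welford_variance(shifted)
```

The update never forms a large sum of squares, which is why it tends to behave better than the "mean of squares minus square of mean" formula when the data sit far from zero.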

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread RayS
Probably a number of scipy places as well
import numpy
import scipy.stats
print numpy.__version__
print scipy.__version__
for s in range(16777214, 16777944):
    if scipy.stats.nanmean(numpy.ones((s, 1), numpy.float32))[0]!=1:
        print '\nbroke', s, scipy.stats.nanmean(numpy.ones((s, 1),

[Numpy-discussion] masked_where broadcasting?

2014-07-24 Thread Benjamin Root
I ran into this this morning while writing up a new test for matplotlib. Shouldn't these two arrays be broadcast automatically, or is np.ma perhaps being overly cautious? u = np.ma.masked_where((-0.4 < x) & (x < 0.1), u, copy=False) File "/home/ben/.local/lib/python2.7/site-packages/numpy/ma/

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread RayS
import numpy
print numpy.__version__
for s in range(1864100, 1864200):
    if numpy.ones((s, 9), numpy.float32).sum()!= s*9:
        print '\nbroke', s
        break
    else:
        print '\r',s,
C:\temp>python np_sum.py
1.8.0b2
1864135
broke 1864136
import numpy
print numpy.__version__
for s

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Frédéric Bastien
On Thu, Jul 24, 2014 at 12:59 PM, Charles R Harris < charlesr.har...@gmail.com> wrote: > > > > On Thu, Jul 24, 2014 at 8:27 AM, Jaime Fernández del Río < > jaime.f...@gmail.com> wrote: > >> On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor < >> jtaylor.deb...@googlemail.com> wrote: >> >>> In practice

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Joseph Martinot-Lagarde
Le 24/07/2014 12:55, Thomas Unterthiner a écrit : > I don't agree. The problem is that I expect `mean` to do something > reasonable. The documentation mentions that the results can be > "inaccurate", which is a huge understatement: the results can be utterly > wrong. That is not reasonable. At the

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Charles R Harris
On Thu, Jul 24, 2014 at 8:27 AM, Jaime Fernández del Río < jaime.f...@gmail.com> wrote: > On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor < > jtaylor.deb...@googlemail.com> wrote: > >> In practice one of the better methods is pairwise summation that is >> pretty much as fast as a naive summation b

Re: [Numpy-discussion] numpy.mean still broken for large float32arrays

2014-07-24 Thread Eelco Hoogendoorn
True, I suppose there is no harm in accumulating with max precision and storing the result in the original dtype, unless otherwise specified, although I wonder if the current nditer supports such behavior. -Original Message- From: "Alan G Isaac" Sent: 24-7-2014 18:09 To: "Discussion
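A minimal sketch of the idea in this message, assuming it means "accumulate in float64, then cast the result back to the input dtype". It relies only on the existing `dtype` argument of `np.mean`; the wrapper name `mean_wide_accumulator` is hypothetical and is not a NumPy function.

```python
import numpy as np


def mean_wide_accumulator(x, axis=None):
    """Mean accumulated in float64, returned in the dtype of x (sketch)."""
    wide = np.mean(x, axis=axis, dtype=np.float64)  # float64 accumulator
    return wide.astype(x.dtype)                     # store in the original dtype


# Illustrative input: about a million float32 ones.
x = np.ones((1000, 1024), np.float32)
m = mean_wide_accumulator(x)
row_means = mean_wide_accumulator(x, axis=1)
```

The caller still gets a float32 result, so the return type is unchanged while the intermediate sum no longer has to fit into a 24-bit significand.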

Re: [Numpy-discussion] numpy.mean still broken for large float32arrays

2014-07-24 Thread Eelco Hoogendoorn
Thanks Julian, those seem like nice improvements. The fact that it either does or doesn't work depending on the axis makes me a little queasy; but yeah, checking that fp's do what you think they should is unfortunately best left as the responsibility of the programmer. -Original Message

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Alan G Isaac
On 7/24/2014 5:59 AM, Eelco Hoogendoorn wrote to Thomas: > np.mean isn't broken; your understanding of floating point number is. This comment seems to conflate separate issues: the desirable return type, and the computational algorithm. It is certainly possible to compute a mean of float32 doing

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Jaime Fernández del Río
On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor < jtaylor.deb...@googlemail.com> wrote: > In practice one of the better methods is pairwise summation that is > pretty much as fast as a naive summation but has an accuracy of > O(logN) ulp. > This is the method numpy 1.9 will use by defau
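To make the quoted point concrete, here is a rough pure-Python sketch of pairwise summation compared with a naive running sum, both kept in float32. It is not NumPy's actual C implementation; the names `naive_sum` and `pairwise_sum`, the leaf size `block=8`, and the test data are assumptions chosen for illustration.

```python
import numpy as np


def naive_sum(values):
    """Left-to-right running sum kept in the dtype of the input array."""
    acc = values.dtype.type(0)
    for v in values:
        acc = acc + v
    return acc


def pairwise_sum(values, block=8):
    """Recursive pairwise summation (sketch); block is an assumed leaf size."""
    n = len(values)
    if n <= block:
        return naive_sum(values)
    half = n // 2
    # Sum each half separately, then add the two partial sums.
    return pairwise_sum(values[:half], block) + pairwise_sum(values[half:], block)


# Illustrative data: 100000 copies of 0.1 stored as float32.
values = np.full(100_000, 0.1, dtype=np.float32)
reference = float(np.sum(values, dtype=np.float64))  # float64 accumulator
naive_error = abs(float(naive_sum(values)) - reference)
pairwise_error = abs(float(pairwise_sum(values)) - reference)
```

In the naive loop a large accumulator repeatedly absorbs a small addend, so rounding errors pile up; in the pairwise version most additions combine partial sums of similar magnitude, which keeps the error growth roughly logarithmic in the array length.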

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Julian Taylor
On Thu, Jul 24, 2014 at 1:33 PM, Fabien wrote: > Hi all, > > On 24.07.2014 11:59, Eelco Hoogendoorn wrote: >> np.mean isn't broken; your understanding of floating point number is. > > I am quite new to python, and this problem is discussed over and over > for other languages too. However, numpy's

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Fabien
Hi all, On 24.07.2014 11:59, Eelco Hoogendoorn wrote: > np.mean isn't broken; your understanding of floating point number is. I am quite new to python, and this problem is discussed over and over for other languages too. However, numpy's summation problem appears with relatively small arrays al

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Thomas Unterthiner
I don't agree. The problem is that I expect `mean` to do something reasonable. The documentation mentions that the results can be "inaccurate", which is a huge understatement: the results can be utterly wrong. That is not reasonable. At the very least, a warning should be issued in cases where

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Eelco Hoogendoorn
Arguably, this isn't a problem of numpy, but of programmers being trained to think of floating point numbers as 'real' numbers, rather than just a finite number of states with a funny distribution over the number line. np.mean isn't broken; your understanding of floating point number is. What you

Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-24 Thread Robert Kern
On Thu, Jul 24, 2014 at 10:39 AM, Lars Buitinck wrote: > Wed, 23 Jul 2014 22:13:33 +0100 Nathaniel Smith : >> On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern wrote: >>> That's perhaps what you want, but numpy has never claimed to do this. > > ... except in np.where, which promises to return indices

Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-24 Thread Lars Buitinck
Wed, 23 Jul 2014 22:13:33 +0100 Nathaniel Smith : > On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern wrote: >> That's perhaps what you want, but numpy has never claimed to do this. ... except in np.where, which promises to return indices but actually returns arrays of longs and thus doesn't work wit

[Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Thomas Unterthiner
Hi! The following is a known "bug" since at least 2010 [1]:
    import numpy as np
    X = np.ones((50000, 1024), np.float32)
    print X.mean()
    >>> 0.32768
I ran into this for the first time today as part of a larger program. I was very surprised by this, and spent over an hour lookin
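A small sketch of why a value like 0.32768 can appear, assuming the sum is accumulated naively in float32. A float32 has a 24-bit significand, so a running sum of ones stops growing at 2**24 = 16777216; dividing that by the 50000 * 1024 elements of an array of ones gives exactly the reported number. The helper name `naive_float32_count` is hypothetical.

```python
import numpy as np


def naive_float32_count(start, steps):
    """Add 1.0 to a float32 accumulator `steps` times, starting at `start`."""
    acc = np.float32(start)
    one = np.float32(1.0)
    for _ in range(steps):
        acc = acc + one  # stays float32; rounds once acc reaches 2**24
    return acc


# Start just below 2**24 and try to add ten ones: the sum gets stuck.
stuck = naive_float32_count(2**24 - 5, 10)

# Mean predicted for 50000 * 1024 ones if the sum saturates at 2**24.
n_elements = 50000 * 1024
predicted_mean = 2**24 / n_elements
```

Here `stuck` ends at 16777216.0 instead of 16777221.0, and `predicted_mean` matches the 0.32768 shown in the report.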

Re: [Numpy-discussion] change default integer from int32 to int64 on win64?

2014-07-24 Thread Robert Kern
On Thu, Jul 24, 2014 at 3:47 AM, Sturla Molden wrote: > Julian Taylor wrote: > >> The default integer dtype should be sufficiently large to index into any >> numpy array, thats what I call an API here. win64 behaves different, you >> have to explicitly upcast your index to be able to index all me