Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Julian Taylor
On 28.07.2014 23:32, Eelco Hoogendoorn wrote: > I see, thanks for the clarification. Just for the sake of argument, > since unfortunately I don't have the time to go dig in the guts of numpy > myself: a design which always produces results of the same (high) > accuracy, but only optimizes the commo

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
I see, thanks for the clarification. Just for the sake of argument, since unfortunately I don't have the time to go dig in the guts of numpy myself: a design which always produces results of the same (high) accuracy, but only optimizes the common access patterns in a hacky way, and may be inefficie

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote: > Sebastian: > > > Those are good points. Indeed iteration order may already produce > different results, even though the semantics of numpy suggest > identical operations. Still, I feel this different behavior without > any semantical cl

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
Sebastian: Those are good points. Indeed iteration order may already produce different results, even though the semantics of numpy suggest identical operations. Still, I feel this different behavior without any semantical clues is something to be minimized. Indeed copying might have large speed i

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 15:50 +0200, Fabien wrote: > On 28.07.2014 15:30, Daπid wrote: > > An example using float16 on Numpy 1.8.1 (I haven't seen differences with > > float32): > > Why aren't there differences between float16 and float32? > float16 calculations are actually float32 calculations. I

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote: > On 28/07/14 15:21, alex wrote: > > > Are you sure they always give different results? Notice that > > np.ones((N,2)).mean(0) > > np.ones((2,N)).mean(1) > > compute means of different axes on transposed arrays so these > > differences 'cance

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Fabien
On 28.07.2014 15:30, Daπid wrote: > An example using float16 on Numpy 1.8.1 (I haven't seen differences with > float32): Why aren't there differences between float16 and float32? Could this be related to my earlier post in this thread where I mentioned summation problems occurring much earlier i

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sturla Molden
On 28/07/14 15:21, alex wrote: > Are you sure they always give different results? Notice that > np.ones((N,2)).mean(0) > np.ones((2,N)).mean(1) > compute means of different axes on transposed arrays so these > differences 'cancel out'. They will be if different algorithms are used. np.ones((N,2)

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Daπid
On 28 July 2014 14:46, Sebastian Berg wrote: > > To rephrase my most pressing question: may np.ones((N,2)).mean(0) and > > np.ones((2,N)).mean(1) produce different results with the > > implementation in the current master? If so, I think that would be > > very much regrettable; and if this is a m

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread alex
On Mon, Jul 28, 2014 at 8:46 AM, Sebastian Berg wrote: > On Mo, 2014-07-28 at 14:37 +0200, Eelco Hoogendoorn wrote: >> To rephrase my most pressing question: may np.ones((N,2)).mean(0) and >> np.ones((2,N)).mean(1) produce different results with the >> implementation in the current master? If so,

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 14:37 +0200, Eelco Hoogendoorn wrote: > To rephrase my most pressing question: may np.ones((N,2)).mean(0) and > np.ones((2,N)).mean(1) produce different results with the > implementation in the current master? If so, I think that would be > very much regrettable; and if this is

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
To rephrase my most pressing question: may np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) produce different results with the implementation in the current master? If so, I think that would be very much regrettable; and if this is a minority opinion, I do hope that at least this gets documented i
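The question above can be tried directly. The sketch below is only a minimal harness, not something posted in the thread: `compare_axis_means` is a hypothetical helper name, and the float64 reference relies on the documented `dtype` argument of `mean`. Whether the two float32 results differ depends on the NumPy version and on which reduction path each memory layout takes, so no expected output is claimed for large `n`. The effect discussed in the thread needs `n` above 2**24 (roughly 128 MB per float32 array); the call shown uses a small `n` so that it runs anywhere.

```python
import numpy as np

def compare_axis_means(n, dtype=np.float32):
    """Return the two means from the question plus a float64-accumulated reference."""
    tall = np.ones((n, 2), dtype=dtype).mean(0)   # reduce over the long, strided axis
    wide = np.ones((2, n), dtype=dtype).mean(1)   # reduce over the long, contiguous axis
    # Same data as `tall`, but accumulated in float64 via the dtype argument.
    reference = np.ones((n, 2), dtype=dtype).mean(0, dtype=np.float64)
    return tall, wide, reference

# Small n: every partial sum is exact, so all three results agree.
tall, wide, reference = compare_axis_means(1000)
print(tall, wide, reference)
```

Calling `compare_axis_means(2**25)` on a machine with enough memory is the case the thread is actually about.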

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread Sturla Molden
Nathaniel Smith wrote: > The problem here is that when summing up the values, the sum gets > large enough that after rounding, x + 1 = x and the sum stops > increasing. Interesting. That explains why the divide-and-conquer reduction is much more robust. Thanks :) Sturla
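The stall described here, and the reason a divide-and-conquer reduction avoids it, can be reproduced without NumPy. This is a sketch under stated assumptions, not NumPy's implementation: `round_to`, `naive_sum`, and `pairwise_sum` are hypothetical helpers that emulate a narrow accumulator by rounding through the `struct` module after every addition. Half precision (`"e"`) is used for the loop because it stalls after only 2048 additions; single precision (`"f"`) behaves the same way at 2**24.

```python
import struct

def round_to(x, fmt):
    """Round a Python float to IEEE 754 half ("e") or single ("f") precision."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def naive_sum(values, fmt):
    """Left-to-right summation with the accumulator rounded after every add."""
    total = 0.0
    for v in values:
        total = round_to(total + v, fmt)
    return total

def pairwise_sum(values, fmt, block=8):
    """Divide-and-conquer summation; each partial sum is rounded to fmt."""
    if len(values) <= block:
        return naive_sum(values, fmt)
    half = len(values) // 2
    left = pairwise_sum(values[:half], fmt, block)
    right = pairwise_sum(values[half:], fmt, block)
    return round_to(left + right, fmt)

ones = [1.0] * 4096
naive_half = naive_sum(ones, "e")        # stalls once x + 1 rounds back to x
pairwise_half = pairwise_sum(ones, "e")  # operands of each add stay similar in size
stalls_in_single = round_to(2.0**24 + 1.0, "f") == 2.0**24
print(naive_half, pairwise_half, stalls_in_single)  # 2048.0 4096.0 True
```

The naive loop stops growing at 2048 because the accumulator dwarfs each new term, while the pairwise version only ever adds two partial sums of comparable magnitude.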

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread RayS
Thanks for the clarification, but how is the numpy rounding directed? Round to nearest, ties to even? http://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules Just curious, as I couldn't find a reference. - Ray At 07:44 AM 7/27/2014, you wrote: >On Sun, Jul 27, 2014 at 3:16 PM, RayS wro
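On the rounding question: to my knowledge NumPy does not select a rounding rule of its own, and float32 arithmetic follows the platform's IEEE 754 default, which is round to nearest with ties to even. The sketch below illustrates that rule using only the standard library; `to_float32` is a hypothetical helper that rounds a Python float (a double) to single precision, and the assumption is that the default rounding mode has not been changed.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

base = 2.0**24  # from here up to 2**25, adjacent float32 values are 2 apart

# base + 1 lies exactly halfway between base and base + 2.
tie_down = to_float32(base + 1.0)  # the tie resolves to base (even significand)
# base + 3 lies exactly halfway between base + 2 and base + 4.
tie_up = to_float32(base + 3.0)    # the tie resolves to base + 4 (even significand)

print(tie_down, tie_up)  # 16777216.0 16777220.0
```

The first case is the stall discussed in the thread: adding 1 to 2**24 is a tie that rounds back down, so a float32 running sum of ones never passes 2**24.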

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread Nathaniel Smith
On Sun, Jul 27, 2014 at 3:16 PM, RayS wrote: > At 02:04 AM 7/27/2014, you wrote: > >>You won't be able to do it by accident or omission or a lack of >>discipline. It's not a tempting public target like, say, np.seterr(). > > BTW, why not throw an overflow error in the large float32 sum() case? > I

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread RayS
At 02:04 AM 7/27/2014, you wrote: >You won't be able to do it by accident or omission or a lack of >discipline. It's not a tempting public target like, say, np.seterr(). BTW, why not throw an overflow error in the large float32 sum() case? Is it too expensive to check while accumulating? - Ray

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread Robert Kern
On Sun, Jul 27, 2014 at 9:56 AM, wrote: > > On Sun, Jul 27, 2014 at 4:24 AM, Robert Kern wrote: >> >> On Sun, Jul 27, 2014 at 7:04 AM, wrote: >> > >> > On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden >> > wrote: >> >> >> >> Robert Kern wrote: >> >> >> >> >> It would presumably require a globa

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread josef.pktd
On Sun, Jul 27, 2014 at 4:24 AM, Robert Kern wrote: > On Sun, Jul 27, 2014 at 7:04 AM, wrote: > > > > On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden > > wrote: > >> > >> Robert Kern wrote: > >> > >> >> It would presumably require a global threading.RLock for protecting > the > >> >> global st

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread Robert Kern
On Sun, Jul 27, 2014 at 7:04 AM, wrote: > > On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden > wrote: >> >> Robert Kern wrote: >> >> >> It would presumably require a global threading.RLock for protecting the >> >> global state. >> > >> > We would use thread-local storage like we currently do with

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread josef.pktd
On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden wrote: > Robert Kern wrote: > > >> It would presumably require a global threading.RLock for protecting the > >> global state. > > > > We would use thread-local storage like we currently do with the > > np.errstate() context manager. Each thread will

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Robert Kern wrote: >> It would presumably require a global threading.RLock for protecting the >> global state. > > We would use thread-local storage like we currently do with the > np.errstate() context manager. Each thread will have its own "global" > state. That sounds like a better plan, yes

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Robert Kern
On Sat, Jul 26, 2014 at 8:04 PM, Sturla Molden wrote: > Benjamin Root wrote: > >> My other concern would be with multi-threaded code (which is where a global >> state would be bad). > > It would presumably require a global threading.RLock for protecting the > global state. We would use thread-lo

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sylvain Corlay
I completely agree with Eelco. I expect numpy.mean to do something simple and straightforward. If the naive method is not well suited for my data, I can deal with it and have my own ad hoc method. On Sat, Jul 26, 2014 at 3:19 PM, Eelco Hoogendoorn wrote: > Perhaps I in turn am missing something;

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
wrote: > statsmodels still has avoided anything that smells like a global state that > changes calculation. If global states are stored in a stack, as in OpenGL, it is not so bad. A context manager could push a state in __enter__ and pop the state in __exit__. This is actually how I write OpenGL
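The two ideas in this part of the thread, a stack that a context manager pushes and pops, and per-thread storage so that no lock is needed, can be combined in a few lines. This is a minimal sketch only: `summation` and `current_summation` are hypothetical names, the method strings are placeholders, and nothing here is an existing NumPy API.

```python
import threading
from contextlib import contextmanager

_state = threading.local()  # every thread sees its own attributes on this object

def _stack():
    # Lazily create the per-thread stack; "naive" is a placeholder default.
    if not hasattr(_state, "stack"):
        _state.stack = ["naive"]
    return _state.stack

def current_summation():
    """Return the method selected by the innermost active context."""
    return _stack()[-1]

@contextmanager
def summation(method):
    """Select a summation method for the duration of a with-block."""
    stack = _stack()
    stack.append(method)  # push on entry
    try:
        yield
    finally:
        stack.pop()       # pop on exit, even if the body raised

with summation("pairwise"):
    with summation("kahan"):
        inner = current_summation()
    outer = current_summation()
print(inner, outer, current_summation())  # kahan pairwise naive
```

Nested contexts restore the enclosing choice on exit, which addresses the nesting concern raised elsewhere in the thread. A thread started inside a `with` block still begins with the default, because its stack is separate.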

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
Perhaps I in turn am missing something; but I would suppose that any algorithm that requires multiple passes over the data is off the table? Perhaps I am being a little old fashioned and performance oriented here, but to make the ultra-majority of use cases suffer a factor two performance penalty f

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Benjamin Root wrote: > My other concern would be with multi-threaded code (which is where a global > state would be bad). It would presumably require a global threading.RLock for protecting the global state. Sturla

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread josef.pktd
On Sat, Jul 26, 2014 at 2:44 PM, Benjamin Root wrote: > That is one way of doing it, and probably the cleanest way. Or else you > have to pass in the context object everywhere anyway. But I am not so > concerned about that (we do that for other things as well). Bigger concerns > would be nested c

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Benjamin Root
That is one way of doing it, and probably the cleanest way. Or else you have to pass in the context object everywhere anyway. But I am not so concerned about that (we do that for other things as well). Bigger concerns would be nested contexts. For example, what if one of the scikit functions use su

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread josef.pktd
On Sat, Jul 26, 2014 at 9:57 AM, Benjamin Root wrote: > I could get behind the context manager approach. It would help keep > backwards compatibility, while providing a very easy (and clean) way of > consistently using the same reduction operation. Adding kwargs is just a > road to hell. > Would

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Sturla Molden wrote: > Sebastian Berg wrote: > >> Yes, it is much more complicated and incompatible with naive ufuncs if >> you want your memory access to be optimized. And optimizing that is very >> much worth it speed wise... > > Why? Couldn't we just copy the data chunk-wise to a temporary b

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Sebastian Berg wrote: > Yes, it is much more complicated and incompatible with naive ufuncs if > you want your memory access to be optimized. And optimizing that is very > much worth it speed wise... Why? Couldn't we just copy the data chunk-wise to a temporary buffer of say 2**13 numbers and th

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
A context manager makes sense. I very much appreciate the time constraints and the effort put in thus far, but if we cannot make something work uniformly, I wonder if we should include it in the master at all. I don't have a problem with customizing algorithms where fp accuracy demands it; I have

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sebastian Berg
On Sa, 2014-07-26 at 15:38 +0200, Eelco Hoogendoorn wrote: > I was wondering the same thing. Are there any known tradeoffs to this > method of reduction? > Yes, it is much more complicated and incompatible with naive ufuncs if you want your memory access to be optimized. And optimizing that is ve

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Benjamin Root
I could get behind the context manager approach. It would help keep backwards compatibility, while providing a very easy (and clean) way of consistently using the same reduction operation. Adding kwargs is just a road to hell. Cheers! Ben Root On Sat, Jul 26, 2014 at 9:53 AM, Julian Taylor < jta

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Julian Taylor
On 26.07.2014 15:38, Eelco Hoogendoorn wrote: > > Why is it not always used? for 1d reduction the iterator blocks by 8192 elements even when no buffering is required. There is a TODO in the source to fix that by adding additional checks. Unfortunately nobody knows what these additional tests would

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
I was wondering the same thing. Are there any known tradeoffs to this method of reduction? On Sat, Jul 26, 2014 at 12:39 PM, Sturla Molden wrote: > Sebastian Berg wrote: > > > chose more stable algorithms for such statistical functions. The > > pairwise summation that is in master now is very

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Sebastian Berg wrote: > chose more stable algorithms for such statistical functions. The > pairwise summation that is in master now is very awesome, but it is not > secure enough in the sense that a new user will have difficulty > understanding when he can be sure it is used. Why is it not alway

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sebastian Berg
On Fr, 2014-07-25 at 21:23 +0200, Eelco Hoogendoorn wrote: > It need not be exactly representable as such; take the mean of [1, 1 > +eps] for instance. Granted, there are at most two numbers in the range > of the original dtype which are closest to the true mean; but I'm not > sure that computing the

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
From: "Julian Taylor" Sent: 26-7-2014 00:58 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 25.07.2014 23:51, Eelco Hoogendoorn wrote: > Ray: I'm not working with Hubble data, but yeah the

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread RayS
At 02:36 PM 7/25/2014, you wrote: >But it doesn't compensate for users to be aware of the problems. I >think the docstring and the description of the dtype argument is pretty clear. Most of the docs for the affected functions do not have a Note with the same warning as mean() - Ray

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Julian Taylor
gave mediocre speedups (10%-20% on an i5). > > From: RayS <mailto:r...@blue-cove.com> > Sent: 25-7-2014 23:26 > To: Discussion of Numerical Python <mailto:numpy-discussion@scipy.org> > Subject: Re: [Numpy-discussion] num

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
"RayS" Sent: 25-7-2014 23:26 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays At 11:29 AM 7/25/2014, you wrote: >On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: > > The important point was that it would be

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread josef.pktd
On Fri, Jul 25, 2014 at 4:25 PM, RayS wrote: > At 11:29 AM 7/25/2014, you wrote: > >On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: > > > The important point was that it would be best if all of the > > methods affected > > > by summing 32 bit floats with 32 bit accumulators had the same Notes as >

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread RayS
At 11:29 AM 7/25/2014, you wrote: >On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: > > The important point was that it would be best if all of the > methods affected > > by summing 32 bit floats with 32 bit accumulators had the same Notes as > > numpy.mean(). We went through a lot of code yesterday,

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
It need not be exactly representable as such; take the mean of [1, 1+eps] for instance. Granted, there are at most two numbers in the range of the original dtype which are closest to the true mean; but I'm not sure that computing them exactly is a tractable problem for arbitrary input. I'm not sure w
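The [1, 1+eps] case can be checked by hand for float32. The sketch below is an illustration using only the standard library, with `to_float32` as a hypothetical helper; it takes eps to be the float32 machine epsilon, 2**-23, so the two inputs are adjacent float32 values and their true mean falls strictly between them.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

eps = 2.0**-23           # float32 machine epsilon
a, b = 1.0, 1.0 + eps    # two adjacent float32 values
true_mean = (a + b) / 2  # exact in double precision: 1 + 2**-24

# The true mean sits halfway between a and b, so float32 cannot hold it.
representable = to_float32(true_mean) == true_mean
nearest = to_float32(true_mean)  # one of the two neighbours, a or b

print(representable, nearest == a or nearest == b)  # False True
```

Here both neighbours are equally close to the true mean, which is the "at most two numbers" situation described in the message.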

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Nathaniel Smith
On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: > The important point was that it would be best if all of the methods affected > by summing 32 bit floats with 32 bit accumulators had the same Notes as > numpy.mean(). We went through a lot of code yesterday, assuming that any > numpy or Scipy.stats fu

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Alan G Isaac
On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote: > At the risk of repeating myself: explicit is better than implicit This sounds like an argument for renaming the `mean` function `naivemean` rather than `mean`. Whatever numpy names `mean`, shouldn't it implement an algorithm that produces the mean

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
"RayS" Sent: 25-7-2014 19:56 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays At 07:22 AM 7/25/2014, you wrote: > We were talking on this in the office, as we > realized it does affect a couple of

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread RayS
At 07:22 AM 7/25/2014, you wrote: > We were talking on this in the office, as we > realized it does affect a couple of lines dealing > with large arrays, including complex64. > As I expect Python modules to work uniformly > cross platform unless documented otherwise, to me > that includes 32 vs 6

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Robert Kern
On Fri, Jul 25, 2014 at 3:11 PM, RayS wrote: > At 01:22 AM 7/25/2014, you wrote: >> Actually the maximum precision I am not so >> sure of, as I personally prefer to make an >> informed decision about precision used, and get >> an error on a platform that does not support >> the specified precisio

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread RayS
At 01:22 AM 7/25/2014, you wrote: > Actually the maximum precision I am not so > sure of, as I personally prefer to make an > informed decision about precision used, and get > an error on a platform that does not support > the specified precision, rather than obtain > subtly or horribly broke

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
> From: Alan G Isaac > Sent: 25-7-2014 00:10 > > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] numpy.mean still broken for > largefloat32arrays > > On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: > > This isn't a

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-24 Thread Eelco Hoogendoorn
From: "Alan G Isaac" Sent: 25-7-2014 00:10 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: > This isn't a bug report, but rather a feat