Re: [Numpy-discussion] Fastest way to compute summary statistics for a specific axis

2015-03-17 Thread Dave Hirschfeld
Sebastian Berg sebastian at sipsolutions.net writes:

 
 On Mo, 2015-03-16 at 15:53 +, Dave Hirschfeld wrote:
  I have a number of large arrays for which I want to compute the mean 
and 
  standard deviation over a particular axis - e.g. I want to compute 
the 
  statistics for axis=1 as if the other axes were combined so that in 
the 
  example below I get two values back
  
  In [1]: a = randn(30, 2, 1)
  
  For the mean this can be done easily like:
  
  In [2]: a.mean(0).mean(-1)
  Out[2]: array([ 0.0007, -0.0009])
  
 
 If you have numpy 1.7+ (which I guess by now is basically always the
 case), you can do a.mean((0, 1)). Though it isn't actually faster in
 this example, probably because it has to use buffered iterators and
 things, but I would guess the performance should be much more stable
 depending on memory order, etc. then any other method.
 
 - Sebastian
 


Wow, I didn't know you could even do that - that's very cool (and a lot 
cleaner than manually reordering  reshaping)

It seems to be pretty fast for me and reasonably stable wrt memory 
order:

In [199]: %timeit a.mean(0).mean(-1)
 ...: %timeit a.mean(axis=(0,2))
 ...: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
 ...: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
 ...: 
1000 loops, best of 3: 1.52 ms per loop
1000 loops, best of 3: 1.5 ms per loop
100 loops, best of 3: 4.8 ms per loop
100 loops, best of 3: 14.6 ms per loop

In [200]: a = a.copy('F')

In [201]: %timeit a.mean(0).mean(-1)
 ...: %timeit a.mean(axis=(0,2))
 ...: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
 ...: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 3.29 ms per loop
100 loops, best of 3: 7.18 ms per loop
100 loops, best of 3: 15.9 ms per loop


Thanks,
Dave


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to compute summary statistics for a specific axis

2015-03-16 Thread Sebastian Berg
On Mo, 2015-03-16 at 15:53 +, Dave Hirschfeld wrote:
 I have a number of large arrays for which I want to compute the mean and 
 standard deviation over a particular axis - e.g. I want to compute the 
 statistics for axis=1 as if the other axes were combined so that in the 
 example below I get two values back
 
 In [1]: a = randn(30, 2, 1)
 
 For the mean this can be done easily like:
 
 In [2]: a.mean(0).mean(-1)
 Out[2]: array([ 0.0007, -0.0009])
 

If you have numpy 1.7+ (which I guess by now is basically always the
case), you can do a.mean((0, 1)). Though it isn't actually faster in
this example, probably because it has to use buffered iterators and
things, but I would guess the performance should be much more stable
depending on memory order, etc. then any other method.

- Sebastian

 
 ...but this won't work for the std. Using some transformations we can 
 come up with something which will work for either:
 
 In [3]: a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
 Out[3]: array([ 0.0007, -0.0009])
 
 In [4]: a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
 Out[4]: array([ 0.0007, -0.0009])
 
 
 If we look at the performance of these equivalent methods:
 
 In [5]: %timeit a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
 100 loops, best of 3: 14.5 ms per loop
 
 In [6]: %timeit a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
 100 loops, best of 3: 5.05 ms per loop
 
 
 we can see that the latter version is a clear winner. Investigating 
 further, both methods appear to copy the data so the performance is 
 likely down to better cache utilisation.
 
 In [7]: np.may_share_memory(a, a.transpose(2,0,1).reshape(-1, 2))
 Out[7]: False
 
 In [8]: np.may_share_memory(a, a.transpose(1,0,2).reshape(2, -1))
 Out[8]: False
 
 
 Both methods are however significantly slower than the initial attempt:
 
 In [9]: %timeit a.mean(0).mean(-1)
 1000 loops, best of 3: 1.2 ms per loop
 
 Perhaps because it allocates a smaller temporary?
 
 For those who like a challenge: is there a faster way to achieve what 
 I'm after?
 
 
 Cheers,
 Dave
 
 
 
 
 
 
 
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 



signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to compute summary statistics for a specific axis

2015-03-16 Thread Eric Moore
On Mon, Mar 16, 2015 at 11:53 AM, Dave Hirschfeld dave.hirschf...@gmail.com
 wrote:

 I have a number of large arrays for which I want to compute the mean and
 standard deviation over a particular axis - e.g. I want to compute the
 statistics for axis=1 as if the other axes were combined so that in the
 example below I get two values back

 In [1]: a = randn(30, 2, 1)

 For the mean this can be done easily like:

 In [2]: a.mean(0).mean(-1)
 Out[2]: array([ 0.0007, -0.0009])


 ...but this won't work for the std. Using some transformations we can
 come up with something which will work for either:

 In [3]: a.transpose(2,0,1).reshape(-1, 2).mean(axis=0)
 Out[3]: array([ 0.0007, -0.0009])

 In [4]: a.transpose(1,0,2).reshape(2, -1).mean(axis=-1)
 Out[4]: array([ 0.0007, -0.0009])


Specify all of the axes you want to reduce over as a tuple.

In [1]: import numpy as np


In [2]: a = np.random.randn(30, 2, 1)


In [3]: a.mean(axis=(0,-1))

Out[3]: array([-0.00224589, 0.00230759])


In [4]: a.std(axis=(0,-1))

Out[4]: array([ 1.00062771, 1.0001258 ])



-Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to compute summary statistics for a specific axis

2015-03-16 Thread Oscar Benjamin
On 16 March 2015 at 15:53, Dave Hirschfeld dave.hirschf...@gmail.com wrote:
 I have a number of large arrays for which I want to compute the mean and
 standard deviation over a particular axis - e.g. I want to compute the
 statistics for axis=1 as if the other axes were combined so that in the
 example below I get two values back

 In [1]: a = randn(30, 2, 1)

...
 Both methods are however significantly slower than the initial attempt:

 In [9]: %timeit a.mean(0).mean(-1)
 1000 loops, best of 3: 1.2 ms per loop

 Perhaps because it allocates a smaller temporary?

 For those who like a challenge: is there a faster way to achieve what
 I'm after?

You'll probably find it faster if you swap the means around to make an
even smaller temporary:

a.mean(-1).mean(0)


Oscar
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion