Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-11 Thread Graeme B. Bell
From my previous mail: 

 this has the same performance as your code:
 a = empty([3] list(A.shape)

For anyone that is interested. I ran a benchmark on the code after Julian 
kindly provided me with a correction to the listing he posted.

 a = empty([3] + list(A.shape))
 a[0] = A5; a[1] = B2; a[2] = A10;
 np.any(a, 0)
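
A self-contained version of the corrected listing might look like the sketch below. The sample arrays and the three comparisons are placeholders chosen for illustration, since the exact conditions in the listing are not legible here. The dtype=bool argument is also an assumption, added so the buffer holds booleans rather than floats.

```python
import numpy as np

# Hypothetical sample data standing in for the raster layers A and B.
rng = np.random.default_rng(0)
A = rng.integers(0, 20, size=(4, 5))
B = rng.integers(0, 20, size=(4, 5))

# Placeholder conditions; the real ones depend on the application.
conditions = [A > 5, B < 2, A > 10]

# Preallocate one (n_conditions, *A.shape) buffer and fill it layer by layer,
# instead of letting np.any() build an array from a Python list.
a = np.empty([len(conditions)] + list(A.shape), dtype=bool)
for i, cond in enumerate(conditions):
    a[i] = cond

mask = np.any(a, 0)  # elementwise OR across the layers; same shape as A
```

The result has the shape of a single layer, which is the behaviour the helper function is meant to reproduce.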


Julian also suggested trying the idiom np.vstack([A,B,C]) instead of [A,B,C].

Revised benchmarks here. I've moved the [A5, B2, A10] creation outside the 
timing loop in all cases since it was distorting results due to array creation, 
which shouldn't be part of the any() timing measurement. I'm also now using 
separate test arrays to avoid the possibility of side effects between tests of 
different functions. 
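
The measurement setup described above could be sketched roughly as follows. This is not the benchmark script from the repository: the array sizes, the placeholder conditions and the function names are assumptions made for illustration, and the timings it prints will vary by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 20, size=(200, 200))
B = rng.integers(0, 20, size=(200, 200))

# Build the boolean layers once, outside the timed statements,
# so that array creation is not part of the measurement.
layers = [A > 5, B < 2, A > 10]   # placeholder conditions

# vstack joins along axis 0, so each 2-D layer gets a leading axis first;
# the result has shape (3, 200, 200).
stacked = np.vstack([layer[np.newaxis] for layer in layers])

def any_from_list():
    return np.any(layers, 0)      # np.any converts the list to an array itself

def any_from_stacked():
    return np.any(stacked, 0)     # the array is already built

t_list = timeit.timeit(any_from_list, number=20)
t_stacked = timeit.timeit(any_from_stacked, number=20)
print("list: %.4fs  stacked: %.4fs" % (t_list, t_stacked))
```

Each variant works on data prepared before the timer starts, so the numbers reflect only the reduction and whatever conversion np.any() does internally.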

The following results are produced consistently: 

np.any():                                   2.68 s
np.any() with Julian's first idiom above:   0.24 s
faa.any() (original version):               0.2 s
np.any() with vstack():                     0.14 s
faa.any() with vstack():                    0.1 s
faa.any() without vstack():                 0.08 s
(alternative faa implementations:           0.11-0.12 s)

Conclusion:

fast_any_all is about 30x faster than np.any() in numpy 1.7 (0.08 s vs 2.68 s).

fast_any_all takes 43% less time than np.any() in numpy 1.7 with the vstack()
idiom (0.08 s vs 0.14 s), which I understand to be the basis for the new
approach to np.any() in the 1.8 development branch.

I'd be really interested to see the benchmarks under the current 1.8 master 
branch of numpy. Please can someone try this and send me the file?

# git clone https://github.com/gbb/numpy-fast-any-all.git
(read the source code to make sure I'm not evil)
# cd numpy-fast-any-all
# python test_fast_any_all.py > BENCHMARK.txt

Incidentally, this is an appropriate example of a case where a 'performance 
idiom' becomes a 'penalty idiom' unexpectedly when the underlying 
implementation changes (vstack). 

Thanks for your suggestions, Julian.

Graeme. 





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-05 Thread Graeme B. Bell
Hi Robert, 

Thanks for proposing an alternative implementation approach. 
However, did you test your proposal before you made the assertion about its 
behaviour?


reduce(np.logical_or, inputs, False)
reduce(np.logical_and, inputs, True)

This code consistently benchmarks 20% slower than the method I use (tested on 
two different machines several times).


> Your fast_logic() is basically reduce().

No, it isn't.
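
For readers who want to compare the two approaches themselves, here is a minimal sketch. any_reduce is the reduce() form quoted above. any_loop is a hypothetical loop-based helper written for illustration, not the actual fast_any_all source. The input conditions are placeholders.

```python
from functools import reduce  # a builtin in Python 2, imported in Python 3
import numpy as np

def any_reduce(inputs):
    # The proposal quoted above: fold logical_or over the inputs.
    return reduce(np.logical_or, inputs, False)

def any_loop(inputs):
    # Hypothetical loop-based helper; not the actual fast_any_all source.
    result = np.array(inputs[0], dtype=bool)      # copy, so inputs stay untouched
    for layer in inputs[1:]:
        np.logical_or(result, layer, out=result)  # accumulate in place
    return result

A = np.array([[1, 7], [12, 3]])
B = np.array([[0, 9], [5, 1]])
inputs = [A > 5, B < 2, A > 10]   # placeholder conditions

assert np.array_equal(any_reduce(inputs), any_loop(inputs))
```

Both forms give the same mask. Which one is faster depends on the NumPy version and the array sizes, so it is worth benchmarking rather than assuming.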


Updated benchmarks for your proposal and also for another alternative 
implementation using boolean indexing at: 
https://github.com/gbb/numpy-fast-any-all/blob/master/BENCHMARK.md 


Three general points arising from this:

1 - idioms don't have test coverage

Generally, by using idioms rather than functions, you risk mistyping or 
misusing the form of the idiom and thus introducing a bug. You also lose out on 
explicit testing and implicit 'real world testing' that tends to build up 
around library functions.


2 - idioms aren't maintained or updated (and they have an unknown shelf life)

An idiom might be fast today (or not), it may be correct today, but tomorrow is 
unknown. 

A key problem is that the relative performance of the parts of a library like 
numpy will keep changing - sometimes substantially - and idiomatic approaches 
to overcome performance difficulties in the short term tend to become outdated 
and even harmful very quickly. As in this example, they can even be harmful 
from the moment they're written. Browsing a site like stackoverflow should show 
you both new and experienced users often taking inefficient approaches because 
of outdated idiomatic advice. 


3 - idioms are OK, but functions are better, because implementation hiding and 
abstraction are good things. 

If you use a benchmarked/tested function which acknowledges a range of 
alternative implementations, you have a reasonable degree of confidence that 
you're getting the best performance and correct behaviour, because you can 
actually see the effects of the alternative implementations in benchmarks/test 
output. 

It's a lot more sensible to use a function from a publicly available library - 
any library - than to manually maintain a set of idioms and have to continually 
search your software for the idioms, benchmark them to see if they're still 
beneficial, and modify them when they're not. 

Graeme




Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-05 Thread Graeme B. Bell


Hi Julian,

Thanks for the post.  It's great to hear that the main numpy function is 
improving in 1.8, though I think there is still plenty of value here for 
performance junkies :-)   

I don't have 1.8beta installed (and I can't conveniently install it on my 
machines just now). If you have time, and have the beta installed, could you 
try this and mail me the output from the benchmark?  I'm curious to know. 

# git clone https://github.com/gbb/numpy-fast-any-all.git
# cd numpy-fast-any-all
# python test-fast-any-all.py

Graeme


On Sep 4, 2013, at 7:38 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote:

>> The result is 14 to 17x faster than np.any() for this use case.*
>
> any/all and boolean operations have been significantly sped up by
> vectorization in numpy 1.8 [0].
> They are now around 10 times faster than before, especially if the
> boolean array fits into one of the cpu caching layers.



Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-05 Thread Chris Barker - NOAA Federal
This is good stuff, but I can't help thinking that if I needed to do an
any/all test on a number of arrays with common and/or combos --
I'd probably write a Cython function to do it.

It could be a bit tricky to make it really general, but not bad for a
couple specific dtypes / use cases.

-just a thought...

Also -- how does this work with numexpr? It would be nice if it could
handle these kinds of cases.

-Chris









-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-05 Thread Julian Taylor
Hi,
It's not np.any that is slow in this case; it's np.array([A, B, C]).

np.dstack([A, B, C]) is better, but writing it like this has the same
performance as your code:
a = empty([3] list(A.shape)
a[0] = A5; a[1] = B2; a[2] = A10;
np.any(a, 0)

I'll check if creating an array from a sequence can be improved for this
case.




Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-04 Thread Robert Kern
On Wed, Sep 4, 2013 at 11:05 AM, Graeme B. Bell g...@skogoglandskap.no
wrote:

> In my current GIS raster work I often have a situation where I generate
> code something like this:
>
>   np.any([A4, A==2, B==5, ...])
>
> However, np.any() is quite slow.
>
> It's possible to use np.logical_or to solve the problem, but then you get
> nested logical_or's, since logical_or combines only two parameters.
> It's also possible to use integer maths e.g. (A4)+(A==2)+(B==5)>0.
>
> The question is: which is best (syntactically, in terms of performance,
> etc)?
>
> I've written a little helper function to provide a faster version of
> any() and all(). It's embarrassingly simple - just a for loop. However, I
> think there's a syntactic advantage to using a helper function for this
> situation rather than writing it idiomatically each time; and it reduces
> the chance of a bug in idiomatic implementation. However, the code does not
> cover all the use cases currently addressed by np.any() and np.all().
>
> I benchmarked to pick the fastest underlying implementation (logical_or
> rather than integer maths).
>
> The result is 14 to 17x faster than np.any() for this use case.*
>
> Code & benchmark here:
>
>   https://github.com/gbb/numpy-fast-any-all
>
> Please feel welcome to use it or improve it :-)

Try the following:

  any(map(np.any, inputs))
  all(map(np.all, inputs))

--
Robert Kern


Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-04 Thread Graeme B. Bell
Sorry, I should have been more clear.

As shown in the benchmark/example, the method is replacing the behaviour of 

   np.any(inputs, 0)

not the behaviour of

   np.any(inputs)

Here, where I'm making decisions based on overlaying layers of raster data in 
the same shape, I don't want to map the entire dataset to a single boolean, 
rather I want to preserve the layers' shape but identify if a condition was 
matched in any of the overlaid layers, generating a mask. 

For example, this type of reasoning: 

def mask(): 
for all pixel locations in the images, A, B and C: 
  if A[location] is 3, 19, or between 21 and 30  AND B[location] is any value 
AND C[location] is 1-4, 9-13... 
  pixel=True

This naturally fits the any/all metaphor.
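
The per-pixel reasoning above can be sketched with np.any(..., 0) and np.all(..., 0). The small arrays are hypothetical, and only the value ranges spelled out in the pseudocode are used. B is left out because any value is accepted for it.

```python
import numpy as np

# Hypothetical 2x3 raster layers.
A = np.array([[3, 8, 25], [19, 0, 31]])
C = np.array([[1, 5, 10], [7, 2, 12]])

# "A is 3, 19, or between 21 and 30"
a_ok = np.any([A == 3, A == 19, (A >= 21) & (A <= 30)], 0)
# "C is 1-4 or 9-13"
c_ok = np.any([(C >= 1) & (C <= 4), (C >= 9) & (C <= 13)], 0)

# Per-pixel AND of the two conditions: the layer shape (2, 3) is preserved.
mask = np.all([a_ok, c_ok], 0)

# Without the axis argument everything collapses to one boolean.
collapsed = np.any([a_ok, c_ok])
```

Here mask is a boolean array with one entry per pixel, while collapsed is a single True/False value for the whole dataset.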

Will update the description on github. 

Graeme. 

On Sep 4, 2013, at 12:05 PM, Graeme Bell g...@skogoglandskap.no wrote:

> In my current GIS raster work I often have a situation where I generate code
> something like this:
>
> np.any([A4, A==2, B==5, ...])
>
> However, np.any() is quite slow.
>
> It's possible to use np.logical_or to solve the problem, but then you get
> nested logical_or's, since logical_or combines only two parameters.
> It's also possible to use integer maths e.g. (A4)+(A==2)+(B==5)>0.
>
> The question is: which is best (syntactically, in terms of performance, etc)?
>
> I've written a little helper function to provide a faster version of any()
> and all(). It's embarrassingly simple - just a for loop. However, I think
> there's a syntactic advantage to using a helper function for this situation
> rather than writing it idiomatically each time; and it reduces the chance of
> a bug in idiomatic implementation. However, the code does not cover all the
> use cases currently addressed by np.any() and np.all().
>
> I benchmarked to pick the fastest underlying implementation (logical_or
> rather than integer maths).
>
> The result is 14 to 17x faster than np.any() for this use case.*
>
> Code & benchmark here:
>
>  https://github.com/gbb/numpy-fast-any-all
>
> Please feel welcome to use it or improve it :-)
>
> Graeme.
>
>
> * (Should this become an execution path in np.any()... ?)



Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-04 Thread Phil Elson
For the record, I started a discussion about 6 months ago about a
find_first type function which avoided running the logic over the whole
array (using lambdas instead). This spilled into a discussion about
implementing a short-circuiting any or all function, with some interesting
results:
http://numpy-discussion.10968.n7.nabble.com/Implementing-a-find-first-style-function-tp33085.html

Nothing more has been done with those discussions, but you may find it of
interest. (And I'd still be interested in taking it forwards if you have
any comments)

Cheers,






Re: [Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

2013-09-04 Thread Julian Taylor
On 04.09.2013 12:05, Graeme B. Bell wrote:
> In my current GIS raster work I often have a situation where I generate code
> something like this:
>
>  np.any([A4, A==2, B==5, ...])
>
> However, np.any() is quite slow.
>
> It's possible to use np.logical_or to solve the problem, but then you get
> nested logical_or's, since logical_or combines only two parameters.
> It's also possible to use integer maths e.g. (A4)+(A==2)+(B==5)>0.
>
> The question is: which is best (syntactically, in terms of performance, etc)?
>
> I've written a little helper function to provide a faster version of any()
> and all(). It's embarrassingly simple - just a for loop. However, I think
> there's a syntactic advantage to using a helper function for this situation
> rather than writing it idiomatically each time; and it reduces the chance of
> a bug in idiomatic implementation. However, the code does not cover all the
> use cases currently addressed by np.any() and np.all().
>
> I benchmarked to pick the fastest underlying implementation (logical_or
> rather than integer maths).
>
> The result is 14 to 17x faster than np.any() for this use case.*

any/all and boolean operations have been significantly sped up by
vectorization in numpy 1.8 [0].
They are now around 10 times faster than before, especially if the
boolean array fits into one of the cpu caching layers.
If it doesn't, I recommend using a blocking utility function, something like:
for i in range(0, n, blocksize):
   view = d[i:i+blocksize]
   # do stuff on view

with this method and the new vectorizations in numpy you are almost as
fast as numexpr for floats and probably a lot faster with bools.
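
A runnable variant of the blocking loop might look like the sketch below. The helper name, the threshold test and the default block size are all assumptions made for illustration. The early return is an addition that the original loop does not mention.

```python
import numpy as np

def blocked_any(d, threshold, blocksize=65536):
    # Hypothetical helper: test d > threshold one block at a time, so each
    # temporary boolean array stays small.
    n = d.shape[0]
    for i in range(0, n, blocksize):
        view = d[i:i + blocksize]      # basic slicing gives a view, not a copy
        if np.any(view > threshold):
            return True                # stop at the first matching block
    return False

d = np.arange(1000000, dtype=np.float64)
print(blocked_any(d, 10.0))    # True
print(blocked_any(d, 2.0e6))   # False
```

The best block size depends on the cache sizes of the machine, so it would need tuning by measurement.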


[0]
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_ufunc.html#numpy-and-bool
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-any-slow
(the dip before 1.7 was part of the NA branch and never released, 1.8
adds some of its optimizations back)