Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 66, Issue 61

2012-03-20 Thread Matthieu Rigal
Hi Richard,

Thanks for your answer and the related help!

In fact, I was hoping for a solution that uses less memory and runs faster. 
Something equivalent to a raster calculator for numpy. Wouldn't it make sense 
for numpy to have some optimized function that works on more than 2 arrays anyway?

In the end, I am more interested in speed.

I first tried a compact version:
array = numpy.asarray([(aBlueChannel < 1.0),(aNirChannel > aBlueChannel * 
1.0),(aNirChannel < aBlueChannel * 1.8)]).all()

But in the end this one is more than 2 times slower than:
array1 = numpy.empty([3,6566,6682], dtype=numpy.bool_)
numpy.less(aBlueChannel, 1.0, out=array1[0])
numpy.greater(aNirChannel, (aBlueChannel * 1.0), out=array1[1])
numpy.less(aNirChannel, (aBlueChannel * 1.8), out=array1[2])
array = array1.all()

(and this solution is about 30% faster than the original one)
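
A minimal sketch of the preallocated variant on small synthetic data. The names blue, nir and checks and the 4x5 shape are made up for illustration and stand in for the real 6566x6682 channels. One thing it makes visible: all() without an axis reduces over the whole stack and returns a single boolean, while all(axis=0) returns the per-pixel mask.

```python
import numpy

# Hypothetical stand-ins for the real channels (small, made-up data).
rng = numpy.random.default_rng(0)
blue = rng.uniform(0.0, 2.0, size=(4, 5))
nir = blue * rng.uniform(0.5, 2.5, size=(4, 5))

# One boolean buffer holding the three comparison results.
checks = numpy.empty((3,) + blue.shape, dtype=numpy.bool_)
numpy.less(blue, 1.0, out=checks[0])
numpy.greater(nir, blue * 1.0, out=checks[1])
numpy.less(nir, blue * 1.8, out=checks[2])

all_pixels_pass = checks.all()   # one bool: do ALL pixels pass ALL checks?
mask = checks.all(axis=0)        # per-pixel mask, same shape as blue
```

Deriving the buffer shape from blue.shape avoids hard-coding the image size.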

I found another way that works for me too:
array = (aBlueChannel < 1.0) * (aNirChannel > (aBlueChannel * 1.0)) * 
(aNirChannel < (aBlueChannel * 1.8))

But this one is only 5-10% faster than the original solution, even if it probably 
uses less memory than the 2 previous ones. (The same was possible with operator 
+, but slower than operator *.)
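
For reference, a small sketch of the same per-pixel combination written with numpy.logical_and and its out parameter, so that only two boolean arrays are ever allocated. The arrays blue and nir are made-up sample data, and mask and scratch are illustrative names. Note that on boolean arrays * acts as an elementwise AND while + acts as an elementwise OR, so the two operators do not give the same result.

```python
import numpy

# Made-up sample data standing in for the two channels.
blue = numpy.array([[0.5, 0.9], [1.2, 0.4]])
nir = numpy.array([[0.6, 2.0], [1.5, 0.3]])

mask = numpy.less(blue, 1.0)                 # blue < 1.0
scratch = numpy.greater(nir, blue)           # nir > blue * 1.0
numpy.logical_and(mask, scratch, out=mask)   # combine in place

numpy.less(nir, blue * 1.8, out=scratch)     # nir < blue * 1.8, reusing scratch
numpy.logical_and(mask, scratch, out=mask)

# Same result as the product of the three boolean arrays.
product = (blue < 1.0) * (nir > blue) * (nir < blue * 1.8)
```

The expression blue * 1.8 still allocates a float temporary; only the boolean intermediates are avoided here.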

Regards,
Matthieu Rigal


On Monday 19 March 2012 18:00:02 numpy-discussion-requ...@scipy.org wrote:
> Message: 2
> Date: Mon, 19 Mar 2012 13:20:23 +
> From: Richard Hattersley rhatters...@gmail.com
> Subject: Re: [Numpy-discussion] Using logical function on more than 2
>  arrays, availability of a between function ?
> To: Discussion of Numerical Python numpy-discussion@scipy.org
> Message-ID:
>  CAP=RS9=UBOc6Kmtmnne7W093t19w=T=osrxuaw0wf8b49hq...@mail.gmail.com
> Content-Type: text/plain; charset=ISO-8859-1
>
> What do you mean by efficient? Are you trying to get it to execute
> faster? Or use less memory? Or have more concise source code?
>
> Less memory:
>  - numpy.vectorize would let you get to the end result without any
>    intermediate arrays but will be slow.
>  - Using the out parameter of numpy.logical_and will let you avoid
>    one of the intermediate arrays.
>
> More speed?:
> Perhaps putting all three boolean temporary results into a single
> boolean array (using the out parameter of numpy.greater, etc) and
> using numpy.all might benefit from logical short-circuiting.
>
> And watch out for divide-by-zero from aNirChannel/aBlueChannel.
>
> Regards,
> Richard Hattersley

RapidEye AG
Molkenmarkt 30
14776 Brandenburg an der Havel
Germany
 
Follow us on Twitter! www.twitter.com/rapideye_ag
 
Head Office/Sitz der Gesellschaft: Brandenburg an der Havel
Management Board/Vorstand: Ryan Johnson
Chairman of Supervisory Board/Vorsitzender des Aufsichtsrates: 
Robert Johnson
Commercial Register/Handelsregister Potsdam HRB 24742 P
Tax Number/Steuernummer: 048/100/00053
VAT-Ident-Number/Ust.-ID: DE 199331235
DIN EN ISO 9001 certified
 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 66, Issue 61

2012-03-20 Thread Chris Barker
On Tue, Mar 20, 2012 at 5:13 AM, Matthieu Rigal ri...@rapideye.net wrote:

> In fact, I was hoping for a solution that uses less memory and runs faster.

which do often go together, at least for big problems -- pushing
memory around often takes more time than the computation itself.

> In the end, I am more interested in speed.
>
> I first tried a compact version:
> array = numpy.asarray([(aBlueChannel < 1.0),(aNirChannel > aBlueChannel *
> 1.0),(aNirChannel < aBlueChannel * 1.8)]).all()

(by the way -- it is MUCH better if you post example code with actual
data (the example data can be much smaller) -- while with small data
sets you can't test performance, you can test correctness -- the
easier you make it for us to try stuff out, the more we can help you.)


re-formatting so I can read this, I get:

array = numpy.asarray([(aBlueChannel < 1.0),
                       (aNirChannel > aBlueChannel * 1.0),
                       (aNirChannel < aBlueChannel * 1.8)]).all()

a few notes:

asarray will work, but is pointless here -- I'd just use array --
asarray() is for when you may or may not have an array as input, and
you want to preserve it if you do.

I'd probably use numpy.vstack or rstack here, rather than the array
(or asarray) function -- there is a larger parsing overhead to array()

I suppose this is a special case, but multiplying by 1.0 is kind of
pointless there (or are you doing that to cast to a float array?)

if you are doing an all() -- not much reason to put them all in the
same array first anyway.


> But in the end this one is more than 2 times slower than:
>
> array1 = numpy.empty([3,6566,6682], dtype=numpy.bool_)
> numpy.less(aBlueChannel, 1.0, out=array1[0])
> numpy.greater(aNirChannel, (aBlueChannel * 1.0), out=array1[1])
> numpy.less(aNirChannel, (aBlueChannel * 1.8), out=array1[2])
> array = array1.all()

yup -- creating temporaries can be slow for big data -- there is
sometimes a trade-off between compact code and performance.

I think you can be more memory efficient here, though -- if in the end
all you want is the final all check, no need to store all checks for
each channel -- something like:

# allocate a single 2-D bool array and reuse it for each check:
array1 = numpy.empty((6566, 6682), dtype=numpy.bool_)

# "and" short-circuits, so later checks are skipped once one fails
result = (numpy.less(aBlueChannel, 1.0, out=array1).all() and
          numpy.greater(aNirChannel, (aBlueChannel * 1.0), out=array1).all() and
          numpy.less(aNirChannel, (aBlueChannel * 1.8), out=array1).all())

three loops for the all(), but less memory to push around -- may be faster.
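
To make that easy to try out on small data, here is a rough sketch of the single-buffer idea wrapped in a function. The function name all_checks_pass and the sample arrays are made up for illustration; it answers only the yes/no question "do all pixels pass all three checks" and returns early as soon as one check fails.

```python
import numpy

def all_checks_pass(blue, nir):
    """Return True only if every pixel passes all three comparisons."""
    buf = numpy.empty(blue.shape, dtype=numpy.bool_)  # one reusable buffer
    if not numpy.less(blue, 1.0, out=buf).all():
        return False
    if not numpy.greater(nir, blue, out=buf).all():
        return False
    return bool(numpy.less(nir, blue * 1.8, out=buf).all())

# Small made-up test data: every pixel satisfies 0.5 < 1.0 and 0.5 < 0.7 < 0.9.
blue = numpy.full((3, 4), 0.5)
nir = numpy.full((3, 4), 0.7)
ok = all_checks_pass(blue, nir)

nir[1, 2] = 1.0               # one pixel now violates nir < blue * 1.8
not_ok = all_checks_pass(blue, nir)
```

Whether this is actually faster than the stacked version depends on the data and would need timing on the real arrays.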

I'd also take a look at numexpr for this, it could be very helpful:

http://code.google.com/p/numexpr/
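
A hedged sketch of what the numexpr version could look like. numexpr.evaluate compiles the whole expression and evaluates it in one pass, so the boolean temporaries are not materialised as full-size arrays. The helper name band_mask and the sample data are made up for illustration, and the plain numpy fallback is only there so the snippet still runs where numexpr is not installed. No timings are claimed here.

```python
import numpy

try:
    import numexpr
except ImportError:  # numexpr is an optional third-party package
    numexpr = None

def band_mask(blue, nir):
    """Per-pixel mask: blue < 1.0 and blue < nir < 1.8 * blue."""
    if numexpr is not None:
        return numexpr.evaluate(
            "(blue < 1.0) & (nir > blue) & (nir < blue * 1.8)",
            local_dict={"blue": blue, "nir": nir},
        )
    # Fallback: same expression with ordinary numpy temporaries.
    return (blue < 1.0) & (nir > blue) & (nir < blue * 1.8)

# Made-up sample data standing in for the two channels.
blue = numpy.array([[0.5, 0.9], [1.2, 0.4]])
nir = numpy.array([[0.6, 2.0], [1.5, 0.3]])
mask = band_mask(blue, nir)
```

Calling mask.all() on the result gives the single yes/no answer if that is all that is needed.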

-Chris











-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR