Re: [Numpy-discussion] [Newbie] Fast plotting

2009-01-07 Thread Franck Pommereau
Hi all,

First, let me say that I'm impressed: this mailing list is probably the
most responsive I've ever seen. I asked my first question and immediately
got more solutions than I have time to test... Many thanks to everyone
who answered.

Using the various proposals, I ran two performance tests:
 - test 1: 200 random values
 - test 2: 1328724 values from my real use case

Here are the various functions and how they perform:

def f0 (x, y) :
    """Initial version

    test 1 CPU times: 13.37s
    test 2 CPU times: 5.92s
    """
    s, n = {}, {}
    for a, b in zip(x, y) :
        s[a] = s.get(a, 0.0) + b
        n[a] = n.get(a, 0) + 1
    return (numpy.array([a for a in sorted(s)]),
            numpy.array([s[a]/n[a] for a in sorted(s)]))

def f1 (x, y) :
    """Alan G Isaac ais...@american.edu
    Modified in order to sort the result only once.

    test 1 CPU times: 10.86s
    test 2 CPU times: 2.78s

    defaultdict indeed speeds things up, and avoiding one of the two
    sorts probably helps as well.
    """
    s, n = defaultdict(float), defaultdict(int)
    for a, b in izip(x, y) :
        s[a] += b
        n[a] += 1
    new_x = numpy.array([a for a in sorted(s)])
    return (new_x,
            numpy.array([s[a]/n[a] for a in new_x]))

def f2 (x, y) :
    """Francesc Alted fal...@pytables.org
    Modified with preallocation of arrays (it appeared faster)

    test 1: killed after more than 10 minutes
    test 2 CPU times: 22.01s

    This result is not surprising, as I suspect a quadratic complexity:
    one pass for each unique value in x, and presumably one nested pass
    to compute y[x==i]
    """
    u = numpy.unique(x)
    # preallocate as float so that the means are not truncated to integers
    m = numpy.empty(len(u), dtype=float)
    for pos, i in enumerate(u) :
        g = y[x == i]
        m[pos] = g.mean()
    return u, m

def f3 (x, y) :
    """Sebastian Stephan Berg sebast...@sipsolutions.net
    Modified because I can always work in place.

    test 1 CPU times: 17.43s
    test 2 CPU times: 0.21s

    Adopted! This is definitely the fastest one when using real values.
    I tried to preallocate arrays by setting u=numpy.unique(x) and then
    looping on u, but the result is slower, probably because of unique()

    Compared with f1, it's slower on larger arrays of random values. It
    may be explained by a complexity argument: f1 has a linear complexity
    (two passes in sequence) while f3 is probably N log N (a sequence of
    one sort, two passes to set x[:] and y[:] and one loop on each
    distinct value with a nested searchsorted that is probably
    logarithmic). But real values are far from random, so the sort is
    probably more efficient, and the while loop is shorter because
    there are fewer values.
    """
    s = x.argsort()
    x[:] = x[s]
    y[:] = y[s]
    u, means, start, value = [], [], 0, x[0]
    while True:
        next = x.searchsorted(value, side='right')
        u.append(value)
        means.append(y[start:next].mean())
        if next == len(x):
            break
        value = x[next]
        start = next
    return numpy.array(u), numpy.array(means)
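
For comparison, the sort-then-scan idea in f3 can also be written without the Python-level while loop. The following is only a sketch, not one of the benchmarked functions: the name group_means_sorted is made up for illustration, it assumes a non-empty input, and it works on copies rather than in place.

```python
import numpy as np

def group_means_sorted(x, y):
    """Mean of y for each distinct value of x (hypothetical helper, non-empty input)."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    # Index of the first element of every run of equal x values.
    starts = np.flatnonzero(np.r_[True, xs[1:] != xs[:-1]])
    # Sum of y over each run, then the length of each run.
    sums = np.add.reduceat(ys, starts)
    counts = np.diff(np.r_[starts, len(xs)])
    return xs[starts], sums / counts

u, m = group_means_sorted([3, 1, 3, 2, 1], [10.0, 2.0, 20.0, 5.0, 4.0])
print(u, m)
```

The cost is still dominated by the sort, so the N log N argument above applies to this variant as well.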

def f4 (x, y) :
    """Jean-Baptiste Rudant boogalo...@yahoo.fr

    test 1 CPU times: 111.21s
    test 2 CPU times: 13.48s

    As Jean-Baptiste noticed, this solution is not very efficient (but
    works almost off-the-shelf).
    """
    recXY = numpy.rec.fromarrays((x, x), names='x, y')
    return matplotlib.mlab.rec_groupby(recXY, ('x',),
                                       (('y', numpy.mean, 'y_avg'),))

A few more remarks.

Sebastian Stephan Berg wrote:
> Just thinking. If the parameters are limited, you may be able to use the
> histogram feature? Doing one histogram with Y as weights, then one
> without weights and calculating the mean from this yourself should be
> pretty speedy I imagine.

I'm afraid I don't know what the histogram function computes. But this
may be worth investigating, because I think I'll need it later
on in order to smooth my graphs (by plotting mean values on intervals).
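
A minimal sketch of the histogram suggestion, assuming the goal is the mean of y over intervals of x. The sample data and the bin edges are invented for illustration; numpy.histogram with the weights argument returns the sum of the weights falling in each bin, and without it the count.

```python
import numpy as np

x = np.array([0.1, 0.2, 0.9, 1.1, 1.8, 1.9])
y = np.array([1.0, 3.0, 5.0, 7.0, 2.0, 4.0])
edges = np.array([0.0, 1.0, 2.0])  # two intervals: [0, 1) and [1, 2]

sums, _ = np.histogram(x, bins=edges, weights=y)  # sum of y in each interval
counts, _ = np.histogram(x, bins=edges)           # number of points in each interval

# Empty intervals have no mean; report them as NaN instead of dividing by zero.
means = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
print(means)
```

With wider bins the same two calls give the smoothed curve mentioned above, one mean per interval.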

Bruce Southey wrote:
> If you use Knuth's one pass approach
> (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#III._On-line_algorithm)
> you can write a function to get the min, max, mean and variance/standard
> deviation in a single pass through the array rather than one pass for
> each. I do not know if this will provide any advantage as that will
> probably depend on the size of the arrays.

If I understood well, this algorithm computes the variance of a whole
array. I can see how to adapt it to compute the mean (already done by the
algorithm), max, min, etc., but I did not see how it can be adapted to
my case.
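
One possible adaptation, sketched under the assumption that "my case" means one mean per distinct x value: keep a separate running state for every key and update it with the one-pass recurrence. The function name and the sample data are hypothetical, and this pure-Python loop is not expected to beat the sorted approach on large arrays.

```python
from collections import defaultdict

def grouped_running_stats(pairs):
    """One pass over (key, value) pairs; returns {key: (count, mean, variance)}."""
    state = defaultdict(lambda: [0, 0.0, 0.0])  # count, running mean, M2
    for key, value in pairs:
        s = state[key]
        s[0] += 1
        delta = value - s[1]
        s[1] += delta / s[0]              # update the running mean
        s[2] += delta * (value - s[1])    # update the sum of squared deviations
    # Population variance is M2 / count.
    return {k: (n, mean, m2 / n) for k, (n, mean, m2) in state.items()}

stats = grouped_running_stats(zip([1, 2, 1, 2, 1], [2.0, 10.0, 4.0, 20.0, 6.0]))
print(stats)
```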

> Also, please use the highest precision possible (ie float128) for your
> arrays to minimize numerical error due to the size of your arrays.

Thanks for the advice!

So, thank you again everybody.
Cheers,
Franck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Nicolas ROUX
Hi,

I need help ;-)
I have here a test case which runs much faster in Matlab than in Numpy.

The following code takes less than 0.9sec in Matlab, but 21sec in Python.
Numpy is 24 times slower than Matlab !
The big trouble I have is that a large team of people within my company is
ready to replace Matlab with Numpy/Scipy/Matplotlib,
but I have to demonstrate that this kind of Python code runs with the
same performance as Matlab, without writing a C extension.
This is becoming a critical point for us.

This is a test case that people would like to see working without any code
restructuring.
The reasons are:
- this way of writing is fairly natural.
- the original code which showed me the Matlab/Numpy performance differences is
much more complex,
and can't benefit from broadcasting or other numpy tips (I can give this
code later)

...So I really need to use the code below, without restructuring.

Numpy/Python code:
#
import numpy
import time

print "Start test \n"

dim = 3000

a = numpy.zeros((dim,dim,3))

start = time.clock()

for i in range(dim):
    for j in range(dim):
        a[i,j,0] = a[i,j,1]
        a[i,j,2] = a[i,j,0]
        a[i,j,1] = a[i,j,2]

end = time.clock() - start

print "Test done,   %f sec" % end
#

Matlab code:
#
'Start test'
dim = 3000;
tic;
a =zeros(dim,dim,3);
for i = 1:dim
    for j = 1:dim
        a(i,j,1) = a(i,j,2);
        a(i,j,2) = a(i,j,1);
        a(i,j,3) = a(i,j,3);
    end
end
toc
'Test done'
#

Any ideas?
Did I miss something?

Thanks a lot, in advance for your help.


Cheers,
Nicolas.



Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread David Cournapeau
Nicolas ROUX wrote:
> I have here a testcase which works much faster in Matlab than Numpy.
>
> The following code takes less than 0.9sec in Matlab, but 21sec in Python.
> Numpy is 24 times slower than Matlab !
> [...]
> ...So I really need to use the code below, without restructuring.
> [...]
> Any idea on it ?
> Did I missed something ?

I think on recent versions of matlab, there is nothing you can do
without modifying the code: matlab has some JIT compilation for loops,
which is supposed to speed up those cases - at least, that's what is
claimed by matlab. The above loops are typical examples where this
should work reasonably well I believe:

http://www.mathworks.com/access/helpdesk_r13/help/techdoc/matlab_prog/ch7_pe10.html

If you really have to use loops, then matlab will be faster. But maybe
you don't; can you show us a more typical example ?

cheers,

David


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Ryan May
Nicolas ROUX wrote:
> [...]
>  for i in range(dim):
>      for j in range(dim):
>          a[i,j,0] = a[i,j,1]
>          a[i,j,2] = a[i,j,0]
>          a[i,j,1] = a[i,j,2]
> SNIP
> Any idea on it ?
> Did I missed something ?

I think you may have reduced the complexity a bit too much.  The python code
above sets all of the elements equal to a[i,j,1].  Is there any reason you can't
use slicing to avoid the loops?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Matthieu Brucher
>  for i in range(dim):
>      for j in range(dim):
>          a[i,j,0] = a[i,j,1]
>          a[i,j,2] = a[i,j,0]
>          a[i,j,1] = a[i,j,2]
>
>  for i = 1:dim
>      for j = 1:dim
>          a(i,j,1) = a(i,j,2);
>          a(i,j,2) = a(i,j,1);
>          a(i,j,3) = a(i,j,3);
>      end
>  end

Hi,

The two loops are not the same.
As David stated, with JIT, the loops may be vectorized by Matlab on the fly.

-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Grissiom
On Wed, Jan 7, 2009 at 23:44, Ryan May rma...@gmail.com wrote:

> I think you may have reduced the complexity a bit too much.  The python
> code above sets all of the elements equal to a[i,j,1].  Is there any
> reason you can't use slicing to avoid the loops?


Yes, I think so. I think the testcase  is a matter of python loop vs matlab
loop rather than python vs matlab.

-- 
Cheers,
Grissiom


Re: [Numpy-discussion] [Newbie] Fast plotting

2009-01-07 Thread John Hunter
On Wed, Jan 7, 2009 at 6:37 AM, Franck Pommereau
pommer...@univ-paris12.fr wrote:

> def f4 (x, y) :
>     """Jean-Baptiste Rudant boogalo...@yahoo.fr
>
>     test 1 CPU times: 111.21s
>     test 2 CPU times: 13.48s
>
>     As Jean-Baptiste noticed, this solution is not very efficient (but
>     works almost of-the-shelf).
>     """
>     recXY = numpy.rec.fromarrays((x, x), names='x, y')
>     return matplotlib.mlab.rec_groupby(recXY, ('x',),
>                                        (('y', numpy.mean, 'y_avg'),))

This probably will have no impact on your tests, but this looks like a
bug.  You probably mean:

  recXY = numpy.rec.fromarrays((x, y), names='x, y')

Could you post the code you use to generate your inputs (i.e. what is x)?

I will look into trying some of the suggestions here to improve the
performance on rec_groupby.  One thing that slows it down is that it
supports an arbitrary number of keys -- eg groupby ('year', 'month')
-- whereas the examples above are using a single value lookup.
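
To illustrate what supporting several keys involves, here is a sketch of a multi-key group mean in plain NumPy. It is not how rec_groupby is implemented; the helper name, the (N, K) key layout and the year/month data are assumptions made for the example.

```python
import numpy as np

def group_means_multikey(keys, values):
    """keys: (N, K) array of key columns; values: length-N array (hypothetical helper)."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # Each distinct row of keys is one group; inverse maps every row to its group.
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # some NumPy versions return a non-flat inverse here
    sums = np.bincount(inverse, weights=values, minlength=len(uniq))
    counts = np.bincount(inverse, minlength=len(uniq))
    return uniq, sums / counts

keys = np.array([[2008, 12], [2009, 1], [2008, 12], [2009, 1], [2009, 2]])
vals = np.array([1.0, 10.0, 3.0, 30.0, 7.0])
uniq, means = group_means_multikey(keys, vals)
print(uniq, means)
```

With a single key column this reduces to the one-value lookup used in the examples above.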

JDH


Re: [Numpy-discussion] [Newbie] Fast plotting

2009-01-07 Thread Franck Pommereau
> This probably will have no impact on your tests, but this looks like a
> bug.  You probably mean:
>
>   recXY = numpy.rec.fromarrays((x, y), names='x, y')

Sure! Thanks.

> Could you post the code you use to generate you inputs (ie what is x?)

My code is probably not usable by anyone other than me. I'm presently
too busy to clean it and add comments. But as soon as I'm able to do
so, I'll send you the usable version.

Cheers,
Franck




Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread josef . pktd
On Wed, Jan 7, 2009 at 10:58 AM, Grissiom chaos.pro...@gmail.com wrote:
> Yes, I think so. I think the testcase is a matter of python loop vs matlab
> loop rather than python vs matlab.

I tried with matlab 2006a, I don't know if there is JIT, but the main
speed difference comes with the numpy array access.

The test is actually biased in favor of python, since in the matlab
code the initialization with zeros is inside the time count, but
outside in the python version

If I just put b=1.0 inside the double loop (no numpy)

Python  1.453644 sec
matlab  0.335249 seconds, with zeros outside loop:  0.060582 seconds

with original array assignment:

python/numpy 32.745030 sec
matlab 1.633415 seconds, with zeros outside loop: 1.251597 seconds

(putting the loop in a function and using psyco reduces the run time by 30%)
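
The comparison above can be reproduced roughly with the sketch below. It uses timeit and a much smaller dim than the thread so that it finishes quickly; the function names are made up, and the absolute numbers will differ from the ones quoted here.

```python
import timeit
import numpy as np

n = 100  # small on purpose; the thread used dim = 3000
a = np.zeros((n, n, 3))

def scalar_loop():
    # Pure Python work only: no array access inside the loop.
    for i in range(n):
        for j in range(n):
            b = 1.0

def indexed_loop():
    # Same loop shape, but every statement indexes the NumPy array.
    for i in range(n):
        for j in range(n):
            a[i, j, 0] = a[i, j, 1]
            a[i, j, 2] = a[i, j, 0]
            a[i, j, 1] = a[i, j, 2]

t_scalar = timeit.timeit(scalar_loop, number=5)
t_indexed = timeit.timeit(indexed_loop, number=5)
print("scalar loop:  %.4f s" % t_scalar)
print("indexed loop: %.4f s" % t_indexed)
```

The gap between the two timings is the per-element indexing overhead discussed in this message.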


Josef


Re: [Numpy-discussion] help with typemapping a C function to use numpy arrays

2009-01-07 Thread Rich E
Here is my example, trying to wrap the function sms_spectrumMag that
we have been dealing with:

%apply (int DIM1, float* IN_ARRAY1) {(int sizeInArray, float* pInArray)};
%apply (int DIM1, float* INPLACE_ARRAY1) {(int sizeOutArray, float* pOutArray)};

%inline %{

void my_spectrumMag( int sizeInArray, float *pInArray,
                     int sizeOutArray, float *pOutArray)
{
    sms_spectrumMag(sizeOutArray, pInArray, pOutArray);
}

%}


At this point, I have the new function my_spectrumMag that wraps
sms_spectrumMag() and provides arguments that can be typemapped using
numpy.i. Now, I don't want to have to call the function
my_spectrumMag() in python; I want to use the original name. I would
like to call the function as:

sms_spectrumMag(numpyArray1, numpyArray2)

But, trying to %rename my_spectrumMag to sms_spectrumMag does not
work, the original sms_spectrumMag gets called in python instead.
Trying to %ignore the original function first as follows removes the
sms_spectrumMag completely from the module and I am left with
my_spectrumMag:

%ignore sms_spectrumMag;
%rename (sms_spectrumMag) my_spectrumMag;


Do you see my problem?


On Wed, Jan 7, 2009 at 8:58 AM, Matthieu Brucher
matthieu.bruc...@gmail.com wrote:
> 2009/1/6 Rich E reakina...@gmail.com:
>> This helped immensely.  I feel like I am getting close to being able
>> to accomplish what I would like with SWIG: producing a python module
>> that can be very 'python-like', while co-existing with the c library
>> that is very 'c-like'.
>>
>> There is one question still remaining though, is it possible to make
>> the wrapped function have the same name still?  Using either
>> my_spectrumMag or spectrumMag means I have to create a number of
>> inconsistencies between the python module and the c library.  It is
>> ideal to ignore (%ignore?) the c sms_spectrumMag and instead use the
>> wrapped one, with the same name.  But my attempts at doing this so far
>> have not compiled because of name conflicts.
>
> Of course you can. The function is renamed only if you say so. Perhaps
> you can provide a small example of what doesn't work at the moment ?
>
>> Thanks for the help, I think you are doing great things with this
>> numpy interface/typemaps system.
>
> Matthieu



Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread josef . pktd
A test case closer to my applications is calling functions in loops:

Python
---
def assgn(a,i,j):
    a[i,j,0] = a[i,j,1] + 1.0
    a[i,j,2] = a[i,j,0]
    a[i,j,1] = a[i,j,2]
    return a

print "Start test \n"

dim = 300#0

a = numpy.zeros((dim,dim,3))
start = time.clock()

for i in range(dim):
    for j in range(dim):
        assgn(a,i,j)

end = time.clock() - start
assert numpy.max(a)==1.0  #added to check inplace substitution
print "Test done,   %f sec" % end
---

matlab:
--
function a = tryloopspeed()

'Start test'
dim = 300;
a = zeros(dim,dim,3);
tic;
for i = 1:dim
    for j = 1:dim
        a = assgn(a,i,j);
    end
end
toc
'Test done'
end
function a = assgn(a,i,j)
    a(i,j,1) = a(i,j,2);
    a(i,j,2) = a(i,j,1);
    a(i,j,3) = a(i,j,3);
end
---

Note: I had to reduce the size of the matrix because I got impatient
waiting for matlab.

time:

python: Test done,   0.486127 sec
matlab:
>> output = tryloopspeed();
ans =
Start test
Elapsed time is 511.815971 seconds.
ans =
Test done
>> 511.815971/60.0  #minutes
ans =
  8.530

matlab takes 1053 times the time of python.
The problem is that, at least in my version of matlab, function
arguments are copied when they are modified. It's possible to work
around this, but not very cleanly.

So for simple loops python loses, but for other things, python wins
by a huge margin. Unless somebody can spot a mistake in my timing.
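
The difference can be sketched on the Python side alone. assign_on_copy below only emulates the copy-on-modify behaviour described above by calling .copy() explicitly; it is an illustration of the cost model, not a measurement of any Matlab version, and both function names are made up.

```python
import numpy as np

def assign_in_place(a, i, j):
    # Writes go straight into the caller's array; nothing is copied.
    a[i, j, 0] = a[i, j, 1] + 1.0

def assign_on_copy(a, i, j):
    # Emulates copy-on-modify: the whole array is duplicated before the write.
    b = a.copy()
    b[i, j, 0] = b[i, j, 1] + 1.0
    return b

a = np.zeros((4, 4, 3))
assign_in_place(a, 0, 0)
in_place_value = a[0, 0, 0]      # the caller sees the change

c = np.zeros((4, 4, 3))
d = assign_on_copy(c, 0, 0)
shares = np.shares_memory(c, d)  # the result is a separate buffer
print(in_place_value, c[0, 0, 0], d[0, 0, 0], shares)
```

Calling the copying variant once per element makes the loop cost grow with the array size on every call, which is consistent with the timings reported here.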

Josef


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Christopher Barker
Nicolas ROUX wrote:
> The big trouble I have is a large team of people within my company is ready
> to replace Matlab by Numpy/Scipy/Matplotlib,

we like that!

> This is a testcase that people would like to see working without any code
> restructuring.
> The reasons are:
> - this way of writing is fairly natural.

Only if you haven't wrapped your brain around array-oriented 
programming! (see below)

> - the original code which showed me the matlab/Numpy performance differences
> is much more complex,
> and can't benefit from broadcasting or other numpy tips (I can later give
> this code)

so you're asking: how can I make this code faster without changing it? 
The only way to do that is to change python or numpy, and while it might 
be nice to do that to improve performance in this type of case, it's a 
tall order!

It's really not a good goal, anyway -- python/numpy is by no means a 
drop-in replacement for MATLAB -- they are very different beasts. 
Personally, I think most of the differences favor Python, but if you try 
to write python the same way you'd write MATLAB, you'll lose most of the 
benefits -- you might as well stick with MATLAB.

However, in this case, MATLAB was traditionally slow with loops and 
indexing and needed to be vectorized for decent performance as well.

It looks like they now have a nice JIT compiler for this sort of thing -- 
to get a similar effect in numpy, you'll need to use weave or Cython or 
something, notably not as easy as having the interpreter just do it for you.

I'd love to see a numpy-aware psyco some day, and maybe the new buffer 
interface will facilitate that, but it's inherently harder with numpy -- 
MATLAB at least used to be limited to 2-d arrays of doubles, so far less 
special casing to be done.

Even with this nifty JIT, I think Python has many advantages -- if your 
code is well written, there will be only a few places with these sorts 
of performance bottlenecks, and weave or Cython, or SWIG, or Ctypes, or 
f2py can all give you a good solution.

One other thought -- could numexpr help here?


About array-oriented programming:

A lot of folks seem to think that the only reason to vectorize code 
in MATLAB, numpy, etc, is for better performance. If MATLAB now has a 
good JIT, then there is no point -- I think that's a mistake. If you 
write your code to work with arrays of data, you get more compact, less 
bug-prone code than if you are working with indexed elements all the 
time. I also think the code is clearer most of the time. I say most, 
because sometimes you do need to do tricks to vectorize that can 
obfuscate the code.

I understand that this may be a simplified example, and the real 
use-case could be quite different. However:

> a = numpy.zeros((dim,dim,3))

so we essentially have three square arrays stacked together -- what do 
they represent? that might help guide you, but without that, I can still 
see:

> for i in range(dim):
>     for j in range(dim):

this really means -- for every element of the 2-d arrays, which can be 
written as: a[:,:]

>         a[i,j,0] = a[i,j,1]
>         a[i,j,2] = a[i,j,0]
>         a[i,j,1] = a[i,j,2]

and this is simply swapping the three around. So, if you start out 
thinking in terms of a set of 2-d arrays, rather than a huge pile of 
elements, the code you will arrive at is more like:

a[:,:,0] = a[:,:,1]
a[:,:,2] = a[:,:,0]
a[:,:,1] = a[:,:,2]

with no loops.

Or you could give them names:

a0 = a[:,:,0]
a1 = a[:,:,1]
a2 = a[:,:,2]

then:

a0[:] = a1
a2[:] = a0
a1[:] = a2

which, of course, is really:

a[:,:,:] = a1.reshape((dim,dim,1))

but I suspect that that's the result of a typo.
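
A small check of that claim, as a sketch: on random data (dim and the seed are arbitrary choices for the example), the element loop, the three slice assignments and the single broadcast assignment should all leave every plane equal to the original plane 1.

```python
import numpy as np

dim = 50
rng = np.random.default_rng(0)
start = rng.random((dim, dim, 3))

# 1. Element-by-element loop, as in the original test case.
a_loop = start.copy()
for i in range(dim):
    for j in range(dim):
        a_loop[i, j, 0] = a_loop[i, j, 1]
        a_loop[i, j, 2] = a_loop[i, j, 0]
        a_loop[i, j, 1] = a_loop[i, j, 2]

# 2. Whole-plane slice assignments.
a_slice = start.copy()
a_slice[:, :, 0] = a_slice[:, :, 1]
a_slice[:, :, 2] = a_slice[:, :, 0]
a_slice[:, :, 1] = a_slice[:, :, 2]

# 3. Broadcasting plane 1 over the last axis (copied first to sidestep aliasing).
a_bcast = start.copy()
a_bcast[:, :, :] = a_bcast[:, :, 1].copy().reshape((dim, dim, 1))

same = np.array_equal(a_loop, a_slice) and np.array_equal(a_slice, a_bcast)
print(same)
```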

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Christopher Barker
josef.p...@gmail.com wrote:
> So for simple loops python looses, but for other things, python wins
> by a huge margin.

which emphasizes the point that you can't write code the same way in the 
two languages, though I'd argue that that code needs refactoring in any 
language!

However, numpy's reference semantics is definitely a strong advantage over 
MATLAB -- more flexibility in general.

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Xavier Gnata
Well it is the best pitch for numpy versus matlab I have read so far :)
(and I 100% agree)

Xavier
> On 1/7/2009 4:16 PM, David Cournapeau wrote:
>
>> I think on recent versions of matlab, there is nothing you can do
>> without modifying the code: matlab has some JIT compilation for loops,
>> which is supposed to speed up those cases - at least, that's what is
>> claimed by matlab.
>
> Yes it does. After using both for more than 10 years, my impression is this:
>
> - Matlab slicing creates new arrays. NumPy slicing creates views. NumPy
>   is faster and more memory efficient.
>
> - Matlab JIT compiles loops. NumPy does not. Matlab is faster for stupid
>   programmers that don't know how to use slices. But neither Matlab nor
>   Python/NumPy is meant to be used like Java.
>
> - Python has psyco. It is about as good as Matlab's JIT. But psyco has
>   no knowledge of NumPy ndarrays.
>
> - Using Cython is easier than writing Matlab MEX files.
>
> - Python has better support for data structures, better built-in
>   structures (tuples, lists, dicts, sets), and general purpose libraries.
>   Matlab has extensive numerical toolboxes that you can buy.
>
> - Matlab passes function arguments by value (albeit COW optimized). Python
>   passes references. This makes NumPy more efficient if you need to pass
>   large arrays or array slices.
>
> - Matlab tends to fragment the heap (hence the pack command).
>   Python/NumPy does not. This makes long-running processes notoriously
>   unstable on Matlab.
>
> - Matlab has some numerical libraries that are better.
>
> - I like the Matlab command prompt and IDE better. But it's not enough to
>   make me want to use it.
>
> - Python is a proper programming language. Matlab is a numerical
>   scripting language - good for small scripts but not complex software
>   systems.
>
>
> Sturla Molden



Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Sturla Molden
On 1/7/2009 6:56 PM, Christopher Barker wrote:

>> So for simple loops python looses, but for other things, python wins
>> by a huge margin.
>
> which emphasizes the point that you can't write code the same way in the
> two languages, though I'd argue that that code needs refactoring in any
> language!

Roux's example would be bad in either language. Slices ('vectorization' in 
Matlab lingo) are preferred in both cases. It's just that neither Matlab 
nor Python/NumPy was designed to be used like Java. For loops should not 
be abused in Python or in Matlab (but Matlab is more forgiving now than 
it used to be).


Sturla Molden


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Sturla Molden
On 1/7/2009 6:51 PM, Christopher Barker wrote:

> Even with this nifty JIT,

It is not a very nifty JIT. It can transform some simple loops into 
vectorized expressions. And it removes the overhead from indexing with 
doubles.

But if you are among those that do

n = length(x)
m = 0
for i = 1.0 : n
   m = m + x(i)
end
m = m / n

instead of

m = mean(x)

it will be nifty enough.


> A lot of folks seem to think that the only reason to vectorize code
> in MATLAB, numpy, etc, is for better performance. If MATLAB now has a
> good JIT, then there is no point -- I think that's a mistake.

Fortran 90/95 has array slicing as well.


Sturla Molden








Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread josef . pktd
On Wed, Jan 7, 2009 at 1:32 PM, Sturla Molden stu...@molden.no wrote:
> Roux example would be bad in either language. Slices ('vectorization' in
> Matlab lingo) is preferred in both cases. It's just that neither Matlab
> nor Python/NumPy was designed to be used like Java. For loops should not
> be abused in Python nor in Matlab (but Matlab is more forgiving now than
> it used to be).

I miss namespaces in matlab: everything is
from path import *

and it's more difficult to keep a larger project organized in matlab
than in python.

But, I think,
matlab is ahead in parallelization (which I haven't used much)
and learning matlab is easier than numpy. (dtypes and broadcasting are
more restrictive in matlab but, for a beginner, easier to figure out)

Josef


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Sturla Molden
On 1/7/2009 7:52 PM, josef.p...@gmail.com wrote:

 But, I think,
 matlab is ahead in parallelization (which I haven't used much)

Not really. There is e.g. nothing like Python's multiprocessing package 
in Matlab. Matlab is generally single-threaded. Python is multi-threaded 
but there is a GIL. And having multiple Matlab processes running 
simultaneously consumes a lot of resources. Python is far better in this 
respect. Don't confuse vectorization with parallelization. It is not the 
same. If you are going to do real parallelization, you are better off 
using Python with multiprocessing or mpi4py.
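
As a rough sketch of what that looks like with the standard-library multiprocessing package: the helper names `partial_sum`, `split` and `parallel_mean` are hypothetical, and the chunking scheme is only one of several reasonable choices.

```python
import multiprocessing as mp


def partial_sum(chunk):
    """Sum one chunk of values; this runs inside a worker process."""
    return sum(chunk)


def split(data, nchunks):
    """Cut data into at most nchunks consecutive pieces of similar size."""
    size = max(1, (len(data) + nchunks - 1) // nchunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_mean(data, nproc=4):
    """Mean of data, with the partial sums computed in nproc processes."""
    with mp.Pool(processes=nproc) as pool:
        totals = pool.map(partial_sum, split(data, nproc))
    return sum(totals) / len(data)
```

Call parallel_mean(data) from under an `if __name__ == "__main__":` guard; platforms that start workers by spawning a fresh interpreter require it. Each worker is a separate process, so the GIL does not serialize the partial sums.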


 and learning matlab is easier than numpy. (dtypes and broadcasting are
 more restrictive in matlab but, for a beginner, easier to figure out)

The available data types are about the same, at least the last time I 
checked. (I am not thinking about Python built-ins here, but NumPy dtypes.)

Matlab does not have broadcasting. Array shapes must always match.
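
For reference, this is what broadcasting means on the NumPy side, as a small sketch with made-up arrays: operands of different shapes are combined by stretching the axes of length 1 (or the missing leading axes) without copying data.

```python
import numpy as np

a = np.arange(6.0).reshape(3, 2)      # shape (3, 2): [[0, 1], [2, 3], [4, 5]]

# (3, 2) - (2,): the column means are applied to every row
col_means = a.mean(axis=0)            # shape (2,)
centered = a - col_means

# (3, 2) * (3, 1): one scale factor per row
row_scale = np.array([[1.0], [10.0], [100.0]])
scaled = a * row_scale
```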


S.M.



Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Robert Kern
On Wed, Jan 7, 2009 at 10:19, Nicolas ROUX nicolas.r...@st.com wrote:
 Hi,

 I need help ;-)
 I have here a testcase which works much faster in Matlab than Numpy.

 The following code takes less than 0.9sec in Matlab, but 21sec in Python.
 Numpy is 24 times slower than Matlab !
 The big trouble I have is that a large team of people within my company is
 ready to replace Matlab by Numpy/Scipy/Matplotlib, but I have to demonstrate
 that this kind of Python code is executed with the same performance as
 Matlab, without writing a C extension.
 This is becoming a critical point for us.

 This is a testcase that people would like to see working without any code 
 restructuring.

Basically, if you want efficient numpy code, you have to use numpy
idioms. If you want to continue to use Matlab idioms, keep using
Matlab.

 The reasons are:
 - this way of writing is fairly natural.
 - the original code which showed me the matlab/Numpy performance differences
   is much more complex, and can't benefit from broadcasting or other numpy
   tips (I can later give this code)

Please do. Otherwise, we can't actually address your concerns.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Accumulate values that are below threshold

2009-01-07 Thread Stéfan van der Walt
Hi Bevan

Since the number of output elements is unknown, I don't think you can
implement this efficiently using arrays.  If your dataset isn't too
large, a for-loop should do the trick.  Otherwise, you may have to run
your code through Cython, which optimises for-loops around Python
lists.

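thresh = 1.0
carry = 0.0
output = []
# data is a 1-D sequence of values, like Q in the quoted message below
for idx, val in enumerate(data):
    carry += val
    # emit once the running sum reaches the threshold (within a tolerance)
    if (carry - thresh) >= -1e-15:
        output.append((idx, carry))
        carry = 0.0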

The comparison line above, (carry - thresh) >= -1e-15, may look
strange -- it basically just does carry >= thresh.  For some reason
I don't quite understand, when accumulating floats, it sometimes
happens that 1.0 != 1.0, so I use 1e-15 as protection.
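
The effect shows up with the numbers from the quoted message below: 0.1 and 0.2 have no exact binary representation, so a running sum that should be exactly 1.0 can land one rounding step short of it. A minimal sketch, where the function name `accumulate` and the `tol` parameter are hypothetical:

```python
def accumulate(data, thresh=1.0, tol=1e-15):
    """Return (index, accumulated value) each time the running sum reaches thresh."""
    output, carry = [], 0.0
    for idx, val in enumerate(data):
        carry += val
        if carry - thresh >= -tol:
            output.append((idx, carry))
            carry = 0.0
    return output


# Values taken from the quoted message
Q = [100.0, 20.0, 16.0, 7.0, 3.0, 1.5, 0.8, 0.6, 0.5, 0.2, 0.2, 0.1, 0.1]

with_tol = accumulate(Q)                # last group closes at index 11
without_tol = accumulate(Q, tol=0.0)    # the sum at index 11 falls just short

print([i for i, _ in with_tol])
print([i for i, _ in without_tol])
```

With the tolerance the indices match the requested output; with `tol=0.0` the last group is only emitted one element later, because 0.5 + 0.2 + 0.2 + 0.1 evaluates to slightly less than 1.0 in double precision.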

Regards
Stéfan

2009/1/8 Bevan Jenkins beva...@gmail.com:
 Hello,

 Sometimes the hardest part of a problem is articulating it.  Hopefully I can
 describe what I am trying to do - at least enough to get some help.

 I am trying to compare values to a threshold, and when the values are lower
 than the threshold they are added to the value in my set until the threshold
 is reached.  Every time the threshold is reached I want the index and value
 (accumulated).

 Hopefully the example below will help

 threshold = 1.0
 for indx, val in enumerate(Q):
     print indx, val

 0 100.0
 1 20.0
 2 16.0
 3 7.0
 4 3.0
 5 1.5
 6 0.8
 7 0.6
 8 0.5
 9 0.2
 10 0.2
 11 0.1
 12 0.1

 The output I would like is (number of elements and value)
 0 100.0
 1 20.0
 2 16.0
 3 7.0
 4 3.0
 5 1.5
 7 1.4
 11 1.0


 The 1st 6 elements are easy as they are all greater than or equal to the
 threshold(1.0).  Once the values drop below the threshold the next value is
 added until the threshold is reached.


 Any help is appreciated,
 Bevan Jenkins