Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Nicolas ROUX
Hi !

Thanks a lot for your fast/detailed reply.
A very good point for Numpy ;-)

I spent all my time trying to prepare my testcase to better share with you,
that's why I didn't reply fast.

I understand the weakness of the missing JITcompiler in Python vs Matlab,
that's why I invistigated numpy vectorization/broadcast.
(hoping to find a cool way to write our code in fast Numpy)

I used the page http://www.scipy.org/PerformancePython to write my code
efficiently in Numpy.
While doing it I found one issue.

To have pretty code, I created p0 and p1 arrays of indexes.
In test8 I wished to see the commented line working, which is not the
case. 
Having to use ix_ is not pretty enough, and seems to not work with further
dimensions.
Why the comment line is not working ?


def test8():
m = 1024
n = 512

Out = numpy.zeros((m,n))
In = numpy.zeros((m,n))
  
p0 = numpy.ogrid[0:m]
p1 = numpy.ogrid[0:n]

Out[0:m,0:n] = In[0:m,0:n]
#Out[p0,p1] = In[p0,p1]   #This doesn't work
Out[numpy.ix_(p0,p1)] = In[numpy.ix_(p0,p1)]


What is maybe not clear in the above code, is that I don't want to predefine
all possible ogrid/vector.
The number of possible ogrid/vector is big if in need to define all.
... And this vector definition become more paintful.

So Numpy vector style is fine if i can write something like:
   Out[p0,p1] = In[p0,p1]   #2 dimensions case
And 
   Out[p0,p1,1] = In[p0,p1,1] #3 dimensions case
But is not fine if i have to add .ix_() or to multiply the number of
vector definitions.

Below example with 3 dimensions instead of 2.

def test9():
m = 1024
n = 512

Out = numpy.zeros((m,n,3))
In = numpy.zeros((m,n,3))
  
p0 = numpy.ogrid[0:m]
p1 = numpy.ogrid[0:n]

Out[0:m,0:n,2] = In[0:m,0:n,2]
#Out[p0,p1,2] = In[p0,p1,2]
Out[numpy.ix_(p0,p1,2)] = In[numpy.ix_(p0,p1,2)]


Tanks again for your support ;-)

Cheers,
Nicolas.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Nicolas ROUX
Sorry my previous mail was probalby not clear.
This mail was following the tread we had before, so with some discussion
legacy.

I simplified the code to focus only on what I need, rather to bother you
with the full code.

I wrote below a code closer to what I need, where you will agree that
vectorization/broadcasting is needed to avoid nested loops.
As I wrote in the 1st mail (added at the end), what is important is to keep
the code not too ugly due to vectorization syntax.
(As written below I try to demonstrate that vectorization/broadcast code
could be as readable as twice nested loop )
The real code we have is even more complex, with processing the array
element using 5x5 neighbours, instead of 3x3.

##
def test6():
w = 3096
h = 2048
a = numpy.zeros((h,w))  #Normally loaded with real data
b = numpy.zeros((h,w,3))

w0 = numpy.ogrid[0:w-2]
w1 = numpy.ogrid[1:w-1]
w2 = numpy.ogrid[2:w]

h0 = numpy.ogrid[0:h-2]
h1 = numpy.ogrid[1:h-1]
h2 = numpy.ogrid[2:h]

p00, p10, p20 = [h0,w0], [h1,w0],[h2,w0]
p01, p11, p21 = [h0,w1], [h1,w1],[h2,w1]
p02, p12, p22 = [h0,w2], [h1,w2],[h2,w2] 

b[p11,1] =  a[p11] + 1.23*a[p22]  \
 - numpy.min([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])  \
   + 0.123*numpy.max([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])
##

This code above is the one I wish to write but is not working.
I hope you better understand my issue context ;-)

Did I missed something  ?

Thanks for your help.

Cheers,
Nicolas.


 I understand the weakness of the missing JITcompiler in Python vs Matlab,
 that's why I invistigated numpy vectorization/broadcast.
 (hoping to find a cool way to write our code in fast Numpy)

 I used the page http://www.scipy.org/PerformancePython to write my code
 efficiently in Numpy.
 While doing it I found one issue.

 To have pretty code, I created p0 and p1 arrays of indexes.

I must admit I don't quite understand what you are trying to do, and what
your problem is.

If you just want to do

Out[:,:] = In[:,:]

there is no need for meshgrids (ogrid), for-loops, or whatever.

It is brain dead to use nested for-loops or ogrid for this purpose in
NumPy. It is equally foolish to use nested for loops or meshgrid for this
purpose in Matlab. If you do, I would seriously question your competence.

By the way, you can index ogrid with more than one dimension:

p = numpy.ogrid[:m, :n]
Out[p] = In[p]


===

===

===

Hi !

Thanks a lot for your fast/detailed reply.
A very good point for Numpy ;-)

I spent all my time trying to prepare my testcase to better share with you,
that's why I didn't reply fast.

I understand the weakness of the missing JITcompiler in Python vs Matlab,
that's why I invistigated numpy vectorization/broadcast.
(hoping to find a cool way to write our code in fast Numpy)

I used the page http://www.scipy.org/PerformancePython to write my code
efficiently in Numpy.
While doing it I found one issue.

To have pretty code, I created p0 and p1 arrays of indexes.
In test8 I wished to see the commented line working, which is not the
case. 
Having to use ix_ is not pretty enough, and seems to not work with further
dimensions.
Why the comment line is not working ?


def test8():
m = 1024
n = 512

Out = numpy.zeros((m,n))
In = numpy.zeros((m,n))
  
p0 = numpy.ogrid[0:m]
p1 = numpy.ogrid[0:n]

Out[0:m,0:n] = In[0:m,0:n]
#Out[p0,p1] = In[p0,p1]   #This doesn't work
Out[numpy.ix_(p0,p1)] = In[numpy.ix_(p0,p1)]


What is maybe not clear in the above code, is that I don't want to predefine
all possible ogrid/vector.
The number of possible ogrid/vector is big if in need to define all.
... And this vector definition become more paintful.

So Numpy vector style is fine if i can write something like:
   Out[p0,p1] = In[p0,p1]   #2 dimensions case
And 

Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Sturla Molden

 I simplified the code to focus only on what I need, rather to bother you
 with the full code.

def test():
w = 3096
h = 2048
a = numpy.zeros((h,w), order='F')  #Normally loaded with real data
b = numpy.zeros((h,w,3), order='F')

w0 = slice(0,w-2)
w1 = slice(1,w-1)
w2 = slice(2,w)
h0 = slice(0,h-2)
h1 = slice(1,h-1)
h2 = slice(2,h)

p00, p10, p20 = [h0,w0], [h1,w0], [h2,w0]
p01, p11, p21 = [h0,w1], [h1,w1], [h2,w1]
p02, p12, p22 = [h0,w2], [h1,w2], [h2,w2]

b[p11 + [1]] =  a[p11] + 1.23*a[p22]  \
 - numpy.min([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])  \
   + 0.123*numpy.max([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])


Does this work for you?



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Nicolas ROUX
Thanks !

-1- The code style is good and the performance vs matlab is good.
With 400x400:
 Matlab = 1.56 sec  (with nested for loop, so no optimization)
 Numpy  = 0.99 sec  (with broadcasting)


-2- Now with the code below I have strange result.
With w=h=400:
   - Using slice=  0.99 sec
   - Using numpy.ogrid  =  0.01 sec

With w=400 and h=300:
   - Using slice,   = 0.719sec
   - Using numpy.ogrid, = broadcast ERROR !
 
The last broadcast error is:
 ValueError: shape mismatch: objects cannot be broadcast to a single shape

###
def test():
w = 400
 
if 1:  #---Case with different w and h
h = 300
else:  #---Case with same w and h
h = 400

a = numpy.zeros((h,w))  #Normally loaded with real data
b = numpy.zeros((h,w,3))

if 1:   #---Case with SLICE
w0 = slice(0,w-2)
w1 = slice(1,w-1)
w2 = slice(2,w)
h0 = slice(0,h-2)
h1 = slice(1,h-1)
h2 = slice(2,h)
else:   #---Case with OGRID
w0 = numpy.ogrid[0:w-2]
w1 = numpy.ogrid[1:w-1]
w2 = numpy.ogrid[2:w]
h0 = numpy.ogrid[0:h-2]
h1 = numpy.ogrid[1:h-1]
h2 = numpy.ogrid[2:h]

p00, p01, p02 = [h0,w0], [h0,w1],[h0,w2]
p10, p11, p12 = [h1,w0], [h1,w1],[h1,w2]
p20, p21, p22 = [h2,w0], [h2,w1],[h2,w2] 

b[p11+[1]] =  a[p11] - numpy.min([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])  \
   + 0.123*numpy.max([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])
###

It seems ogrid got better performance, but broadcasting is not working any
more.
Do you think it's normal that broadcast is not possible using ogrid and
different w  h ?
Did I missed any row/colomn missmatch ???

Thanks.

Cheers,
Nicolas



-Original Message-
From: numpy-discussion-boun...@scipy.org
[mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Sturla Molden
Sent: Friday, January 09, 2009 4:47 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Numpy performance vs Matlab.


 I simplified the code to focus only on what I need, rather to bother you
 with the full code.

def test():
w = 3096
h = 2048
a = numpy.zeros((h,w), order='F')  #Normally loaded with real data
b = numpy.zeros((h,w,3), order='F')

w0 = slice(0,w-2)
w1 = slice(1,w-1)
w2 = slice(2,w)
h0 = slice(0,h-2)
h1 = slice(1,h-1)
h2 = slice(2,h)

p00, p10, p20 = [h0,w0], [h1,w0], [h2,w0]
p01, p11, p21 = [h0,w1], [h1,w1], [h2,w1]
p02, p12, p22 = [h0,w2], [h1,w2], [h2,w2]

b[p11 + [1]] =  a[p11] + 1.23*a[p22]  \
 - numpy.min([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])  \
   + 0.123*numpy.max([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])


Does this work for you?



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Christopher Barker
Nicolas ROUX wrote:
 -2- Now with the code below I have strange result.
 With w=h=400:

 With w=400 and h=300:
- Using numpy.ogrid, = broadcast ERROR !
  
 The last broadcast error is:
  ValueError: shape mismatch: objects cannot be broadcast to a single shape

This is probably a broadcasting error, which means your result with w=h 
is probably wrong.

My advice:

Always test with non-square arrays, to make sure you are broadcasting as 
you expect.

Don't test with zeros or ones for your test data -- so you can look 
at the results and see if you are getting what you expect. I often use 
something like:

a = numpy.arange(w*h).reshape((h,w))

I also test with very small dimensions so that I can easily print the 
results of each step as I develop the code

With this approach you will probably figure out what's going on.

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Sturla Molden

 -2- Now with the code below I have strange result.
 With w=h=400:
- Using slice=  0.99 sec
- Using numpy.ogrid  =  0.01 sec

It is not equivalent. The ogrid version only uses diagonal elements, and
does less work.


 It seems ogrid got better performance, but broadcasting is not working
 any
 more.

Broadcasting is working, but not the way you think. Ogrid is not a faster
alternative to slicing.

You have the same in Matlab. You can index with a slice, an array of
indices, or an array of booleans. If you are going to use the second
alternative, the shape of the index arrays -- in each dimension -- must
equal that of the output. You cannot use a meshgrid with different
shaped arrays of x, y and z indices. NumPy is no different from Matlab
here.




___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Robert Kern
On Fri, Jan 9, 2009 at 11:32, Nicolas ROUX nicolas.r...@st.com wrote:
 Thanks !

 -1- The code style is good and the performance vs matlab is good.
 With 400x400:
  Matlab = 1.56 sec  (with nested for loop, so no optimization)
  Numpy  = 0.99 sec  (with broadcasting)


 -2- Now with the code below I have strange result.
 With w=h=400:
   - Using slice=  0.99 sec
   - Using numpy.ogrid  =  0.01 sec

 With w=400 and h=300:
   - Using slice,   = 0.719sec
   - Using numpy.ogrid, = broadcast ERROR !

 The last broadcast error is:
  ValueError: shape mismatch: objects cannot be broadcast to a single shape

 ###
 def test():
w = 400

if 1:  #---Case with different w and h
h = 300
else:  #---Case with same w and h
h = 400

a = numpy.zeros((h,w))  #Normally loaded with real data
b = numpy.zeros((h,w,3))

if 1:   #---Case with SLICE
w0 = slice(0,w-2)
w1 = slice(1,w-1)
w2 = slice(2,w)
h0 = slice(0,h-2)
h1 = slice(1,h-1)
h2 = slice(2,h)
else:   #---Case with OGRID
w0 = numpy.ogrid[0:w-2]
w1 = numpy.ogrid[1:w-1]
w2 = numpy.ogrid[2:w]
h0 = numpy.ogrid[0:h-2]
h1 = numpy.ogrid[1:h-1]
h2 = numpy.ogrid[2:h]

p00, p01, p02 = [h0,w0], [h0,w1],[h0,w2]
p10, p11, p12 = [h1,w0], [h1,w1],[h1,w2]
p20, p21, p22 = [h2,w0], [h2,w1],[h2,w2]

b[p11+[1]] =  a[p11] - numpy.min([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])  \
   + 0.123*numpy.max([a[p11]-a[p00],
  a[p11]-a[p01],
  a[p11]-a[p02],
  a[p11]-a[p10],
  a[p11]-a[p12],
  a[p11]-a[p20],
  a[p11]-a[p21],
  a[p11]-a[p22]])
 ###

 It seems ogrid got better performance, but broadcasting is not working any
 more.
 Do you think it's normal that broadcast is not possible using ogrid and
 different w  h ?
 Did I missed any row/colomn missmatch ???

There are several things wrong. Please read this document for
information about how indexing works in numpy.

  http://docs.scipy.org/doc/numpy/user/basics.indexing.html

But basically, you want slices. Using ogrid correctly will be slower.
FWIW, ogrid with only one argument is fairly pointless. ogrid is
intended to be used with multiple dimensions. If you just need one
argument, use arange() or linspace(). With multiple arguments, ogrid
will align the arrays such that they can be broadcasted as you expect.
Lets take a look at some examples:

In [1]: from numpy import *

In [2]: ogrid[0:5]
Out[2]: array([0, 1, 2, 3, 4])

In [3]: ogrid[0:6]
Out[3]: array([0, 1, 2, 3, 4, 5])


Two 1D arrays. Now, if you follow the discussion in the indexing
document, you know that if I were to use these as index arrays, one
for each axis, the indexing mechanism will try to iterate over them in
parallel. Since they have incompatible shapes, this will fail.

Instead, if you put both arguments into ogrid:

In [4]: ogrid[0:5, 0:6]
Out[4]:
[array([[0],
   [1],
   [2],
   [3],
   [4]]),
 array([[0, 1, 2, 3, 4, 5]])]


We get the kind of arrays you need. These shapes are compatible,
through broadcasting, and together form the indices to select out the
part of the matrix you are interested in.

However, just using the slices on the matrix instead of passing the
slices through ogrid is faster.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Christopher Barker
Robert Kern wrote:
 Instead, if you put both arguments into ogrid:
 
 In [4]: ogrid[0:5, 0:6]
 Out[4]:
 [array([[0],
[1],
[2],
[3],
[4]]),
  array([[0, 1, 2, 3, 4, 5]])]
 
 We get the kind of arrays you need. These shapes are compatible,
 through broadcasting, and together form the indices to select out the
 part of the matrix you are interested in.
 
 However, just using the slices on the matrix instead of passing the
 slices through ogrid is faster.

So what is ogrid useful for?

Just curious...

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Robert Kern
On Fri, Jan 9, 2009 at 15:40, Christopher Barker chris.bar...@noaa.gov wrote:

 So what is ogrid useful for?

 Just curious...

Floating point grids.

x, y = ogrid[0:1:101j, 0:1:101j]

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Sturla Molden

 However, just using the slices on the matrix instead of passing the
 slices through ogrid is faster.

 So what is ogrid useful for?

For the same problems where you would use meshgrid in Matlab. That is
certain graphics problem for example; e.g. evaluating a surface z = f(x,y)
over a grid of x,y values.






___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Robert Kern
On Fri, Jan 9, 2009 at 16:04, Christopher Barker chris.bar...@noaa.gov wrote:
 Sturla Molden wrote:
 For the same problems where you would use meshgrid in Matlab.

 well, I used to use meshgrid a lot because MATLAB could not do
 broadcasting. Which is probably why the OP has been trying to use it.

 A note for the docs:

 The docs refer to ogrid and nd_grid, and as far as I can tell, they are
 the same thing, but it's confusing. actuall, as I look more, ogrid is a
 nd_grid with the sparse parameter set to True -- anyway, still confusing!

I'm not sure why. The docstring seems fairly clear about this.

 Also, what is the o for -- the mnemonic might be helpful

open. mgrid and ogrid are instantiations of the nd_grid class with
different parameters.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-09 Thread Sturla Molden
 Sturla Molden wrote:
 For the same problems where you would use meshgrid in Matlab.

 well, I used to use meshgrid a lot because MATLAB could not do
 broadcasting. Which is probably why the OP has been trying to use it.

mgrid and ogrid are both meshgrids, with ogrid having a sparse
representation (i.e. it uses less memory).

There was no need for broadcasting in the OP's code. It just seemed a bit
confused.

Regards,
Sturla Molden



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread David Cournapeau
Nicolas ROUX wrote:
 Hi,

 I need help ;-)
 I have here a testcase which works much faster in Matlab than Numpy.
  
 The following code takes less than 0.9sec in Matlab, but 21sec in Python.
 Numpy is 24 times slower than Matlab !
 The big trouble I have is a large team of people within my company is ready 
 to replace Matlab by Numpy/Scipy/Matplotlib,
 but I have to demonstrate that this kind of Python Code is executed with the 
 same performance than Matlab, without writing C extension.
 This is becoming a critical point for us.

 This is a testcase that people would like to see working without any code 
 restructuring.
 The reasons are:
 - this way of writing is fairly natural.
 - the original code which showed me the matlab/Numpy performance differences 
 is much more complex,
 and can't benefit from broadcasting or other numpy tips (I can later give 
 this code)

 ...So I really need to use the code below, without restructuring.

 Numpy/Python code:
 #
 import numpy
 import time

 print Start test \n 

 dim = 3000

 a = numpy.zeros((dim,dim,3))

 start = time.clock()

 for i in range(dim):
 for j in range(dim):
 a[i,j,0] = a[i,j,1]
 a[i,j,2] = a[i,j,0]
 a[i,j,1] = a[i,j,2]

 end = time.clock() - start

 print Test done,   %f sec % end
 #

 Matlab code:
 #
 'Start test'
 dim = 3000;
 tic;
 a =zeros(dim,dim,3);
 for i = 1:dim
 for j = 1:dim 
 a(i,j,1) = a(i,j,2);
 a(i,j,2) = a(i,j,1);
 a(i,j,3) = a(i,j,3);
 end
 end
 toc
 'Test done'
 #

 Any idea on it ?
 Did I missed something ?
   

I think on recent versions of matlab, there is nothing you can do
without modifying the code: matlab has some JIT compilation for loops,
which is supposed to speed up those cases - at least, that's what is
claimed by matlab. The above loops are typical examples where this
should work reasonably well I believe:

http://www.mathworks.com/access/helpdesk_r13/help/techdoc/matlab_prog/ch7_pe10.html

If you really have to use loops, then matlab will be faster. But maybe
you don't; can you show us a more typical example ?

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Ryan May
Nicolas ROUX wrote:
 Hi,
 
 I need help ;-)
 I have here a testcase which works much faster in Matlab than Numpy.
  
 The following code takes less than 0.9sec in Matlab, but 21sec in Python.
 Numpy is 24 times slower than Matlab !
 The big trouble I have is a large team of people within my company is ready 
 to replace Matlab by Numpy/Scipy/Matplotlib,
 but I have to demonstrate that this kind of Python Code is executed with the 
 same performance than Matlab, without writing C extension.
 This is becoming a critical point for us.
 
 This is a testcase that people would like to see working without any code 
 restructuring.
 The reasons are:
 - this way of writing is fairly natural.
 - the original code which showed me the matlab/Numpy performance differences 
 is much more complex,
 and can't benefit from broadcasting or other numpy tips (I can later give 
 this code)
 
 ...So I really need to use the code below, without restructuring.
 
 Numpy/Python code:
 #
 import numpy
 import time
 
 print Start test \n 
 
 dim = 3000
 
 a = numpy.zeros((dim,dim,3))
 
 start = time.clock()
 
 for i in range(dim):
 for j in range(dim):
 a[i,j,0] = a[i,j,1]
 a[i,j,2] = a[i,j,0]
 a[i,j,1] = a[i,j,2]
 
 end = time.clock() - start
 
 print Test done,   %f sec % end
 #
SNIP
 Any idea on it ?
 Did I missed something ?

I think you may have reduced the complexity a bit too much.  The python code
above sets all of the elements equal to a[i,j,1].  Is there any reason you can't
use slicing to avoid the loops?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Matthieu Brucher
 for i in range(dim):
for j in range(dim):
a[i,j,0] = a[i,j,1]
a[i,j,2] = a[i,j,0]
a[i,j,1] = a[i,j,2]

 for i = 1:dim
for j = 1:dim
a(i,j,1) = a(i,j,2);
a(i,j,2) = a(i,j,1);
a(i,j,3) = a(i,j,3);
end
 end

Hi,

The two loops are not the same.
As David stated, with JIT, the loops may be vectorized by Matlab on the fly.

-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Grissiom
On Wed, Jan 7, 2009 at 23:44, Ryan May rma...@gmail.com wrote:

 Nicolas ROUX wrote:
  Hi,
 
  I need help ;-)
  I have here a testcase which works much faster in Matlab than Numpy.
 
  The following code takes less than 0.9sec in Matlab, but 21sec in Python.
  Numpy is 24 times slower than Matlab !
  The big trouble I have is a large team of people within my company is
 ready to replace Matlab by Numpy/Scipy/Matplotlib,
  but I have to demonstrate that this kind of Python Code is executed with
 the same performance than Matlab, without writing C extension.
  This is becoming a critical point for us.
 
  This is a testcase that people would like to see working without any code
 restructuring.
  The reasons are:
  - this way of writing is fairly natural.
  - the original code which showed me the matlab/Numpy performance
 differences is much more complex,
  and can't benefit from broadcasting or other numpy tips (I can later give
 this code)
 
  ...So I really need to use the code below, without restructuring.
 
  Numpy/Python code:
  #
  import numpy
  import time
 
  print Start test \n
 
  dim = 3000
 
  a = numpy.zeros((dim,dim,3))
 
  start = time.clock()
 
  for i in range(dim):
  for j in range(dim):
  a[i,j,0] = a[i,j,1]
  a[i,j,2] = a[i,j,0]
  a[i,j,1] = a[i,j,2]
 
  end = time.clock() - start
 
  print Test done,   %f sec % end
  #
 SNIP
  Any idea on it ?
  Did I missed something ?

 I think you may have reduced the complexity a bit too much.  The python
 code
 above sets all of the elements equal to a[i,j,1].  Is there any reason you
 can't
 use slicing to avoid the loops?


Yes, I think so. I think the testcase  is a matter of python loop vs matlab
loop rather than python vs matlab.

-- 
Cheers,
Grissiom
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread josef . pktd
On Wed, Jan 7, 2009 at 10:58 AM, Grissiom chaos.pro...@gmail.com wrote:
 On Wed, Jan 7, 2009 at 23:44, Ryan May rma...@gmail.com wrote:

 Nicolas ROUX wrote:
  Hi,
 
  I need help ;-)
  I have here a testcase which works much faster in Matlab than Numpy.
 
  The following code takes less than 0.9sec in Matlab, but 21sec in
  Python.
  Numpy is 24 times slower than Matlab !
  The big trouble I have is a large team of people within my company is
  ready to replace Matlab by Numpy/Scipy/Matplotlib,
  but I have to demonstrate that this kind of Python Code is executed with
  the same performance than Matlab, without writing C extension.
  This is becoming a critical point for us.
 
  This is a testcase that people would like to see working without any
  code restructuring.
  The reasons are:
  - this way of writing is fairly natural.
  - the original code which showed me the matlab/Numpy performance
  differences is much more complex,
  and can't benefit from broadcasting or other numpy tips (I can later
  give this code)
 
  ...So I really need to use the code below, without restructuring.
 
  Numpy/Python code:
  #
  import numpy
  import time
 
  print Start test \n
 
  dim = 3000
 
  a = numpy.zeros((dim,dim,3))
 
  start = time.clock()
 
  for i in range(dim):
  for j in range(dim):
  a[i,j,0] = a[i,j,1]
  a[i,j,2] = a[i,j,0]
  a[i,j,1] = a[i,j,2]
 
  end = time.clock() - start
 
  print Test done,   %f sec % end
  #
 SNIP
  Any idea on it ?
  Did I missed something ?

 I think you may have reduced the complexity a bit too much.  The python
 code
 above sets all of the elements equal to a[i,j,1].  Is there any reason you
 can't
 use slicing to avoid the loops?


 Yes, I think so. I think the testcase  is a matter of python loop vs matlab
 loop rather than python vs matlab.

 --
 Cheers,
 Grissiom

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion



I tried with matlab 2006a, I don't know if there is JIT, but the main
speed difference comes with the numpy array access.

The test is actually biased in favor of python, since in the matlab
code the initialization with zeros is inside the time count, but
outside in the python version

If I just put b=1.0 inside the double loop (no numpy)

Python  1.453644 sec
matlab  0.335249 seconds, with zeros outside loop:  0.060582 seconds

with original array assignment:

python/numpy 32.745030 sec
matlab 1.633415 seconds, with zeros outside loop: 1.251597 seconds

(putting the loop in a function and using psyco reduces speed by 30%)


Josef
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread josef . pktd
A test case closer to my applications is calling functions in loops:

Python
---
def assgn(a,i,j):
a[i,j,0] = a[i,j,1] + 1.0
a[i,j,2] = a[i,j,0]
a[i,j,1] = a[i,j,2]
return a

print Start test \n

dim = 300#0

a = numpy.zeros((dim,dim,3))
start = time.clock()

for i in range(dim):
for j in range(dim):
assgn(a,i,j)

end = time.clock() - start
assert numpy.max(a)==1.0  #added to check inplace substitution
print Test done,   %f sec % end
---

matlab:
--
function a = tryloopspeed()

'Start test'
dim = 300;
a = zeros(dim,dim,3);
tic;
for i = 1:dim
   for j = 1:dim
a = assgn(a,i,j);
   end
end
toc
'Test done'
end
function a = assgn(a,i,j)
 a(i,j,1) = a(i,j,2);
 a(i,j,2) = a(i,j,1);
 a(i,j,3) = a(i,j,3);
end
---

Note: I had to reduce the size of the matrix because I got impatient
waiting for matlab
time:

python: Test done,   0.486127 sec
matlab:
 output = tryloopspeed();
ans =
Start test
Elapsed time is 511.815971 seconds.
ans =
Test done
 511.815971/60.0  #minutes
ans =
  8.530

matlab takes 1053 times the time of python
The problem is that at least in my version of matlab, it copies
function arguments when they are modified. It's possible to work
around this, but not very clean.

So for simple loops python looses, but for other things, python wins
by a huge margin. Unless somebody can spot a mistake in my timing.

Josef
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Christopher Barker
 Nicolas ROUX wrote:
 The big trouble I have is a large team of people within my company is ready 
 to replace Matlab by Numpy/Scipy/Matplotlib,

we like that!

 This is a testcase that people would like to see working without any code 
 restructuring.
 The reasons are:
 - this way of writing is fairly natural.

Only if you haven't wrapped your brain around array-oriented 
programming! (see below)

 - the original code which showed me the matlab/Numpy performance differences 
 is much more complex,
 and can't benefit from broadcasting or other numpy tips (I can later give 
 this code)

so you're asking: how can I make this code faster without changing it? 
The only way to do that is to change python or numpy, and while it might 
be nice to do that to improve performance in this type of case, it's a 
tall order!

It's really not a good goal, anyway -- python/numpy is by no means a 
drop-in replacement for MATLAB -- they are very different beasts. 
Personally, I think most of the differences favor Python, but if you try 
to write python the same way you'd write MATLAB, you'll lose most of the 
benefits -- you might as well stick with MATLAB.

However, in this case, MATLAB was traditionally slow with loops and 
indexing and needed to be vectorized for decent performance as well.

It look like they now have a nice JIT compiler for this sort of thing -- 
to get a similar effect in numpy, you'll need to use weave or Cython or 
something, notable not as easy as having the interpreter just do it for you.

I'd love to see a numpy-aware psyco some day, an maybe the new buffer 
interface will facilitate that, but it's inherently harder with numpy -- 
MATLAB at least used to be limited to 2-d arrays of doubles, so far less 
special casing to be done.

Even with this nifty JIT, I think Python has many advantages -- if your 
code is well written, there will be a only a few places with these sorts 
of performance bottlenecks, and weave or Cython, or SWIG, or Ctypes, or 
f2py can all give you a good solution.

One other thought -- could numexp help here?


About array-oriented programming:

All lot of folks seem to think that the only reason to vectorize code 
in MATLAB, numpy, etc, is for better performance. If MATLAB now has a 
good JIT, then there is no point -- I think that's a mistake. If you 
write your code to work with arrays of data, you get more compact, less 
bug-prone code than if you are working with indexed elements all the 
time. I also think the code is clearer most of the time. I say most, 
because sometimes you do need to do tricks to vectorize that can 
obfuscate the code.

I understand that this may be a simplified example, and the real 
use-case could be quite different. However:

 a = numpy.zeros((dim,dim,3))

so we essentially have three square arrays stacked together -- what do 
they represent? that might help guide you, but without that, I can still 
see:

 for i in range(dim):
 for j in range(dim):

this really means -- for every element of the 2-d arrays, which can be 
written as: a[:,:]

 a[i,j,0] = a[i,j,1]
 a[i,j,2] = a[i,j,0]
 a[i,j,1] = a[i,j,2]

and this is simply swapping the three around. So, if you start out 
thinking in terms of a set of 2-d arrays, rather than a huge pile of 
elements, the code you will arrive at is more like:

a[:,:,0] = a[:,:,1]
a[:,:,2] = a[:,:,0]
a[:,:,1] = a[:,:,2]

With no loops:

or you could give them names:

a0 = a[:,:,0]
a1 = a[:,:,1]
a2 = a[:,:,2]

then:

a0[:] = a1
a2[:] = a0
a1[:] = a2

which, of course, is really:

a[:,:,:] = a1.reshape((dim,dim,1))

but I suspect that that's the result of a typo.

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Christopher Barker
josef.p...@gmail.com wrote:
 So for simple loops python looses, but for other things, python wins
 by a huge margin.

which emphasizes the point that you can't write code the same way in the 
two languages, though I'd argue that that code needs refactoring in any 
language!

However, numpy's reference semantics is definitely a strong advantage of 
MATLAB -- more flexibility in general.

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Xavier Gnata
Well it is the best pitch for numpy versus matlab I have read so far :)
(and I 100% agree)

Xavier
 On 1/7/2009 4:16 PM, David Cournapeau wrote:

   
 I think on recent versions of matlab, there is nothing you can do
 without modifying the code: matlab has some JIT compilation for loops,
 which is supposed to speed up those cases - at least, that's what is
 claimed by matlab. 
 

 Yes it does. After using both for more than 10 years, my impression is this:

 - Matlab slicing creates new arrays. NumPy slicing creates views. NumPy 
 is faster and more memory efficient.

 - Matlab JIT compiles loops. NumPy does not. Matlab is faster for stupid 
 programmers that don't know how use slices. But neither Matlab nor 
 Python/NumPy is meant to be used like Java.

 - Python has psyco. It is about as good as Matlab's JIT. But psyco has 
 no knowledge of NumPy ndarrays.

 - Using Cython is easier than writing Matlab MEX files.

 - Python has better support for data structures, better built-in 
 structures (tuple, lists, dics, sets), and general purpose libraries. 
 Matlab has extensive numerical toolboxes that you can buy.

 - Matlab pass function arguments by value (albeit COW optimized). Python 
 pass references. This makes NumPy more efficient if you need to pass 
 large arrays or array slices.

 - Matlab tends to fragment the heap (hence the pack command). 
 Python/NumPy does not. This makes long-running processes notoriously 
 unstable on Matlab.

 - Matlab has some numerical libraries that are better.

 - I like the Matlab command prompt and IDE better. But its not enough to 
 make me want to use it.

 - Python is a proper programming language. Matlab is a numerical 
 scripting language - good for small scripts but not complex software 
 systems.


 Sturla Molden







 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion
   

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Sturla Molden
On 1/7/2009 6:56 PM, Christopher Barker wrote:

 So for simple loops python looses, but for other things, python wins
 by a huge margin.
 
 which emphasizes the point that you can't write code the same way in the 
 two languages, though I'd argue that that code needs refactoring in any 
 language!

Roux example would be bad in either language. Slices ('vectorization' in 
Matlab lingo) is preferred in both cases. It's just that neither Matlab 
nor Python/NumPy was designed to be used like Java. For loops should not 
be abused in Python nor in Matlab (but Matlab is more forgiving now than 
it used to be).


Sturla Molden
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Sturla Molden
On 1/7/2009 6:51 PM, Christopher Barker wrote:

 Even with this nifty JIT, 

It is not a very nifty JIT. It can transform some simple loops into 
vectorized expressions. And it removes the overhead from indexing with 
doubles.

But if you are among those that do

n = length(x)
m = 0
for i = 1.0 : n
   m = m + x(i)
end
m = m / n

instead of

m = mean(x)

it will be nifty enough.


 All lot of folks seem to think that the only reason to vectorize code 
 in MATLAB, numpy, etc, is for better performance. If MATLAB now has a 
 good JIT, then there is no point -- I think that's a mistake. 

Fortran 90/95 has array slicing as well.


Sturla Molden






___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread josef . pktd
On Wed, Jan 7, 2009 at 1:32 PM, Sturla Molden stu...@molden.no wrote:
 On 1/7/2009 6:56 PM, Christopher Barker wrote:

 So for simple loops python looses, but for other things, python wins
 by a huge margin.

 which emphasizes the point that you can't write code the same way in the
 two languages, though I'd argue that that code needs refactoring in any
 language!

 Roux example would be bad in either language. Slices ('vectorization' in
 Matlab lingo) is preferred in both cases. It's just that neither Matlab
 nor Python/NumPy was designed to be used like Java. For loops should not
 be abused in Python nor in Matlab (but Matlab is more forgiving now than
 it used to be).


 Sturla Molden
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


I'm missing name spaces in matlab. everything is
from path import *

and it's more difficult to keep are larger project organized in matlab
than in python.

But, I think,
matlab is ahead in parallelization (which I haven't used much)
and learning matlab is easier than numpy. (dtypes and broadcasting are
more restrictive in matlab but, for a beginner, easier to figure out)

Josef
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Sturla Molden
On 1/7/2009 7:52 PM, josef.p...@gmail.com wrote:

 But, I think,
 matlab is ahead in parallelization (which I haven't used much)

Not really. There is e.g. nothing like Python's multiprocessing package 
in Matlab. Matlab is genrally single-threaded. Python is multi-threaded 
but there is a GIL. And having multiple Matlab processes running 
simultaneously consumes a lot of resources. Python is far better in this 
respect. Don't confuse vectorization with parallelization. It is not the 
same. If you are going to do real parallelization, you are better off 
using Python with multiprocessing or mpi4py.


 and learning matlab is easier than numpy. (dtypes and broadcasting are
 more restrictive in matlab but, for a beginner, easier to figure out)

The available data types is about the same, at least last time I 
checked. (I am not thinking about Python built-ins here, but NumPy dtypes.)

Matlab does not have broadcasting. Array shapes must always match.


S.M.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Robert Kern
On Wed, Jan 7, 2009 at 10:19, Nicolas ROUX nicolas.r...@st.com wrote:
 Hi,

 I need help ;-)
 I have here a testcase which works much faster in Matlab than Numpy.

 The following code takes less than 0.9sec in Matlab, but 21sec in Python.
 Numpy is 24 times slower than Matlab !
 The big trouble I have is a large team of people within my company is ready 
 to replace Matlab by Numpy/Scipy/Matplotlib,
 but I have to demonstrate that this kind of Python Code is executed with the 
 same performance than Matlab, without writing C extension.
 This is becoming a critical point for us.

 This is a testcase that people would like to see working without any code 
 restructuring.

Basically, if you want efficient numpy code, you have to use numpy
idioms. If you want to continue to use Matlab idioms, keep using
Matlab.

 The reasons are:
 - this way of writing is fairly natural.
 - the original code which showed me the matlab/Numpy performance differences 
 is much more complex,
 and can't benefit from broadcasting or other numpy tips (I can later give 
 this code)

Please do. Otherwise, we can't actually address your concerns.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion