Re: [Numpy-discussion] [Newbie] Fast plotting
Hi all,

First, let me say that I'm impressed: this mailing list is probably the most reactive I've ever seen. I asked my first question and immediately got more solutions than I had time to test... Many thanks to all who answered.

Using the various proposals, I ran two performance tests:
- test 1: 200 random values
- test 2: 1328724 values from my real use case

Here are the various functions and how they perform:

import numpy
import matplotlib.mlab
from collections import defaultdict
from itertools import izip   # Python 2 only

def f0 (x, y) :
    """Initial version
    test 1 CPU times: 13.37s
    test 2 CPU times: 5.92s
    """
    s, n = {}, {}
    for a, b in zip(x, y) :
        s[a] = s.get(a, 0.0) + b
        n[a] = n.get(a, 0) + 1
    return (numpy.array([a for a in sorted(s)]),
            numpy.array([s[a]/n[a] for a in sorted(s)]))

def f1 (x, y) :
    """Alan G Isaac ais...@american.edu
    Modified in order to sort the result only once.
    test 1 CPU times: 10.86s
    test 2 CPU times: 2.78s
    defaultdict indeed speeds things up; avoiding one of the two sorts probably helps too.
    """
    s, n = defaultdict(float), defaultdict(int)
    for a, b in izip(x, y) :
        s[a] += b
        n[a] += 1
    new_x = numpy.array([a for a in sorted(s)])
    return (new_x, numpy.array([s[a]/n[a] for a in new_x]))

def f2 (x, y) :
    """Francesc Alted fal...@pytables.org
    Modified with preallocation of arrays (it appeared faster).
    test 1: killed after more than 10 minutes
    test 2 CPU times: 22.01s
    This result is not surprising, as I guess it has quadratic complexity: one pass
    for each unique value in x, and presumably one nested pass to compute y[x==i].
    """
    u = numpy.unique(x)
    m = numpy.empty(len(u))   # float array: an integer array would truncate the means
    for pos, i in enumerate(u) :
        g = y[x == i]
        m[pos] = g.mean()
    return u, m

def f3 (x, y) :
    """Sebastian Stephan Berg sebast...@sipsolutions.net
    Modified because I can always work in place.
    test 1 CPU times: 17.43s
    test 2 CPU times: 0.21s
    Adopted! This is definitely the fastest one when using real values.
    I tried to preallocate arrays by setting u=numpy.unique(x) and then looping
    on u, but the result is slower, probably because of unique().
    Compared with f1, it's slower on larger arrays of random values. This may be
    explained by a complexity argument: f1 has a linear complexity (two passes in
    sequence) while f3 is probably N log N (one sort, two passes to set x[:] and
    y[:], and one loop over the distinct values with a nested searchsorted that is
    probably logarithmic). But real values are far from random: the sort is
    probably more efficient, and the while loop is shorter because there are
    fewer distinct values.
    """
    s = x.argsort()
    x[:] = x[s]
    y[:] = y[s]
    u, means, start, value = [], [], 0, x[0]
    while True:
        next = x.searchsorted(value, side='right')
        u.append(value)
        means.append(y[start:next].mean())
        if next == len(x):
            break
        value = x[next]
        start = next
    return numpy.array(u), numpy.array(means)

def f4 (x, y) :
    """Jean-Baptiste Rudant boogalo...@yahoo.fr
    test 1 CPU times: 111.21s
    test 2 CPU times: 13.48s
    As Jean-Baptiste noticed, this solution is not very efficient (but works
    almost off-the-shelf).
    """
    recXY = numpy.rec.fromarrays((x, x), names='x, y')
    return matplotlib.mlab.rec_groupby(recXY, ('x',), (('y', numpy.mean, 'y_avg'),))

A few more remarks.

Sebastian Stephan Berg wrote:
> Just thinking. If the parameters are limited, you may be able to use the histogram feature? Doing one histogram with Y as weights, then one without weights and calculating the mean from this yourself should be pretty speedy I imagine.

I'm afraid I don't know what the histogram function computes. But this may be worth investigating, because I think I'll need it later on to smooth my graphs (by plotting mean values over intervals).

Bruce Southey wrote:
> If you use Knuth's one pass approach (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#III._On-line_algorithm) you can write a function to get the min, max, mean and variance/standard deviation in a single pass through the array rather than one pass for each. I do not know if this will provide any advantage as that will probably depend on the size of the arrays.

If I understood correctly, this algorithm computes the variance of a whole array. I can see how to adapt it to compute the mean (already done by the algorithm), max, min, etc., but I did not see how it can be adapted to my case.

> Also, please use the highest precision possible (ie float128) for your arrays to minimize numerical error due to the size of your arrays.

Thanks for the advice!

So, thank you again everybody.

Cheers,
Franck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
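Sebastian's histogram suggestion can be sketched without any Python-level loop. The version below is an illustration, not one of the benchmarked functions: it assumes x holds exact group labels (values compared for equality, as in f0 to f3), uses numpy.unique with return_inverse to turn each x into an integer bin number, then calls numpy.bincount (a histogram over integer bins) once with y as weights and once without. The name group_means is hypothetical, and it has not been timed on Franck's data.

```python
import numpy as np

def group_means(x, y):
    """Return (keys, means): the mean of y for each distinct value of x."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    # keys holds the sorted distinct values of x; inverse[k] is the index
    # of x[k] within keys, so it can serve directly as a bin number.
    keys, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.reshape(-1)            # keep the bin numbers 1-D
    sums = np.bincount(inverse, weights=y)   # per-key sum of y ("weighted histogram")
    counts = np.bincount(inverse)            # per-key number of samples
    return keys, sums / counts

keys, means = group_means([1, 2, 1, 3, 2], [10, 20, 30, 40, 60])
```

Because numpy.unique sorts, this is still O(N log N) like f3, but the per-group work happens inside NumPy rather than in a while loop over the distinct values.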
[Numpy-discussion] Numpy performance vs Matlab.
Hi,

I need help ;-) I have here a testcase which runs much faster in Matlab than in Numpy. The following code takes less than 0.9 sec in Matlab, but 21 sec in Python. Numpy is 24 times slower than Matlab!

The big trouble I have is that a large team of people within my company is ready to replace Matlab by Numpy/Scipy/Matplotlib, but I have to demonstrate that this kind of Python code executes with the same performance as Matlab, without writing C extensions. This is becoming a critical point for us.

This is a testcase that people would like to see working without any code restructuring. The reasons are:
- this way of writing is fairly natural.
- the original code which showed me the Matlab/Numpy performance differences is much more complex, and can't benefit from broadcasting or other numpy tips (I can give this code later)

...So I really need to use the code below, without restructuring.

Numpy/Python code:
#
import numpy
import time

print "Start test \n"
dim = 3000
a = numpy.zeros((dim,dim,3))
start = time.clock()
for i in range(dim):
    for j in range(dim):
        a[i,j,0] = a[i,j,1]
        a[i,j,2] = a[i,j,0]
        a[i,j,1] = a[i,j,2]
end = time.clock() - start
print "Test done, %f sec" % end
#

Matlab code:
#
'Start test'
dim = 3000;
tic;
a = zeros(dim,dim,3);
for i = 1:dim
    for j = 1:dim
        a(i,j,1) = a(i,j,2);
        a(i,j,2) = a(i,j,1);
        a(i,j,3) = a(i,j,3);
    end
end
toc
'Test done'
#

Any idea on it? Did I miss something?

Thanks a lot in advance for your help.

Cheers,
Nicolas.
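The benchmark above is Python 2 code, and time.clock() was removed in Python 3.8. A minimal Python 3 port, kept deliberately unrestructured so that it still measures per-element indexing from Python, might look like this; run_test is a hypothetical helper name, and time.perf_counter() stands in for time.clock().

```python
import time
import numpy as np

def run_test(dim=3000):
    """Python 3 port of the loop benchmark; returns (array, elapsed seconds)."""
    a = np.zeros((dim, dim, 3))
    start = time.perf_counter()  # wall-clock timer suited to benchmarking
    for i in range(dim):
        for j in range(dim):
            a[i, j, 0] = a[i, j, 1]
            a[i, j, 2] = a[i, j, 0]
            a[i, j, 1] = a[i, j, 2]
    elapsed = time.perf_counter() - start
    return a, elapsed
```

With the default dim=3000 the inner body runs 9 million times, so a call like run_test(300) is a quicker way to see the per-element overhead before committing to the full run.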
Re: [Numpy-discussion] Numpy performance vs Matlab.
Nicolas ROUX wrote:
> Hi, I need help ;-) I have here a testcase which works much faster in Matlab than Numpy.
> SNIP
> Any idea on it ? Did I missed something ?

I think on recent versions of matlab, there is nothing you can do without modifying the code: matlab has some JIT compilation for loops, which is supposed to speed up those cases - at least, that's what is claimed by matlab. The above loops are typical examples where this should work reasonably well, I believe:

http://www.mathworks.com/access/helpdesk_r13/help/techdoc/matlab_prog/ch7_pe10.html

If you really have to use loops, then matlab will be faster. But maybe you don't; can you show us a more typical example?

cheers,

David
Re: [Numpy-discussion] Numpy performance vs Matlab.
Nicolas ROUX wrote:
> Hi, I need help ;-) I have here a testcase which works much faster in Matlab than Numpy.
> SNIP
> for i in range(dim):
>     for j in range(dim):
>         a[i,j,0] = a[i,j,1]
>         a[i,j,2] = a[i,j,0]
>         a[i,j,1] = a[i,j,2]
> SNIP
> Any idea on it ? Did I missed something ?

I think you may have reduced the complexity a bit too much. The python code above sets all of the elements equal to a[i,j,1]. Is there any reason you can't use slicing to avoid the loops?

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
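Ryan's point can be checked directly. This sketch (function names are illustrative) runs the original double loop and a slice-based version on a small random array and confirms that they produce the same result, with every channel ending up equal to the original a[:, :, 1].

```python
import numpy as np

def loop_version(a):
    """The original double loop, run on a copy so the input is untouched."""
    a = a.copy()
    n_rows, n_cols, _ = a.shape
    for i in range(n_rows):
        for j in range(n_cols):
            a[i, j, 0] = a[i, j, 1]
            a[i, j, 2] = a[i, j, 0]
            a[i, j, 1] = a[i, j, 2]
    return a

def sliced_version(a):
    """The same three assignments applied to whole 2-D slices at once."""
    a = a.copy()
    a[:, :, 0] = a[:, :, 1]
    a[:, :, 2] = a[:, :, 0]
    a[:, :, 1] = a[:, :, 2]
    return a

rng = np.random.default_rng(0)
sample = rng.random((4, 5, 3))
```

The sliced version does the same assignments in the same order, just one whole channel at a time, so the two results match exactly.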
Re: [Numpy-discussion] Numpy performance vs Matlab.
> for i in range(dim):
>     for j in range(dim):
>         a[i,j,0] = a[i,j,1]
>         a[i,j,2] = a[i,j,0]
>         a[i,j,1] = a[i,j,2]
>
> for i = 1:dim
>     for j = 1:dim
>         a(i,j,1) = a(i,j,2);
>         a(i,j,2) = a(i,j,1);
>         a(i,j,3) = a(i,j,3);
>     end
> end

Hi,

The two loops are not the same. As David stated, with JIT, the loops may be vectorized by Matlab on the fly.

--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
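To make Matthieu's observation concrete, here is a sketch that translates both loop bodies to 0-based NumPy slicing (MATLAB channels 1, 2, 3 become 0, 1, 2). The function names are hypothetical. On the all-zeros array used in the benchmark both versions leave the array unchanged, so the difference only shows up on nonzero data.

```python
import numpy as np

def python_body(a):
    """Vectorized equivalent of the Python loop body."""
    b = a.copy()
    b[:, :, 0] = b[:, :, 1]
    b[:, :, 2] = b[:, :, 0]
    b[:, :, 1] = b[:, :, 2]
    return b  # every channel now equals the original channel 1

def matlab_body(a):
    """0-based translation of the MATLAB loop body."""
    b = a.copy()
    b[:, :, 0] = b[:, :, 1]  # a(i,j,1) = a(i,j,2)
    b[:, :, 1] = b[:, :, 0]  # a(i,j,2) = a(i,j,1): writes back the value just copied
    b[:, :, 2] = b[:, :, 2]  # a(i,j,3) = a(i,j,3): a no-op
    return b  # channel 0 copies channel 1; channels 1 and 2 are unchanged

rng = np.random.default_rng(1)
sample = rng.random((3, 4, 3))
```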
Re: [Numpy-discussion] Numpy performance vs Matlab.
On Wed, Jan 7, 2009 at 23:44, Ryan May rma...@gmail.com wrote:
> Nicolas ROUX wrote:
> > SNIP
> I think you may have reduced the complexity a bit too much. The python code above sets all of the elements equal to a[i,j,1]. Is there any reason you can't use slicing to avoid the loops?

Yes, I think so. I think the testcase is a matter of python loop vs matlab loop rather than python vs matlab.

--
Cheers,
Grissiom
Re: [Numpy-discussion] [Newbie] Fast plotting
On Wed, Jan 7, 2009 at 6:37 AM, Franck Pommereau pommer...@univ-paris12.fr wrote:
> def f4 (x, y) :
>     """Jean-Baptiste Rudant boogalo...@yahoo.fr
>     test 1 CPU times: 111.21s
>     test 2 CPU times: 13.48s
>     As Jean-Baptiste noticed, this solution is not very efficient (but works almost off-the-shelf).
>     """
>     recXY = numpy.rec.fromarrays((x, x), names='x, y')
>     return matplotlib.mlab.rec_groupby(recXY, ('x',), (('y', numpy.mean, 'y_avg'),))

This probably will have no impact on your tests, but this looks like a bug. You probably mean:

    recXY = numpy.rec.fromarrays((x, y), names='x, y')

Could you post the code you use to generate your inputs (ie what is x?)

I will look into trying some of the suggestions here to improve the performance of rec_groupby. One thing that slows it down is that it supports an arbitrary number of keys -- eg groupby ('year', 'month') -- whereas the examples above are using a single value lookup.

JDH
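For the multi-key case John mentions (e.g. grouping by ('year', 'month')), one possible vectorized approach is to stack the key columns into a 2-D array and let numpy.unique with axis=0 assign each distinct row a group number, then reuse numpy.bincount for sums and counts. This is a sketch under those assumptions, not how matplotlib.mlab.rec_groupby is implemented; groupby_mean is a hypothetical name, and the key columns are assumed to be numeric arrays of equal length.

```python
import numpy as np

def groupby_mean(keys, values):
    """Mean of values for each distinct combination of key columns."""
    stacked = np.column_stack(keys)  # one row per sample, one column per key
    # uniq holds the distinct rows in lexicographic order; inverse maps
    # each sample to the row index of its group.
    uniq, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # precaution: force a 1-D array of group numbers
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return uniq, sums / counts

year = np.array([2008, 2008, 2009, 2008])
month = np.array([1, 2, 1, 1])
amount = np.array([1.0, 2.0, 3.0, 5.0])
groups, means = groupby_mean((year, month), amount)
```

Here the (2008, 1) group averages 1.0 and 5.0, so its mean is 3.0.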
Re: [Numpy-discussion] [Newbie] Fast plotting
> This probably will have no impact on your tests, but this looks like a bug. You probably mean:
>     recXY = numpy.rec.fromarrays((x, y), names='x, y')

Sure! Thanks.

> Could you post the code you use to generate your inputs (ie what is x?)

My code is probably not usable by anybody other than me. I'm presently too busy to clean it up and add comments, but as soon as I am able to, I'll send you a usable version.

Cheers,
Franck
Re: [Numpy-discussion] Numpy performance vs Matlab.
On Wed, Jan 7, 2009 at 10:58 AM, Grissiom chaos.pro...@gmail.com wrote:
> On Wed, Jan 7, 2009 at 23:44, Ryan May rma...@gmail.com wrote:
> > SNIP
> Yes, I think so. I think the testcase is a matter of python loop vs matlab loop rather than python vs matlab.

I tried with matlab 2006a. I don't know if it has the JIT, but the main speed difference comes from the numpy array access. The test is actually biased in favor of python, since in the matlab code the initialization with zeros is inside the timed section, but outside it in the python version.

If I just put b=1.0 inside the double loop (no numpy):
- Python: 1.453644 sec
- matlab: 0.335249 seconds; with zeros outside the loop: 0.060582 seconds

With the original array assignment:
- python/numpy: 32.745030 sec
- matlab: 1.633415 seconds; with zeros outside the loop: 1.251597 seconds

(putting the loop in a function and using psyco reduces speed by 30%)

Josef
Re: [Numpy-discussion] help with typemapping a C function to use numpy arrays
Here is my example, trying to wrap the function sms_spectrumMag that we have been dealing with:

%apply (int DIM1, float* IN_ARRAY1) {(int sizeInArray, float* pInArray)};
%apply (int DIM1, float* INPLACE_ARRAY1) {(int sizeOutArray, float* pOutArray)};

%inline %{
void my_spectrumMag( int sizeInArray, float *pInArray, int sizeOutArray, float *pOutArray)
{
    sms_spectrumMag(sizeOutArray, pInArray, pOutArray);
}
%}

At this point, I have the new function my_spectrumMag that wraps sms_spectrumMag() and provides arguments that can be typemapped using numpy.i. Now, I don't want to have to call my_spectrumMag() in python; I want to use the original name, i.e. call the function as:

sms_spectrumMag(numpyArray1, numpyArray2)

But trying to %rename my_spectrumMag to sms_spectrumMag does not work: the original sms_spectrumMag gets called in python instead. Trying to %ignore the original function first, as follows, removes sms_spectrumMag completely from the module and I am left with my_spectrumMag:

%ignore sms_spectrumMag;
%rename (sms_spectrumMag) my_spectrumMag;

Do you see my problem?

On Wed, Jan 7, 2009 at 8:58 AM, Matthieu Brucher matthieu.bruc...@gmail.com wrote:
> 2009/1/6 Rich E reakina...@gmail.com:
> > This helped immensely. I feel like I am getting close to being able to accomplish what I would like with SWIG: producing a python module that can be very 'python-like', while co-existing with the c library that is very 'c-like'. There is one question still remaining though: is it possible to make the wrapped function have the same name still? Using either my_spectrumMag or spectrumMag means I have to create a number of inconsistencies between the python module and the c library. It is ideal to ignore (%ignore?) the c sms_spectrumMag and instead use the wrapped one, with the same name. But my attempts at doing this so far have not compiled because of name conflicts.
>
> Of course you can. The function is renamed only if you say so. Perhaps you can provide a small example of what doesn't work at the moment?
>
> > Thanks for the help, I think you are doing great things with this numpy interface/typemaps system.
>
> Matthieu
Re: [Numpy-discussion] Numpy performance vs Matlab.
A test case closer to my applications is calling functions in loops:

Python
---
import numpy
import time

def assgn(a,i,j):
    a[i,j,0] = a[i,j,1] + 1.0
    a[i,j,2] = a[i,j,0]
    a[i,j,1] = a[i,j,2]
    return a

print "Start test \n"
dim = 300#0
a = numpy.zeros((dim,dim,3))
start = time.clock()
for i in range(dim):
    for j in range(dim):
        assgn(a,i,j)
end = time.clock() - start
assert numpy.max(a)==1.0  #added to check inplace substitution
print "Test done, %f sec" % end
---

matlab:
--
function a = tryloopspeed()
'Start test'
dim = 300;
a = zeros(dim,dim,3);
tic;
for i = 1:dim
    for j = 1:dim
        a = assgn(a,i,j);
    end
end
toc
'Test done'
end

function a = assgn(a,i,j)
a(i,j,1) = a(i,j,2);
a(i,j,2) = a(i,j,1);
a(i,j,3) = a(i,j,3);
end
---

Note: I had to reduce the size of the matrix because I got impatient waiting for matlab.

time:
python: Test done, 0.486127 sec

matlab:
output = tryloopspeed();
ans = Start test
Elapsed time is 511.815971 seconds.
ans = Test done
511.815971/60.0 #minutes
ans = 8.530

matlab takes 1053 times as long as python.

The problem is that, at least in my version of matlab, it copies function arguments when they are modified. It's possible to work around this, but not very cleanly.

So for simple loops python loses, but for other things, python wins by a huge margin. Unless somebody can spot a mistake in my timing.

Josef
Re: [Numpy-discussion] Numpy performance vs Matlab.
Nicolas ROUX wrote:
> The big trouble I have is a large team of people within my company is ready to replace Matlab by Numpy/Scipy/Matplotlib,

we like that!

> This is a testcase that people would like to see working without any code restructuring. The reasons are:
> - this way of writing is fairly natural.

Only if you haven't wrapped your brain around array-oriented programming! (see below)

> - the original code which showed me the matlab/Numpy performance differences is much more complex, and can't benefit from broadcasting or other numpy tips (I can later give this code)

so you're asking: how can I make this code faster without changing it? The only way to do that is to change python or numpy, and while it might be nice to do that to improve performance in this type of case, it's a tall order!

It's really not a good goal, anyway -- python/numpy is by no means a drop-in replacement for MATLAB -- they are very different beasts. Personally, I think most of the differences favor Python, but if you try to write python the same way you'd write MATLAB, you'll lose most of the benefits -- you might as well stick with MATLAB.

However, in this case, MATLAB was traditionally slow with loops and indexing and needed to be vectorized for decent performance as well. It looks like they now have a nice JIT compiler for this sort of thing -- to get a similar effect in numpy, you'll need to use weave or Cython or something, notably not as easy as having the interpreter just do it for you.

I'd love to see a numpy-aware psyco some day, and maybe the new buffer interface will facilitate that, but it's inherently harder with numpy -- MATLAB at least used to be limited to 2-d arrays of doubles, so there was far less special casing to be done.

Even with this nifty JIT, I think Python has many advantages -- if your code is well written, there will be only a few places with these sorts of performance bottlenecks, and weave, Cython, SWIG, ctypes, or f2py can all give you a good solution.

One other thought -- could numexpr help here?

About array-oriented programming:

A lot of folks seem to think that the only reason to vectorize code in MATLAB, numpy, etc. is for better performance. If MATLAB now has a good JIT, then there is no point -- I think that's a mistake. If you write your code to work with arrays of data, you get more compact, less bug-prone code than if you are working with indexed elements all the time. I also think the code is clearer most of the time. I say most, because sometimes you do need to do tricks to vectorize that can obfuscate the code.

I understand that this may be a simplified example, and the real use-case could be quite different. However:

> a = numpy.zeros((dim,dim,3))

so we essentially have three square arrays stacked together -- what do they represent? That might help guide you, but even without that, I can still see:

> for i in range(dim):
>     for j in range(dim):

this really means "for every element of the 2-d arrays", which can be written as: a[:,:]

>         a[i,j,0] = a[i,j,1]
>         a[i,j,2] = a[i,j,0]
>         a[i,j,1] = a[i,j,2]

and this is simply swapping the three around. So, if you start out thinking in terms of a set of 2-d arrays, rather than a huge pile of elements, the code you will arrive at is more like:

a[:,:,0] = a[:,:,1]
a[:,:,2] = a[:,:,0]
a[:,:,1] = a[:,:,2]

With no loops. Or you could give them names:

a0 = a[:,:,0]
a1 = a[:,:,1]
a2 = a[:,:,2]

then:

a0[:] = a1
a2[:] = a0
a1[:] = a2

which, of course, is really:

a[:,:,:] = a1.reshape((dim,dim,1))

but I suspect that that's the result of a typo.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
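Chris's named-view version and his final one-liner can be checked against each other. The sketch below (function names are illustrative) also shows an equivalent form using a length-1 slice, a[:, :, 1:2], which keeps the last axis so it broadcasts across all three channels without a reshape.

```python
import numpy as np

def named_views(a):
    """a0, a1, a2 are views, so writing through them updates a itself."""
    a0 = a[:, :, 0]
    a1 = a[:, :, 1]
    a2 = a[:, :, 2]
    a0[:] = a1
    a2[:] = a0
    a1[:] = a2
    return a

def reshape_one_liner(a):
    """Copy channel 1 into all three channels via an explicit reshape."""
    n_rows, n_cols, _ = a.shape
    a[:, :, :] = a[:, :, 1].reshape((n_rows, n_cols, 1))
    return a

def slice_one_liner(a):
    """Alternative: the 1:2 slice has shape (n_rows, n_cols, 1) and broadcasts."""
    a[:] = a[:, :, 1:2]
    return a

rng = np.random.default_rng(2)
sample = rng.random((4, 4, 3))
```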
Re: [Numpy-discussion] Numpy performance vs Matlab.
josef.p...@gmail.com wrote:
> So for simple loops python loses, but for other things, python wins by a huge margin.

which emphasizes the point that you can't write code the same way in the two languages, though I'd argue that that code needs refactoring in any language!

However, numpy's reference semantics is definitely a strong advantage over MATLAB -- more flexibility in general.

-Chris
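A small illustration of the reference semantics being discussed, assuming nothing beyond NumPy: an array passed to a function is the caller's own object, so in-place writes are visible without returning anything, whereas rebinding the parameter name to a new array inside the function leaves the caller's array alone. The function names are illustrative.

```python
import numpy as np

def assgn(a, i, j):
    """Writes into the caller's array in place; nothing needs to be returned."""
    a[i, j, 0] = a[i, j, 1] + 1.0
    a[i, j, 2] = a[i, j, 0]
    a[i, j, 1] = a[i, j, 2]

def rebind(a):
    """Binds the local name to a new array; the caller's array is untouched."""
    a = a + 1.0
    return a

a = np.zeros((3, 3, 3))
assgn(a, 1, 2)   # a[1, 2] is now [1.0, 1.0, 1.0]
b = rebind(a)    # b is a new array; a keeps its values
```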
Re: [Numpy-discussion] Numpy performance vs Matlab.
Well, it is the best pitch for numpy versus matlab I have read so far :) (and I 100% agree)

Xavier

On 1/7/2009 4:16 PM, Sturla Molden wrote:
> On 1/7/2009 4:16 PM, David Cournapeau wrote:
> > I think on recent versions of matlab, there is nothing you can do without modifying the code: matlab has some JIT compilation for loops, which is supposed to speed up those cases - at least, that's what is claimed by matlab.
>
> Yes it does. After using both for more than 10 years, my impression is this:
>
> - Matlab slicing creates new arrays. NumPy slicing creates views. NumPy is faster and more memory efficient.
> - Matlab JIT compiles loops. NumPy does not. Matlab is faster for stupid programmers that don't know how to use slices. But neither Matlab nor Python/NumPy is meant to be used like Java.
> - Python has psyco. It is about as good as Matlab's JIT. But psyco has no knowledge of NumPy ndarrays.
> - Using Cython is easier than writing Matlab MEX files.
> - Python has better support for data structures, better built-in structures (tuples, lists, dicts, sets), and general-purpose libraries. Matlab has extensive numerical toolboxes that you can buy.
> - Matlab passes function arguments by value (albeit COW optimized). Python passes references. This makes NumPy more efficient if you need to pass large arrays or array slices.
> - Matlab tends to fragment the heap (hence the pack command). Python/NumPy does not. This makes long-running processes notoriously unstable on Matlab.
> - Matlab has some numerical libraries that are better.
> - I like the Matlab command prompt and IDE better. But it's not enough to make me want to use it.
> - Python is a proper programming language. Matlab is a numerical scripting language - good for small scripts but not complex software systems.
>
> Sturla Molden
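Sturla's first point, that basic NumPy slicing returns views rather than copies, can be seen in a few lines. This sketch uses numpy.shares_memory to check which results share the original buffer; note that fancy (integer-list) indexing, unlike basic slicing, returns a copy.

```python
import numpy as np

x = np.arange(10.0)
view = x[2:5]          # basic slice: a view onto x's buffer
copy = x[2:5].copy()   # explicit copy, independent of x
fancy = x[[0, 1]]      # integer-list (fancy) indexing: always a copy

view[:] = -1.0         # writes through to x[2:5]
copy[:] = 99.0         # x is unaffected
fancy[:] = 42.0        # x is unaffected
# x is now [0, 1, -1, -1, -1, 5, 6, 7, 8, 9]
```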
Re: [Numpy-discussion] Numpy performance vs Matlab.
On 1/7/2009 6:56 PM, Christopher Barker wrote:
>> So for simple loops python loses, but for other things, python wins by a huge margin.
>
> which emphasizes the point that you can't write code the same way in the two languages, though I'd argue that that code needs refactoring in any language!

Roux's example would be bad in either language. Slices ('vectorization' in Matlab lingo) are preferred in both cases. It's just that neither Matlab nor Python/NumPy was designed to be used like Java. For loops should not be abused in Python nor in Matlab (but Matlab is more forgiving now than it used to be).

Sturla Molden
Re: [Numpy-discussion] Numpy performance vs Matlab.
On 1/7/2009 6:51 PM, Christopher Barker wrote:
> Even with this nifty JIT,

It is not a very nifty JIT. It can transform some simple loops into vectorized expressions. And it removes the overhead from indexing with doubles. But if you are among those that do

n = length(x)
m = 0
for i = 1.0 : n
    m = m + x(i)
end
m = m / n

instead of

m = mean(x)

it will be nifty enough.

> A lot of folks seem to think that the only reason to vectorize code in MATLAB, numpy, etc. is for better performance. If MATLAB now has a good JIT, then there is no point -- I think that's a mistake.

Fortran 90/95 has array slicing as well.

Sturla Molden
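For comparison, here is a sketch of the Python/NumPy counterpart of Sturla's two MATLAB snippets: an explicit element loop versus the built-in reduction. loop_mean is a hypothetical helper; both compute the same mean up to floating-point rounding, but the loop pays interpreter overhead on every element.

```python
import numpy as np

def loop_mean(x):
    """Element-by-element mean, mirroring the MATLAB for-loop version."""
    m = 0.0
    for value in x:  # one interpreter round-trip per element
        m += value
    return m / len(x)

x = np.random.default_rng(0).random(1000)
vectorized = np.mean(x)  # the counterpart of MATLAB's m = mean(x)
looped = loop_mean(x)
```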
Re: [Numpy-discussion] Numpy performance vs Matlab.
On Wed, Jan 7, 2009 at 1:32 PM, Sturla Molden stu...@molden.no wrote:
> SNIP
> For loops should not be abused in Python nor in Matlab (but Matlab is more forgiving now than it used to be).

I'm missing namespaces in matlab. Everything is like "from path import *", and it's more difficult to keep a larger project organized in matlab than in python.

But, I think, matlab is ahead in parallelization (which I haven't used much), and learning matlab is easier than numpy (dtypes and broadcasting are more restrictive in matlab but, for a beginner, easier to figure out).

Josef
Re: [Numpy-discussion] Numpy performance vs Matlab.
On 1/7/2009 7:52 PM, josef.p...@gmail.com wrote:
> But, I think, matlab is ahead in parallelization (which I haven't used much)

Not really. There is e.g. nothing like Python's multiprocessing package in Matlab. Matlab is generally single-threaded. Python is multi-threaded but there is a GIL. And having multiple Matlab processes running simultaneously consumes a lot of resources. Python is far better in this respect. Don't confuse vectorization with parallelization. It is not the same. If you are going to do real parallelization, you are better off using Python with multiprocessing or mpi4py.

> and learning matlab is easier than numpy. (dtypes and broadcasting are more restrictive in matlab but, for a beginner, easier to figure out)

The available data types are about the same, at least last time I checked. (I am not thinking about Python built-ins here, but NumPy dtypes.) Matlab does not have broadcasting. Array shapes must always match.

S.M.
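For readers unfamiliar with the broadcasting mentioned above, a minimal sketch: NumPy stretches length-1 (or missing leading) dimensions to match, so a per-column or per-row statistic can be combined with the full array without replicating it explicitly. The data here is made up for illustration.

```python
import numpy as np

data = np.arange(12.0).reshape(4, 3)        # 4 samples x 3 variables (made-up data)

col_means = data.mean(axis=0)               # shape (3,)
centered = data - col_means                 # (4, 3) - (3,): broadcast along rows

row_sums = data.sum(axis=1, keepdims=True)  # shape (4, 1)
fractions = data / row_sums                 # (4, 3) / (4, 1): broadcast along columns
```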
Re: [Numpy-discussion] Numpy performance vs Matlab.
On Wed, Jan 7, 2009 at 10:19, Nicolas ROUX nicolas.r...@st.com wrote:
> Hi, I need help ;-) I have here a testcase which works much faster in Matlab than Numpy.
> SNIP
> This is a testcase that people would like to see working without any code restructuring.

Basically, if you want efficient numpy code, you have to use numpy idioms. If you want to continue to use Matlab idioms, keep using Matlab.

> The reasons are:
> - this way of writing is fairly natural.
> - the original code which showed me the matlab/Numpy performance differences is much more complex, and can't benefit from broadcasting or other numpy tips (I can later give this code)

Please do. Otherwise, we can't actually address your concerns.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
 -- Umberto Eco
Re: [Numpy-discussion] Accumulate values that are below threshold
Hi Bevan

Since the number of output elements is unknown, I don't think you can implement this efficiently using arrays. If your dataset isn't too large, a for-loop should do the trick. Otherwise, you may have to run your code through Cython, which optimises for-loops around Python lists.

thresh = 1.0
carry = 0
output = []
for idx, val in enumerate(data):   # data holds the values, e.g. Q
    carry += val
    if (carry - thresh) >= -1e-15:
        output.append((idx, carry))
        carry = 0

The comparison line above, (carry - thresh) >= -1e-15, may look strange -- it basically just does carry >= thresh. For some reason I don't quite understand, when accumulating floats, it sometimes happens that 1.0 != 1.0, so I use 1e-15 as protection.

Regards
Stéfan

2009/1/8 Bevan Jenkins beva...@gmail.com:
> Hello,
>
> Sometimes the hardest part of a problem is articulating it. Hopefully I can describe what I am trying to do - at least enough to get some help.
>
> I am trying to compare values to a threshold and when the values are lower than the threshold they are added to the value in my set until the threshold is reached. Every time the threshold is reached I want the index and value (accumulated). Hopefully the example below will help
>
> threshold = 1.0
> for indx,val in enumerate(Q):
>     print indx,val
>
> 0 100.0
> 1 20.0
> 2 16.0
> 3 7.0
> 4 3.0
> 5 1.5
> 6 0.8
> 7 0.6
> 8 0.5
> 9 0.2
> 10 0.2
> 11 0.1
> 12 0.1
>
> The output I would like is (number of elements and value)
>
> 0 100.0
> 1 20.0
> 2 16.0
> 3 7.0
> 4 3.0
> 5 1.5
> 7 1.4
> 11 1.0
>
> The 1st 6 elements are easy as they are all greater than or equal to the threshold (1.0). Once the values drop below the threshold the next value is added until the threshold is reached.
>
> Any help is appreciated,
> Bevan Jenkins
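The "1.0 != 1.0" effect Stéfan describes comes from binary floating point: decimal fractions such as 0.1 and 0.2 have no exact binary representation, so a running sum can land a hair below the threshold (sum([0.1] * 10) gives 0.9999999999999999 rather than 1.0). The sketch below wraps his loop in a hypothetical function, iterates with enumerate over the values, and replaces the hand-picked 1e-15 margin with math.isclose (default relative tolerance 1e-9), which is one reasonable choice rather than the only one.

```python
import math

def accumulate_to_threshold(values, thresh=1.0):
    """Return (index, accumulated value) each time the running sum reaches thresh."""
    out = []
    carry = 0.0
    for idx, val in enumerate(values):
        carry += val
        # carry >= thresh, but tolerant of rounding error that can leave
        # the sum just below thresh (e.g. 0.9999999999999999)
        if carry >= thresh or math.isclose(carry, thresh):
            out.append((idx, carry))
            carry = 0.0
    return out

Q = [100.0, 20.0, 16.0, 7.0, 3.0, 1.5, 0.8, 0.6, 0.5, 0.2, 0.2, 0.1, 0.1]
result = accumulate_to_threshold(Q)
```

On Bevan's data this reports indices 0, 1, 2, 3, 4, 5, 7 and 11; the final 0.1 at index 12 stays in carry and is not reported, matching the requested output.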