Re: [Numpy-discussion] slow numpy.clip ?
Gael Varoquaux wrote: On Tue, Dec 19, 2006 at 02:10:29PM +0900, David Cournapeau wrote: I would really like to see the imshow/show calls go into the range of a few hundred ms; for interactive plotting, this really changes a lot in my opinion.

I think this is strongly dependent on some parameters. I did some interactive plotting on both a Pentium 2, Linux, WxAgg (thus Gtk behind Wx), and a Pentium 4, Windows, WxAgg (thus MFC behind Wx), and there was a huge difference between the speeds: a few orders of magnitude. I couldn't explain it, but it was a good surprise, as the application was developed for the Windows box.

I started to investigate the problem because under matlab, plotting a spectrogram is negligible compared to computing it, whereas in matplotlib with the numpy array backend, plotting it takes as much time as computing it, which didn't make sense to me. Most of the computing time is spent in code which is independent of the backend, that is, during the conversion from the rank-2 array to rgba (60 % of the time on my fast workstation, 85 % of the time on my laptop with a Pentium M @ 1.2 GHz), so I don't think the GUI backend makes any difference.

cheers,

David

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] slow numpy.clip ?
Eric Firing wrote: David Cournapeau wrote: Well, this is something I would be willing to try *if* this is the main bottleneck of imshow/show. I am still unsure about the problem, because if I change numpy.clip to my function, including a copy, I really get a big difference myself:

    val = ma.array(nx.clip(val.filled(vmax), vmin, vmax), mask=mask)

vs

    def myclip(b, m, M):
        a = b.copy()
        a[a < m] = m
        a[a > M] = M
        return a

    val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)

Taking the best result, I get 0.888 s vs 0.784 s for a show() call, which is already a 10 % improvement, and I get almost 15 % if I remove the copy. I am updating numpy/scipy/mpl on my laptop to see if this is specific to the CPU of my workstation (big cache, high clock frequency, dual CPU with HT enabled).

Please try the putmask version without the copy on your machines; I expect it will be quite a bit faster on both machines. The relative speeds of the versions may differ widely depending on how many values actually get changed, though.

On my workstation (dual Xeon; I ran each corresponding script 5 times and took the best result):
- nx.clip takes ~170 ms (of 920 ms for the whole show call)
- your fast clip, with copy: ~50 ms (of ~820 ms)
- mine, with copy: ~50 ms (of ~830 ms)
- yours, without copy: ~30 ms (of 830 ms)
- mine, without copy: ~40 ms (of 830 ms)

Same on my laptop (Pentium M @ 1.2 GHz):
- nx.clip takes ~230 ms (of 1460 ms)
- mine, with copy: ~70 ms (of 1200 ms)
- mine, without copy: ~55 ms (of 1300 ms)
- yours, with copy: ~80 ms (of 1300 ms)
- yours, without copy: ~67 ms (of 1300 ms)

Basically, at least from those figures, both versions are pretty similar, and not worth improving much anyway for matplotlib. There is something funny with the numpy version, though.

cheers,

David
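For readers wanting to reproduce these numbers today, here is a minimal benchmark sketch comparing numpy's clip with the two user-level variants discussed above (the array size, bounds, and repeat count are arbitrary choices, and absolute timings will differ by machine):

```python
import timeit
import numpy as np

def putmask_clip(b, m, M):
    # putmask-based clip (Eric's suggestion): copy, then fix up in place
    a = b.copy()
    np.putmask(a, a < m, m)
    np.putmask(a, a > M, M)
    return a

def fancy_clip(b, m, M):
    # boolean-indexing clip (David's myclip): copy, then fix up in place
    a = b.copy()
    a[a < m] = m
    a[a > M] = M
    return a

x = np.random.randn(1_000_000)
for f in (np.clip, putmask_clip, fancy_clip):
    t = timeit.timeit(lambda: f(x, -1.0, 1.0), number=10)
    print(f.__name__, f"{t:.3f}s")
```

As the thread notes, the relative ranking depends on how many elements actually get clipped, so the bounds chosen matter as much as the array size.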
Re: [Numpy-discussion] slow numpy.clip ?
David Cournapeau wrote: Basically, at least from those figures, both versions are pretty similar, and not worth improving much anyway for matplotlib. There is something funny with the numpy version, though.

Looking at the code, it's certainly not surprising that the current implementation of clip() is slow. It is a direct numpy C API translation of the following (taken from numarray, but it is the same in Numeric):

    def clip(m, m_min, m_max):
        """clip() returns a new array with every entry in m that is less than
        m_min replaced by m_min, and every entry greater than m_max replaced
        by m_max."""
        selector = ufunc.less(m, m_min) + 2*ufunc.greater(m, m_max)
        return choose(selector, (m, m_min, m_max))

Creating that integer selector array is probably the most expensive part. Copying the array, then using putmask() or similar is certainly a better approach, and I can see no drawbacks to it. If anyone is up to translating their faster clip() into C, I'm more than happy to check it in. I might also entertain adding a copy=True keyword argument, but I'm not entirely certain we should be expanding the API during the 1.0.x series.

-- Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
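The quoted transcription still runs under modern numpy, which makes the cost structure easy to inspect; a hedged, runnable version (np.less/np.greater stand in for the old ufunc module names):

```python
import numpy as np

def choose_clip(m, m_min, m_max):
    # Old Numeric/numarray strategy: build an integer selector array
    # (0 = keep, 1 = below m_min, 2 = above m_max), then dispatch
    # through choose().  The selector is a full extra temporary.
    selector = np.less(m, m_min) + 2 * np.greater(m, m_max)
    return np.choose(selector, (m, m_min, m_max))

m = np.array([-3.0, 0.5, 2.0, 7.5])
print(choose_clip(m, 0.0, 5.0))  # matches np.clip(m, 0.0, 5.0)
```

The putmask approach avoids materializing the selector array entirely, which is where the measured savings in the benchmarks above come from.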
Re: [Numpy-discussion] slow numpy.clip ?
Robert Kern wrote: Looking at the code, it's certainly not surprising that the current implementation of clip() is slow. It is a direct numpy C API translation of the following (taken from numarray, but it is the same in Numeric):

    def clip(m, m_min, m_max):
        """clip() returns a new array with every entry in m that is less than
        m_min replaced by m_min, and every entry greater than m_max replaced
        by m_max."""
        selector = ufunc.less(m, m_min) + 2*ufunc.greater(m, m_max)
        return choose(selector, (m, m_min, m_max))

Creating that integer selector array is probably the most expensive part. Copying the array, then using putmask() or similar is certainly a better approach, and I can see no drawbacks to it. If anyone is up to translating their faster clip() into C, I'm more than happy to check it in. I might also entertain adding a copy=True keyword argument, but I'm not entirely certain we should be expanding the API during the 1.0.x series.

I would be happy to code the function; for new code to be added to numpy, is there another branch than the current one? What is the approach for a 1.1.x version of numpy? For now, putting in the function with a copy (the current behaviour?) would be OK, right? The copy part is a much smaller problem than the rest of the function anyway, at least from my modest benchmarking,

David
Re: [Numpy-discussion] slow numpy.clip ?
David Cournapeau wrote: I would be happy to code the function; for new code to be added to numpy, is there another branch than the current one? What is the approach for a 1.1.x version of numpy?

I don't think we've decided on one, yet.

For now, putting in the function with a copy (the current behaviour?) would be OK, right? The copy part is a much smaller problem than the rest of the function anyway, at least from my modest benchmarking,

I'd prefer that you simply modify PyArray_Clip to use a better approach rather than make an entirely new function. In that case, it certainly must make a copy.

-- Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Profiling numpy ? (parts written in C)
On 12/19/06, Francesc Altet [EMAIL PROTECTED] wrote: On Tuesday 19 December 2006 08:12, David Cournapeau wrote: Hi, [snip]

My guess is that the real bottleneck is in calling memmove so many times (once per element in the array). Perhaps the algorithm can be changed to do a block copy at the beginning and then modify only the places on which the clip should act (kind of the same thing that you have done in Python, but at the C level).

IIRC, doing a simple type-specific assignment is faster than either memmove or memcpy. If speed is really of the essence, it would probably be worth writing a type-specific version of clip. A special function combining clip with RGB conversion might do even better.

Chuck
[Numpy-discussion] Unexpected output using numpy.ndarray and __radd__
Hi, The following issue has puzzled me for a while. I want to add a numpy.ndarray and an instance of my own class. I define this operation by implementing the methods __add__ and __radd__. My programme (including output) looks like:

    #!/usr/local/bin/python
    import numpy

    class Cyclehist:
        def __init__(self,vals):
            self.valuearray = numpy.array(vals)
        def __str__(self):
            return 'Cyclehist object: valuearray = '+str(self.valuearray)
        def __add__(self,other):
            print "__add__ : ",self,other
            return self.valuearray + other
        def __radd__(self,other):
            print "__radd__ : ",self,other
            return other + self.valuearray

    c = Cyclehist([1.0,-21.2,3.2])
    a = numpy.array([-1.0,2.2,-2.2])
    print c + a
    print a + c

    # -- OUTPUT --
    #
    # addprob $ addprob.py
    # __add__ :  Cyclehist object: valuearray = [ 1. -21.2 3.2] [-1. 2.2 -2.2]
    # [ 0. -19. 1.]
    # __radd__ :  Cyclehist object: valuearray = [ 1. -21.2 3.2] -1.0
    # __radd__ :  Cyclehist object: valuearray = [ 1. -21.2 3.2] 2.2
    # __radd__ :  Cyclehist object: valuearray = [ 1. -21.2 3.2] -2.2
    # [[ 0. -22.2 2.2] [ 3.2 -19. 5.4] [ -1.2 -23.4 1. ]]
    # addprob $

I expected the output of c+a and a+c to be identical; however, the output of a+c gets nested in an elementwise fashion. Can anybody explain this? Is it a bug or a feature? I'm using Python 2.4.4c1 and numpy 1.0. I tried the programme using an older version of Python and numpy, and there the results of c+a and a+c were identical.

Regards,

Mark Hoffmann
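What happens here is that ndarray.__add__ accepts the Cyclehist instance as a sequence-like operand and ends up invoking __radd__ once per element, nesting the results. In current numpy there is a supported opt-out: setting __array_ufunc__ = None makes ndarray's operators return NotImplemented so Python dispatches to the class's __radd__ with the whole array. This protocol was added long after numpy 1.0, so the following is a modern sketch, not a fix that was available to the original poster:

```python
import numpy as np

class Cyclehist:
    # Tell numpy not to handle this class in its ufunc machinery, so
    # ndarray + Cyclehist returns NotImplemented and Python falls back
    # to Cyclehist.__radd__ with the full array (numpy >= 1.13).
    __array_ufunc__ = None

    def __init__(self, vals):
        self.valuearray = np.array(vals)

    def __add__(self, other):
        return self.valuearray + other

    def __radd__(self, other):
        return other + self.valuearray

c = Cyclehist([1.0, -21.2, 3.2])
a = np.array([-1.0, 2.2, -2.2])
# Both orders now produce the same flat array [0., -19., 1.]
print(c + a)
print(a + c)
```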
[Numpy-discussion] (no subject)
Hi, I would like to get information on the software licenses for numpy and Numeric. On the SourceForge page for the packages, the listed license is OSI-Approved Open Source. Is it possible to get more information on this? A copy of the document would be useful. Thank you. Best regards, Derek Bandler
Re: [Numpy-discussion] (no subject)
Hi Derek, Like all Free Open Source Software (FOSS) projects, the license is distributed with the source code. There is a file called LICENSE.txt in the numpy tar archive. Here are the contents of that file:

Copyright (c) 2005, NumPy Developers
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Greg

On 12/19/06, Bandler, Derek [EMAIL PROTECTED] wrote: Hi, I would like to get information on the software licenses for numpy and Numeric. On the SourceForge page for the packages, the listed license is *OSI-Approved Open Source*. Is it possible to get more information on this?
A copy of the document would be useful. Thank you. Best regards, Derek Bandler

-- Linux. Because rebooting is for adding hardware.
Re: [Numpy-discussion] (no subject)
Bandler, Derek wrote: Hi, I would like to get information on the software licenses for numpy and Numeric. On the SourceForge page for the packages, the listed license is *OSI-Approved Open Source*. Is it possible to get more information on this? A copy of the document would be useful. Thank you.

They are both BSD-like licenses.

http://projects.scipy.org/scipy/numpy/browser/trunk/LICENSE.txt
http://projects.scipy.org/scipy/scipy/browser/trunk/LICENSE.txt

-- Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] slow numpy.clip ?
David Cournapeau wrote: Robert Kern wrote: Looking at the code, it's certainly not surprising that the current implementation of clip() is slow. It is a direct numpy C API translation of the following (taken from numarray, but it is the same in Numeric):

    def clip(m, m_min, m_max):
        """clip() returns a new array with every entry in m that is less than
        m_min replaced by m_min, and every entry greater than m_max replaced
        by m_max."""
        selector = ufunc.less(m, m_min) + 2*ufunc.greater(m, m_max)
        return choose(selector, (m, m_min, m_max))

Creating that integer selector array is probably the most expensive part. Copying the array, then using putmask() or similar is certainly a better approach, and I can see no drawbacks to it. If anyone is up to translating their faster clip() into C, I'm more than happy to check it in. I might also entertain adding a copy=True keyword argument, but I'm not entirely certain we should be expanding the API during the 1.0.x series.

I would be happy to code the function; for new code to be added to numpy, is there another branch than the current one? What is the approach for a 1.1.x version of numpy?

The idea is to make a 1.0.x branch as soon as the trunk changes the C-API. The guarantee is that extension modules won't have to be rebuilt until 1.1. I don't know that we've specified whether there will be *no* API changes. For example, there have already been some backward-compatible extensions to the 1.0.x series. I like the idea of being able to add functions to the 1.0.x series, but without breaking compatibility. I also don't mind adding new keywords to functions (but not to C-API calls, as that would require a re-compile of extension modules).

-Travis
Re: [Numpy-discussion] slow numpy.clip ?
Robert Kern wrote: David Cournapeau wrote: Basically, at least from those figures, both versions are pretty similar, and not worth improving much anyway for matplotlib. There is something funny with the numpy version, though.

Looking at the code, it's certainly not surprising that the current implementation of clip() is slow. It is a direct numpy C API translation of the following (taken from numarray, but it is the same in Numeric):

    def clip(m, m_min, m_max):
        """clip() returns a new array with every entry in m that is less than
        m_min replaced by m_min, and every entry greater than m_max replaced
        by m_max."""
        selector = ufunc.less(m, m_min) + 2*ufunc.greater(m, m_max)
        return choose(selector, (m, m_min, m_max))

There are a lot of functions that are essentially this. Many things were done just to get something working. It would seem like a good idea to re-code many of these to speed them up.

Creating that integer selector array is probably the most expensive part. Copying the array, then using putmask() or similar is certainly a better approach, and I can see no drawbacks to it. If anyone is up to translating their faster clip() into C, I'm more than happy to check it in. I might also entertain adding a copy=True keyword argument, but I'm not entirely certain we should be expanding the API during the 1.0.x series.

The problem with the copy=True keyword is that it would imply needing to expand the C-API for PyArray_Clip, and should not be done until 1.1, IMHO. We would probably be better off not expanding the keyword arguments to methods, either, until that time.

-Travis
Re: [Numpy-discussion] slow numpy.clip ?
Travis Oliphant wrote: The problem with the copy=True keyword is that it would imply needing to expand the C-API for PyArray_Clip, and should not be done until 1.1, IMHO.

I don't think we have to change the signature of PyArray_Clip() at all. PyArray_Clip() takes an out argument. Currently, this is only set to something other than NULL if explicitly provided as a keyword out= argument to numpy.ndarray.clip(). All we have to do is modify the implementation of array_clip() to parse a copy= argument and set out = self before calling PyArray_Clip().

-- Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
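That out= route is visible from Python in current numpy releases: both numpy.clip and ndarray.clip accept an out argument, so the no-copy behaviour can already be requested explicitly (a small sketch; the arrays and bounds are arbitrary):

```python
import numpy as np

a = np.random.randn(1000)
np.clip(a, -1.0, 1.0, out=a)   # writes into a itself: no temporary copy

b = np.random.randn(1000)
c = b.clip(-1.0, 1.0)          # default behaviour: returns a new array
```

Passing the input as out is exactly the "set out = self" trick described above, just spelled at the Python level.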
Re: [Numpy-discussion] slow numpy.clip ?
Travis Oliphant wrote: There are a lot of functions that are essentially this. Many things were done just to get something working. It would seem like a good idea to re-code many of these to speed them up.

Off the top of your head, do you have a list of these?

-- Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Profiling numpy ? (parts written in C)
Charles R Harris wrote: My guess is that the real bottleneck is in calling memmove so many times (once per element in the array). Perhaps the algorithm can be changed to do a block copy at the beginning and then modify only the places on which the clip should act (kind of the same thing that you have done in Python, but at the C level).

IIRC, doing a simple type-specific assignment is faster than either memmove or memcpy. If speed is really of the essence, it would probably be worth writing a type-specific version of clip. A special function combining clip with RGB conversion might do even better.

In the end, in the original context (speeding up the drawing of spectrograms), this is the problem. Even if different backends/toolkits obviously have an impact on performance, I really don't see why a numpy function to convert an array to an RGB representation should be 10-20 times slower than matlab on the same machine. I will take all those helpful messages into account, and hopefully come up with something by the end of the week :),

cheers

David
Re: [Numpy-discussion] Profiling numpy ? (parts written in C)
"David" == David Cournapeau [EMAIL PROTECTED] writes:

    David> In the end, in the original context (speeding up the drawing
    David> of spectrograms), this is the problem. Even if different
    David> backends/toolkits obviously have an impact on performance,
    David> I really don't see why a numpy function to convert an array
    David> to an RGB representation should be 10-20 times slower than
    David> matlab on the same machine.

This isn't exactly right. When matplotlib converts a 2D grayscale array to rgba, a lot goes on under the hood. It's all numpy, but it's far from a single function, and it involves many passes through the data. In principle, this could be done with one or two passes through the data. In practice, our normalization and colormapping abstractions are so abstract that it is difficult (though not impossible) to special-case and optimize. The top-level routine is

    def to_rgba(self, x, alpha=1.0):
        '''Return a normalized rgba array corresponding to x.
        If x is already an rgb or rgba array, return it unchanged.
        '''
        if hasattr(x, 'shape') and len(x.shape) > 2: return x
        x = ma.asarray(x)
        x = self.norm(x)
        x = self.cmap(x, alpha)
        return x

which implies at a minimum two passes through the data, one for norm and one for cmap. In 99% of the use cases, cmap is a LinearSegmentedColormap, though users can define their own as long as it is callable. My guess is that the expensive part is Colormap.__call__, the base class for LinearSegmentedColormap. We could probably write some extension code that does the following routine in one pass through the data. But it would be hairy. In a quick look and rough count, I see about 10 passes through the data in the function below. If you are interested in optimizing colormapping in mpl, I'd start here. I suspect there may be some low-hanging fruit.

    def __call__(self, X, alpha=1.0):
        """X is either a scalar or an array (of any dimension).
        If scalar, a tuple of rgba values is returned, otherwise
        an array with the new shape = oldshape+(4,).  If the X-values
        are integers, then they are used as indices into the array.
        If they are floating point, then they must be in the
        interval (0.0, 1.0).  Alpha must be a scalar."""
        if not self._isinit: self._init()
        alpha = min(alpha, 1.0) # alpha must be between 0 and 1
        alpha = max(alpha, 0.0)
        self._lut[:-3, -1] = alpha
        mask_bad = None
        if not iterable(X):
            vtype = 'scalar'
            xa = array([X])
        else:
            vtype = 'array'
            xma = ma.asarray(X)
            xa = xma.filled(0)
            mask_bad = ma.getmask(xma)
        if typecode(xa) in typecodes['Float']:
            putmask(xa, xa == 1.0, 0.999) # Treat 1.0 as slightly less than 1.
            xa = (xa * self.N).astype(Int)
        # Set the over-range indices before the under-range;
        # otherwise the under-range values get converted to over-range.
        putmask(xa, xa > self.N - 1, self._i_over)
        putmask(xa, xa < 0, self._i_under)
        if mask_bad is not None and mask_bad.shape == xa.shape:
            putmask(xa, mask_bad, self._i_bad)
        rgba = take(self._lut, xa)
        if vtype == 'scalar':
            rgba = tuple(rgba[0, :])
        return rgba

    David> I will take into account all those helpful messages, and
    David> hopefully come up with something by the end of the week :),
    David> cheers
    David> David
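Condensed to its essentials, the norm-plus-LUT pipeline described above is just a scale, a quantize, and one fancy-index pass. A hedged sketch with a stand-in lookup table (lut here is random data, not a real matplotlib colormap):

```python
import numpy as np

N = 256
lut = np.random.rand(N, 4)   # stand-in for the colormap's (N, 4) rgba table

def to_rgba_sketch(x, vmin, vmax):
    xn = (x - vmin) / (vmax - vmin)      # norm(): scale into [0, 1]
    xn = np.clip(xn, 0.0, 0.999)         # treat 1.0 as slightly less than 1
    idx = (xn * N).astype(int)           # quantize to LUT indices
    return lut[idx]                      # take(): one table lookup per pixel

img = np.random.rand(128, 128) * 10.0
rgba = to_rgba_sketch(img, 0.0, 10.0)
print(rgba.shape)  # (128, 128, 4)
```

This drops the masking, alpha, and over/under handling of the real code, which is exactly the special-casing that makes a fused one-pass C implementation hairy.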
Re: [Numpy-discussion] Profiling numpy ? (parts written in C)
John, The current version of __call__ already includes substantial speedups prompted by David's profiling, and if I understand correctly the present bottleneck is actually the numpy take function. That is not to say that other improvements can't be made, of course.

Eric

John Hunter wrote:

    David> In the end, in the original context (speeding up the drawing
    David> of spectrograms), this is the problem. Even if different
    David> backends/toolkits obviously have an impact on performance,
    David> I really don't see why a numpy function to convert an array
    David> to an RGB representation should be 10-20 times slower than
    David> matlab on the same machine.

This isn't exactly right. When matplotlib converts a 2D grayscale array to rgba, a lot goes on under the hood. It's all numpy, but it's far from a single function, and it involves many passes through the data. In principle, this could be done with one or two passes through the data. In practice, our normalization and colormapping abstractions are so abstract that it is difficult (though not impossible) to special-case and optimize. The top-level routine is

    def to_rgba(self, x, alpha=1.0):
        '''Return a normalized rgba array corresponding to x.
        If x is already an rgb or rgba array, return it unchanged.
        '''
        if hasattr(x, 'shape') and len(x.shape) > 2: return x
        x = ma.asarray(x)
        x = self.norm(x)
        x = self.cmap(x, alpha)
        return x

which implies at a minimum two passes through the data, one for norm and one for cmap. In 99% of the use cases, cmap is a LinearSegmentedColormap, though users can define their own as long as it is callable. My guess is that the expensive part is Colormap.__call__, the base class for LinearSegmentedColormap. We could probably write some extension code that does the following routine in one pass through the data. But it would be hairy. In a quick look and rough count, I see about 10 passes through the data in the function below. If you are interested in optimizing colormapping in mpl, I'd start here. I suspect there may be some low-hanging fruit.

    def __call__(self, X, alpha=1.0):
        """X is either a scalar or an array (of any dimension).
        If scalar, a tuple of rgba values is returned, otherwise
        an array with the new shape = oldshape+(4,).  If the X-values
        are integers, then they are used as indices into the array.
        If they are floating point, then they must be in the
        interval (0.0, 1.0).  Alpha must be a scalar."""
        if not self._isinit: self._init()
        alpha = min(alpha, 1.0) # alpha must be between 0 and 1
        alpha = max(alpha, 0.0)
        self._lut[:-3, -1] = alpha
        mask_bad = None
        if not iterable(X):
            vtype = 'scalar'
            xa = array([X])
        else:
            vtype = 'array'
            xma = ma.asarray(X)
            xa = xma.filled(0)
            mask_bad = ma.getmask(xma)
        if typecode(xa) in typecodes['Float']:
            putmask(xa, xa == 1.0, 0.999) # Treat 1.0 as slightly less than 1.
            xa = (xa * self.N).astype(Int)
        # Set the over-range indices before the under-range;
        # otherwise the under-range values get converted to over-range.
        putmask(xa, xa > self.N - 1, self._i_over)
        putmask(xa, xa < 0, self._i_under)
        if mask_bad is not None and mask_bad.shape == xa.shape:
            putmask(xa, mask_bad, self._i_bad)
        rgba = take(self._lut, xa)
        if vtype == 'scalar':
            rgba = tuple(rgba[0, :])
        return rgba

    David> I will take into account all those helpful messages, and
    David> hopefully come up with something by the end of the week :),
    David> cheers
    David> David
Re: [Numpy-discussion] Profiling numpy ? (parts written in C)
Francesc Altet wrote: So, cProfile is only showing where the time is spent at the first-level calls at the extension level. If we want more introspection into the C stack, and you are running on Linux, oprofile (http://oprofile.sourceforge.net) is a very nice profiler. Here are the outputs for the above routines on my machine.

For clip1:

    Profiling through timer interrupt
    samples  %        image name     symbol name
    643      54.6769  libc-2.3.6.so  memmove
    151      12.8401  multiarray.so  PyArray_Choose
    35        2.9762  umath.so       BYTE_multiply
    34        2.8912  umath.so       DOUBLE_greater
    32        2.7211  mtrand.so      rk_random
    32        2.7211  umath.so       DOUBLE_less
    30        2.5510  libc-2.3.6.so  memcpy

For clip2:

    Profiling through timer interrupt
    samples  %        image name     symbol name
    188      24.5111  libc-2.3.6.so  memmove
    143      18.6441  multiarray.so  _nonzero_indices
    126      16.4276  multiarray.so  PyArray_MapIterNext
    37        4.8240  umath.so       DOUBLE_greater
    36        4.6936  mtrand.so      rk_gauss
    33        4.3025  umath.so       DOUBLE_less
    24        3.1291  libc-2.3.6.so  memcpy

Could you detail a bit how you did the profiling with oprofile? I don't manage to get the same results as you (that is, on a per-application basis when the application is a python script and not a 'binary' program).

Thank you,

David