Re: [Numpy-discussion] ufunc for sum of squared difference
Yeah, but it's not so obvious what's happening "under the hood". Consider this (with an old Win7 machine):

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)]
np.__version__
'1.11.1'

On Mon, Nov 14, 2016 at 10:38 AM, Jerome Kieffer <jerome.kief...@esrf.fr> wrote:

> On Fri, 11 Nov 2016 11:25:58 -0500 Matthew Harrigan <harrigan.matt...@gmail.com> wrote:
>
> > I started a ufunc to compute the sum of square differences here
> > <https://gist.github.com/mattharrigan/6f678b3d6df5efd236fc23bfb59fd3bd>.
> > It is about 4x faster and uses half the memory compared to
> > np.sum(np.square(x-c)).
>
> Hi Matt,
>
> Using *blas* you already win a factor of two (maybe more, depending on your blas implementation):
>
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))" "np.sum(np.square(x-2.))"
> 10 loops, best of 3: 135 msec per loop
>
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))" "y=x-2.;np.dot(y,y)"
> 10 loops, best of 3: 70.2 msec per loop

x= np.linspace(0, 1, int(1e6))
timeit np.sum(np.square(x- 2.))
10 loops, best of 3: 23 ms per loop

y= x- 2.
timeit np.dot(y, y)
The slowest run took 18.60 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.78 ms per loop

timeit np.dot(y, y)
1000 loops, best of 3: 1.73 ms per loop

Best,
eat

> Cheers,
> --
> Jérôme Kieffer
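For readers skimming the thread, a minimal sketch of the BLAS trick wrapped into a function (my illustration, not the fused ufunc from the gist; the temporary y = x - c is still allocated, which is exactly what the ufunc avoids):

import numpy as np

def sum_squared_diff(x, c):
    # same result as np.sum(np.square(x - c)), but the
    # multiply-and-accumulate is delegated to BLAS via a dot product
    y = x - c          # one temporary array; a fused ufunc avoids even this
    return np.dot(y, y)

x = np.linspace(0, 1, int(1e6))
print(np.allclose(sum_squared_diff(x, 2.), np.sum(np.square(x - 2.))))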
Re: [Numpy-discussion] Characteristic of a Matrix.
Hi,

On Mon, Jan 5, 2015 at 8:40 PM, Colin J. Williams <cjwilliam...@gmail.com> wrote:

> One of the essential characteristics of a matrix is that it be rectangular. This is neither spelled out nor checked currently. The Doc description refers to a class:
>
> class numpy.matrix [source: http://github.com/numpy/numpy/blob/v1.9.1/numpy/matrixlib/defmatrix.py#L206]
> Returns a matrix from an array-like object, or from a string of data. A matrix is a specialized 2-D array that retains its 2-D nature through operations. It has certain special operators, such as * (matrix multiplication) and ** (matrix power).
>
> This illustrates a failure, which is reported later in the calculation:
>
> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])
>
> Here 2 -6 is treated as an expression (evaluating to -4), so the last row has only two elements.

FWIW, here A2 is definitely rectangular, with shape== (1, 3) and dtype== object, i.e. its elements are just python lists.

> Wikipedia offers:
>
> In mathematics, a matrix (plural matrices) is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns (http://en.wikipedia.org/wiki/Matrix_%28mathematics%29).

(and in this context also python objects).

-eat

> The individual items in a matrix are called its elements or entries. An example of a matrix with 2 rows and 3 columns is
>
>     [  1  9  -13 ]
>     [ 20  5   -6 ]
>
> In the Numpy context, the symbols or expressions need to be evaluable.
>
> Colin W.
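On the NumPy of that era (newer versions refuse ragged input outright), a minimal sketch of a guard against this pitfall -- a hypothetical helper, not part of numpy:

import numpy as np

def strict_matrix(data):
    # hypothetical helper: refuse the silent object-array fallback
    m = np.matrix(data)
    if m.dtype == object:
        raise ValueError("input is not rectangular / numeric: %r" % (data,))
    return m

strict_matrix([[1, 2, -2], [-3, -1, 4], [4, 2, -6]])   # fine
# strict_matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])  # raises ValueError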
Re: [Numpy-discussion] Characteristic of a Matrix.
On Mon, Jan 5, 2015 at 9:36 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Mon, Jan 5, 2015 at 7:18 PM, <josef.p...@gmail.com> wrote:
> > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith <n...@pobox.com> wrote:
> > > I'm afraid that I really don't understand what you're trying to say. Is there something that you think numpy should be doing differently?
> >
> > I liked it better when this raised an exception, instead of creating a rectangular object array.
>
> Did it really used to raise an exception? Patches accepted :-) (#5303 is the relevant bug, like Warren points out. From the discussion there it doesn't look like np.array's handling of non-conformable lists has any defenders.)

+1 for 'object array [and matrix] construction should require explicitly specifying dtype= object'

-eat

> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org
Re: [Numpy-discussion] deprecate numpy.matrix
On Mon, Feb 10, 2014 at 7:00 PM, alex <argri...@ncsu.edu> wrote:

> On Mon, Feb 10, 2014 at 11:27 AM, <josef.p...@gmail.com> wrote:
> > How do we calculate the diagonal of the hat matrix without using N by N matrices?
>
> Not sure if this was a rhetorical question or what, but this seems to work:
>
> leverages = np.square(scipy.linalg.qr(X, mode='economic')[0]).sum(axis=1)
>
> http://www4.ncsu.edu/~ipsen/ps/slides_CSE2013.pdf

Rhetorical or not, but FWIW I'd prefer to take the singular value decomposition (u, s, vt= svd(x)) and then, based on the singular values s, estimate a numerically feasible rank r. Thus the diagonal of such a hat matrix would be (u[:, :r]** 2).sum(1).

Regards,
-eat

> Sorry for off-topic...
Re: [Numpy-discussion] deprecate numpy.matrix
On Mon, Feb 10, 2014 at 9:08 PM, alex <argri...@ncsu.edu> wrote:

> On Mon, Feb 10, 2014 at 2:03 PM, eat <e.antero.ta...@gmail.com> wrote:
> > Rhetorical or not, but FWIW I'd prefer to take the singular value decomposition (u, s, vt= svd(x)) and then, based on the singular values s, estimate a numerically feasible rank r. Thus the diagonal of such a hat matrix would be (u[:, :r]** 2).sum(1).
>
> It's a small detail but you probably want svd(x, full_matrices=False) to avoid anything NxN.

Indeed. Thanks,
-eat
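Putting both remarks together, a hedged sketch of the whole computation (the tolerance rule is my assumption, not from the thread):

import numpy as np

def leverages(X, rtol=1e-12):
    # thin SVD: u is n-by-min(n, p), so nothing NxN is ever formed
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    r = int((s > rtol * s[0]).sum())    # numerically feasible rank (assumed tolerance rule)
    return (u[:, :r] ** 2).sum(axis=1)  # diagonal of the hat matrix

X = np.random.randn(1000, 5)
h = leverages(X)
print(h.shape, np.allclose(h.sum(), 5))  # trace of the hat matrix equals the rank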
Re: [Numpy-discussion] Creating an ndarray from an iterable over sequences
Hi,

On Tue, Jan 21, 2014 at 8:34 AM, Dr. Leo <fhaxbo...@googlemail.com> wrote:

> Hi,
>
> I would like to write something like:
>
> In [25]: iterable=((i, i**2) for i in range(10))
> In [26]: a=np.fromiter(iterable, int32)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-26-5bcc2e94dbca> in <module>()
> ----> 1 a=np.fromiter(iterable, int32)
> ValueError: setting an array element with a sequence.
>
> Is there an efficient way to do this?

Perhaps you could just utilize structured arrays (http://docs.scipy.org/doc/numpy/user/basics.rec.html), like:

iterable= ((i, i**2) for i in range(10))
a= np.fromiter(iterable, [('a', int32), ('b', int32)], 10)
a.view(int32).reshape(-1, 2)
Out[]:
array([[ 0,  0],
       [ 1,  1],
       [ 2,  4],
       [ 3,  9],
       [ 4, 16],
       [ 5, 25],
       [ 6, 36],
       [ 7, 49],
       [ 8, 64],
       [ 9, 81]])

My 2 cents,
-eat

> Creating two 1-dimensional arrays first is costly as one has to iterate twice over the data. So the only way I see is creating an empty [10,2] array and filling it row by row. This is memory-efficient but slow. List comprehension is vice versa.
>
> If there is no solution, wouldn't it be possible to rewrite fromiter so as to accept sequences?
>
> Leo
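A hedged, reusable sketch of the same idea (the helper name and generic field names are mine, not from the thread), assuming every yielded tuple has the same fixed width:

import numpy as np

def fromiter2d(iterable, dtype, width, count=-1):
    # hypothetical helper: route fixed-width tuples through a structured
    # dtype, then view the result as an ordinary 2-D array
    rec = np.dtype([('f%d' % i, dtype) for i in range(width)])
    a = np.fromiter(iterable, rec, count)
    return a.view(dtype).reshape(-1, width)

print(fromiter2d(((i, i**2) for i in range(5)), np.int32, 2))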
Re: [Numpy-discussion] Possible conversion bug with record array
Hi,

FWIW, apparently a bug related to the dtype of np.eye(.).

On Wed, May 22, 2013 at 8:07 PM, Nicolas Rougier <nicolas.roug...@inria.fr> wrote:

> Hi all,
>
> I got a weird output from the following script:
>
> import numpy as np
>
> U = np.zeros(1, dtype=[('x', np.float32, (4,4))])
>
> U[0] = np.eye(4)
> print U[0]
> # output: ([[0.0, 1.875, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.875], [0.0, 0.0, 0.0, 0.0]],)
>
> U[0] = np.eye(4, dtype=np.float32)
> print U[0]
> # output: ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],)
>
> The first output is obviously wrong. Can anyone confirm? (using numpy 1.7.1 on osx 10.8.3)

In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.__version__
Out[]: '1.6.0'
In []: U= np.zeros(1, dtype= [('x', np.float32, (4, 4))])
In []: U[0]= np.eye(4)
In []: U
Out[]: array([([[0.0, 1.875, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.875], [0.0, 0.0, 0.0, 0.0]],)], dtype=[('x', 'f4', (4, 4))])
In []: U[0]= np.eye(4, dtype= np.float32)
In []: U
Out[]: array([([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],)], dtype=[('x', 'f4', (4, 4))])

and

In []: sys.version
Out[]: '3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]'
In []: np.__version__
Out[]: '1.7.0rc1'
In []: U= np.zeros(1, dtype= [('x', np.float32, (4, 4))])
In []: U[0]= np.eye(4)
In []: U
Out[]: array([([[0.0, 1.875, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.875], [0.0, 0.0, 0.0, 0.0]],)], dtype=[('x', 'f4', (4, 4))])
In []: U[0]= np.eye(4, dtype= np.float32)
In []: U
Out[]: array([([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],)], dtype=[('x', 'f4', (4, 4))])

My 2 cents,
-eat

> Nicolas
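Until running a fixed version, a hedged sketch of two workarounds (my suggestion, not from the thread): keep the dtypes matched explicitly, or assign through the named field, which is an ordinary float32 view so the float64-to-float32 cast happens element-wise:

import numpy as np

U = np.zeros(1, dtype=[('x', np.float32, (4, 4))])

# matches the field dtype exactly, sidestepping the buggy record-level cast
U[0] = np.eye(4, dtype=np.float32)

# assigns into the named field (a plain float32 view), casting element-wise
U['x'][0] = np.eye(4)

print(U['x'][0])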
Re: [Numpy-discussion] drawing the line (was: Adding .abs() method to the array object)
Huh,

On Tue, Feb 26, 2013 at 5:03 PM, <josef.p...@gmail.com> wrote:

> On Tue, Feb 26, 2013 at 9:39 AM, Benjamin Root <ben.r...@ou.edu> wrote:
> > On Mon, Feb 25, 2013 at 8:23 PM, Alan G Isaac <alan.is...@gmail.com> wrote:
> > > I'm hoping this discussion will return to the drawing the line question.
> > > http://stackoverflow.com/questions/8108688/in-python-when-should-i-use-a-function-instead-of-a-method
> > > Alan Isaac
> >
> > Proposed line: Reduction methods only. Discuss...
>
> arr.dot ?
>
> the 99 most common functions for which chaining looks nice.

Care to elaborate more for us less initiated?

Regards,
-eat

> Josef
> > Ben Root
Re: [Numpy-discussion] Shouldn't all in-place operations simply return self?
Hi,

On Fri, Jan 18, 2013 at 12:13 AM, Thouis (Ray) Jones <tho...@gmail.com> wrote:

> On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> > On Wed, Jan 16, 2013 at 5:11 PM, eat <e.antero.ta...@gmail.com> wrote:
> > > Hi,
> > > In a recent thread http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was proposed that .fill(.) should return self as an alternative for a trivial two-liner. I'm raising now the question: what if all in-place operations indeed could return self? How bad would this be? A 'strong' counter argument may be found at http://mail.python.org/pipermail/python-dev/2003-October/038855.html. But anyway, at least for me, it would be much more straightforward to implement simple mini DSLs (http://en.wikipedia.org/wiki/Domain-specific_language) this way. What do you think?
> >
> > I've read Guido about why he didn't like inplace operations returning self and found him convincing for a while. And then I listened to other folks express a preference for the freight train style and found them convincing also. I think it comes down to a preference for one style over another and I go back and forth myself. If I had to vote, I'd go for returning self, but I'm not sure it's worth breaking python conventions to do so.
> > Chuck
>
> I'm -1 on breaking with Python convention without very good reasons.

As an example, I personally find the following behavior highly counter-intuitive:

In []: p, P= rand(3, 1), rand(3, 5)
In []: ((p- P)** 2).sum(0).argsort()
Out[]: array([2, 4, 1, 3, 0])
In []: ((p- P)** 2).sum(0).sort().diff()
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'diff'

Regards,
-eat

> Ray
[Numpy-discussion] Shouldn't all in-place operations simply return self?
Hi,

In a recent thread http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was proposed that .fill(.) should return self as an alternative for a trivial two-liner.

I'm raising now the question: what if all in-place operations indeed could return self? How bad would this be? A 'strong' counter argument may be found at http://mail.python.org/pipermail/python-dev/2003-October/038855.html.

But anyway, at least for me, it would be much more straightforward to implement simple mini DSLs (http://en.wikipedia.org/wiki/Domain-specific_language) if in-place operations returned self. What do you think?

-eat

P.S. FWIW, if this idea really gains momentum, obviously I'm volunteering to create a PR for it.
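To make the trade-off concrete, here is a minimal sketch of a hypothetical thin wrapper (entirely illustrative, not anything proposed in the thread) that provides the 'freight train' style on top of the current None-returning in-place methods:

import numpy as np

class Chain(object):
    # hypothetical wrapper: call ndarray methods, but always return self
    def __init__(self, a):
        self.a = np.asarray(a)

    def __getattr__(self, name):
        def method(*args, **kwargs):
            out = getattr(self.a, name)(*args, **kwargs)
            if out is not None:        # e.g. sum() returns a result...
                self.a = np.asarray(out)
            return self                # ...while sort()/fill() return None
        return method

    def value(self):
        return self.a

print(Chain([3, 1, 2]).sort().value())   # in-place sort, chained: [1 2 3]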
Re: [Numpy-discussion] argsort
Hi,

On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen <madsip...@gmail.com> wrote:

> Hi,
>
> I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array:
>
> from numpy import *
> indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]])
> args = argsort(indices, axis=0)
> print indices[args]
>
> gives:
>
> [[[ 1 12]
>   [ 4  3]]
>  [[ 4  3]
>   [11  6]]
>  [[ 8  9]
>   [23  7]]
>  [[11  6]
>   [ 8  9]]
>  [[23  7]
>   [ 1 12]]]
>
> I thought this should produce a sorted version of the indices array. Any help is appreciated.

Perhaps these three different points of view will help you a little bit more to move on:

In []: x
Out[]:
array([[ 4,  3],
       [ 1, 12],
       [23,  7],
       [11,  6],
       [ 8,  9]])
In []: ind= x.argsort(axis= 0)
In []: ind
Out[]:
array([[1, 0],
       [0, 3],
       [4, 2],
       [3, 4],
       [2, 1]])
In []: x[ind[:, 0]]
Out[]:
array([[ 1, 12],
       [ 4,  3],
       [ 8,  9],
       [11,  6],
       [23,  7]])
In []: x[ind[:, 1]]
Out[]:
array([[ 4,  3],
       [11,  6],
       [23,  7],
       [ 8,  9],
       [ 1, 12]])
In []: x[ind, [0, 1]]
Out[]:
array([[ 1,  3],
       [ 4,  6],
       [ 8,  7],
       [11,  9],
       [23, 12]])

-eat

> Best regards,
> Mads
>
> --
> Mads Ipsen
> Gåsebæksvej 7, 4. tv, DK-2500 Valby, Denmark
> phone: +45-29716388
> email: mads.ip...@gmail.com
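As a side note for readers on NumPy 1.15+ (long after this thread), the explicit index arithmetic above is exactly what np.take_along_axis packages up; a minimal sketch:

import numpy as np

x = np.array([[4, 3], [1, 12], [23, 7], [11, 6], [8, 9]])
ind = x.argsort(axis=0)
# per-column sort via the index array, without hand-built fancy indexing
print(np.take_along_axis(x, ind, axis=0))
# equivalently: np.sort(x, axis=0)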
Re: [Numpy-discussion] nan result from np.linalg.lstsq()
Hi,

On Mon, Oct 29, 2012 at 11:01 AM, Larry Paltrow <larry.palt...@gmail.com> wrote:

> np.isnan(data) is True
> False

Check with: np.all(~np.isnan(x))

My 2 cents,
-eat

> On Mon, Oct 29, 2012 at 1:50 AM, Pauli Virtanen <p...@iki.fi> wrote:
> > Larry Paltrow <larry.paltrow at gmail.com> writes:
> > > I'm having some trouble using the linalg.lstsq() function with certain data and trying to understand why. It's always returning nans when I feed it this numpy array of float64 data:
> > >
> > > data = df.close.values  # coming from a pandas dataframe
> > > type(data)
> > > numpy.ndarray
> >
> > Maybe you have NaNs in the array? (i.e. np.isnan(data) is True)
> >
> > --
> > Pauli Virtanen
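Why the original check was misleading, as a short illustration (example values are mine): np.isnan(data) returns a boolean array, which is never the singleton True object, so the `is` comparison is always False regardless of the data:

import numpy as np

data = np.array([1.0, np.nan, 3.0])
print(np.isnan(data))            # [False  True False] -- an array, not a bool
print(np.isnan(data) is True)    # False, even though NaNs are present
print(np.isnan(data).any())      # True: the check that was actually intended
print(np.all(~np.isnan(data)))   # False: eat's equivalent "all finite?" check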
Re: [Numpy-discussion] sum and prod
Hi,

On Sun, Sep 9, 2012 at 12:56 AM, nicky van foreest <vanfore...@gmail.com> wrote:

> Hi,
>
> I ran the following code:
>
> args = np.array([4,8])
> print np.sum( (arg > 0) for arg in args)
> print np.sum([(arg > 0) for arg in args])
> print np.prod( (arg > 0) for arg in args)
> print np.prod([(arg > 0) for arg in args])
>
> with this result:
>
> 2
> 2
> <generator object <genexpr> at 0x1c70410>
> 1
>
> Is the difference between prod and sum intentional? I would expect that numpy.prod would also work on a generator, just like numpy.sum.
>
> BTW: the last line does what I need: the product over the truth values of all elements of args. Is there perhaps a nicer (more concise) way to achieve this? Thanks.

Can't see why someone would write code like the above, but anyway:

In []: args = np.array([4,8])
In []: print np.sum( (arg > 0) for arg in args)
2
In []: print np.sum([(arg > 0) for arg in args])
2
In []: print np.prod( (arg > 0) for arg in args)
<generator object <genexpr> at 0x062BDA08>
In []: print np.prod([(arg > 0) for arg in args])
1
In []: print np.prod( (arg > 0) for arg in args).next()
True
In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'

My 2 cents,
-eat

> Nicky
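For the actual question at the end (the product over the truth values), a more concise, generator-free spelling, assuming the intent is "are all elements positive":

import numpy as np

args = np.array([4, 8])
print((args > 0).all())        # True: all elements positive
print(np.all(args > 0))        # same thing, functional spelling
print((args > 0).prod())       # 1: the literal product of the truth values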
Re: [Numpy-discussion] 2 greatest values, in a 3-d array, along one axis
Hi,

On Fri, Aug 3, 2012 at 7:02 PM, Angus McMorland <amcm...@gmail.com> wrote:

> On 3 August 2012 11:18, Jim Vickroy <jim.vick...@noaa.gov> wrote:
> > Hello everyone,
> >
> > I'm trying to determine the 2 greatest values, in a 3-d array, along one axis. Here is an approach:
> >
> > # procedure to determine greatest 2 values for 3rd dimension of 3-d array ...
> > import numpy, numpy.ma
> > xcnt, ycnt, zcnt = 2,3,4  # actual case is (1024, 1024, 8)
> > p0 = numpy.empty ((xcnt,ycnt,zcnt))
> > for z in range (zcnt):
> >     p0[:,:,z] = z*z
> > zaxis = 2  # max values to be determined for 3rd axis
> > p0max = numpy.max (p0, axis=zaxis)  # max values for zaxis
> > maxindices = numpy.argmax (p0, axis=zaxis)  # indices of max values
> > p1 = p0.copy()  # work array to scan for 2nd highest values
> > j, i = numpy.meshgrid (numpy.arange (ycnt), numpy.arange (xcnt))
> > p1[i,j,maxindices] = numpy.NaN  # flag all max values
> > p1 = numpy.ma.masked_where (numpy.isnan (p1), p1)  # hide all max values
> > p1max = numpy.max (p1, axis=zaxis)  # 2nd highest values for zaxis
> > # additional code to analyze p0max and p1max goes here
> >
> > I would appreciate feedback on a simpler approach -- e.g., one that does not require masked arrays and/or use of magic values like NaN.
> > Thanks, -- jv
>
> Here's a way that only uses argsort and fancy indexing:
>
> a = np.random.randint(10, size=(3,3,3))
> print a
> [[[0 3 8]
>   [4 2 8]
>   [8 6 3]]
>  [[0 6 7]
>   [0 3 9]
>   [0 9 1]]
>  [[7 9 7]
>   [5 2 9]
>   [9 3 3]]]
>
> am = a.argsort(axis=2)
> maxs = a[np.arange(a.shape[0])[:,None], np.arange(a.shape[1])[None], am[:,:,-1]]
> print maxs
> [[8 8 8]
>  [7 9 9]
>  [9 9 9]]
>
> seconds = a[np.arange(a.shape[0])[:,None], np.arange(a.shape[1])[None], am[:,:,-2]]
> print seconds
> [[3 4 6]
>  [6 3 1]
>  [7 5 3]]
>
> And to double check:
>
> i, j = 0, 1
> l = a[i, j,:]
> print l
> [4 2 8]
> print np.max(a[i,j,:]), maxs[i,j]
> 8 8
> print l[np.argsort(l)][-2], seconds[i,j]
> 4 4
>
> Good luck.

Here the np.indices function may help a little bit, like:

In []: a= randint(10, size= (3, 2, 4))
In []: a
Out[]:
array([[[1, 9, 6, 6],
        [0, 3, 4, 2]],
       [[4, 2, 4, 4],
        [5, 9, 4, 4]],
       [[6, 1, 4, 3],
        [5, 4, 5, 5]]])
In []: ndx= indices(a.shape)
In []: # largest
In []: a[a.argsort(0), ndx[1], ndx[2]][-1]
Out[]:
array([[6, 9, 6, 6],
       [5, 9, 5, 5]])
In []: # second largest
In []: a[a.argsort(0), ndx[1], ndx[2]][-2]
Out[]:
array([[4, 2, 4, 4],
       [5, 4, 4, 4]])

My 2 cents,
-eat

> Angus.
> --
> AJC McMorland
> Post-doctoral research fellow
> Neurobiology, University of Pittsburgh
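On newer NumPy (1.8+, after this thread), np.partition gives the top two along an axis without a full sort; a minimal sketch (with kth = n-2, exactly one element lies above the partition index, so it must be the maximum):

import numpy as np

a = np.random.randint(10, size=(3, 2, 4))
top2 = np.partition(a, a.shape[0] - 2, axis=0)[-2:]
largest, second = top2[-1], top2[-2]
# sanity check against a full sort
s = np.sort(a, axis=0)
print(np.array_equal(largest, s[-1]), np.array_equal(second, s[-2]))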
Re: [Numpy-discussion] Reordering 2 dimensional array by column
Hi,

On Thu, Aug 2, 2012 at 3:43 PM, Nicole Stoffels <nicole.stoff...@forwind.de> wrote:

> Dear all,
>
> I have a two-dimensional array:
>
> a = array([[1,2,3],[0,2,1],[5,7,8]])
>
> I want to reorder it by the last column in descending order, so that I get:
>
> b = array([[5, 7, 8],[1, 2, 3],[0, 2, 1]])

Perhaps something along the lines of:

In []: a
Out[]:
array([[1, 2, 3],
       [0, 2, 1],
       [5, 7, 8]])
In []: ndx= a[:, 2].argsort()
In []: a[ndx[::-1], :]
Out[]:
array([[5, 7, 8],
       [1, 2, 3],
       [0, 2, 1]])

> What I did first is the following, which reorders the array in ascending order (I found that method on the internet):
>
> b = array(sorted(a, key=lambda new_entry: new_entry[2]))
> b = array([[0, 2, 1],[1, 2, 3],[5, 7, 8]])
>
> But I want it just the other way around. So I did the following afterwards, which results in an array only containing zeros:
>
> b_indices = b.argsort()
> b_matrix = b[b_indices[::-1]]
> new_b = b_matrix[len(b_matrix)-1]
>
> Is there an easy way to reorder it? Or is there at least a complicated way which produces the right output? I hope you can help me! Thanks!

My 2 cents,
-eat

> Best regards,
> Nicole
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 10:23 AM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> 2012/7/30 eat <e.antero.ta...@gmail.com>:
> > [... the original question and the proportions(.) answer snipped; they are quoted in full in the last message of this thread ...]
>
> Hi,
> thank you very much for your suggestions!
>
> do I understand it correctly, that I have to special-case the function for distance = 0 (which should return the matches themselves without recalculation)?

Yes.

> However, more importantly, I am getting a ValueError for some larger (but not completely unreasonable) distance:
>
> proportions(matches, distance= 8190)
> Traceback (most recent call last):
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 5:01 PM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> 2012/7/31 eat <e.antero.ta...@gmail.com>:
> > [... the earlier exchange snipped; it is quoted in full in the surrounding messages of this thread ...]
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 6:43 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Tue, Jul 31, 2012 at 2:23 PM, eat <e.antero.ta...@gmail.com> wrote:
> > Apparently ast(.) does not return a view of the original matches, rather a copy of size (n* (2* distance+ 1)), thus you may run out of memory.
>
> The problem isn't memory, it's that on 32-bit Python, np.prod(arr.shape) must be < 2**32 (or maybe 2**31 -- something like that). I think this is what the traceback is indicating. Normally you wouldn't be creating such arrays anyway because they would be too big to fit into memory, so this problem isn't observed, but when you're using stride_tricks then it's very easy to create arrays that use only a small amount of memory but that have very large shapes.

But in this specific case the .nbytes attribute indicates that a huge amount of memory is used. So I guess stride_tricks(.) is not returning a view.

> Solution: don't buy more memory, just use a 64-bit Python, where the limit is 2**64 (or 2**63 or whatever).

Regards,
-eat

> -n
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 7:30 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Tue, Jul 31, 2012 at 4:57 PM, eat <e.antero.ta...@gmail.com> wrote:
> > But in this specific case the .nbytes attribute indicates that a huge amount of memory is used. So I guess stride_tricks(.) is not returning a view.
>
> No, .nbytes is lying to you -- it just returns np.prod(arr.shape) * arr.dtype.itemsize. It isn't smart enough to realize that you have wacky strides that cause the same memory region to be referenced by many different array elements.

Aha, very good to know. Thanks,
-eat

> -n
Re: [Numpy-discussion] array slicing questions
Hi,

On Tue, Jul 31, 2012 at 7:20 PM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> 2012/7/31 eat <e.antero.ta...@gmail.com>:
> > [... the earlier exchange snipped; it is quoted in full in the surrounding messages of this thread ...]
Re: [Numpy-discussion] array slicing questions
Hi,

A partial answer to your questions:

On Mon, Jul 30, 2012 at 10:33 PM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> Hi all,
> I'd like to ask for some hints or advice regarding the usage of numpy.array and especially slicing. I only recently tried numpy and was impressed by the speedup in some parts of the code, hence I suspect that I might miss some other opportunities in this area.
>
> I currently use the following code for a simple visualisation of the search matches within the text; the arrays are generally much larger than the sample - the text size is generally hundreds of kilobytes up to a few MB - with an index position for each character.
>
> First there is a list of spans (obtained from the regex match objects); the respective character indices in between these slices should be set to 1:
>
> >>> import numpy
> >>> characters_matches = numpy.zeros(10)
> >>> matches_spans = numpy.array([[2,4], [5,9]])
> >>> for start, stop in matches_spans:
> ...     characters_matches[start:stop] = 1
> ...
> >>> characters_matches
> array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
>
> Is there maybe a way to achieve this in a numpy-only way - without the python loop? (I got the impression the powerful slicing capabilities could make it possible, but haven't found this kind of solution.)
>
> In the next piece of code all the character positions are evaluated with their neighbourhood and a kind of running proportion of the matched text parts is computed (the check_distance could generally be up to the order of half the text length, usually less):
>
> >>> check_distance = 1
> >>> floating_checks_proportions = []
> >>> for i in numpy.arange(len(characters_matches)):
> ...     lo = i - check_distance
> ...     if lo < 0:
> ...         lo = None
> ...     hi = i + check_distance + 1
> ...     checked_sublist = characters_matches[lo:hi]
> ...     proportion = (checked_sublist.sum() / (check_distance * 2 + 1.0))
> ...     floating_checks_proportions.append(proportion)
> ...
> >>> floating_checks_proportions
> [0.0, 0.3333333333333333, 0.6666666666666666, 0.6666666666666666, 0.6666666666666666, 0.6666666666666666, 1.0, 1.0, 0.6666666666666666, 0.3333333333333333]

Define a function for proportions:

from numpy import r_
from numpy.lib.stride_tricks import as_strided as ast

def proportions(matches, distance= 1):
    cd, cd2p1, s= distance, 2* distance+ 1, matches.strides[0]
    # pad
    m= r_[[0.]* cd, matches, [0.]* cd]
    # create a suitable view
    m= ast(m, shape= (m.shape[0], cd2p1), strides= (s, s))
    # average
    return m[:-2* cd].sum(1)/ cd2p1

and use it like:

In []: matches
Out[]: array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])
In []: proportions(matches).round(2)
Out[]: array([ 0.  ,  0.33,  0.67,  0.67,  0.67,  0.67,  1.  ,  1.  ,  0.67,  0.33])
In []: proportions(matches, 5).round(2)
Out[]: array([ 0.27,  0.36,  0.45,  0.55,  0.55,  0.55,  0.55,  0.55,  0.45,  0.36])

> I'd like to ask about the possible better approaches, as it doesn't look very elegant to me, and I obviously don't know the implications or possible drawbacks of numpy arrays in some scenarios.
>
> The pattern "for i in range(len(...)):" is usually considered inadequate in python, but what should be used in this case as the indices are primarily needed? Is something to be gained or lost using (x)range or np.arange, as the python loop is (probably?) inevitable anyway?

Here np.arange(.) will create a new array, potentially wasting memory if it's not otherwise used. IMO nothing wrong with looping over xrange(.) (if you really need to loop ;).

> Is there some more elegant way to check for the underflowing lower bound lo to replace with None?
>
> Is it significant which container is used to collect the results of the computation in the python loop - i.e. python list or a numpy array? (Could possibly matplotlib cooperate better with either container?)
>
> And of course, are there maybe other things which should be made better/differently? (using Numpy 1.6.2, python 2.7.3, win XP)

My 2 cents,
-eat

> Thanks in advance for any hints or suggestions,
> regards,
> Vlastimil Brom
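As a stride-tricks-free alternative (my addition, not from the thread): the running proportion is just a moving average, which np.convolve computes without the huge-shape view that bites on 32-bit Python later in this thread:

import numpy as np

def proportions_conv(matches, distance=1):
    # moving average over a window of 2*distance+1, zero-padded at the ends
    w = 2 * distance + 1
    return np.convolve(matches, np.ones(w) / w, mode='same')

matches = np.array([0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])
print(proportions_conv(matches).round(2))
# [ 0.    0.33  0.67  0.67  0.67  0.67  1.    1.    0.67  0.33]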
Re: [Numpy-discussion] np.unique for one bi-dimensional array
Hi,

On Wed, Jul 25, 2012 at 1:09 AM, Giuseppe Amatulli <giuseppe.amatu...@gmail.com> wrote:

> Hi,
> I would like to identify unique pairs of numbers in two arrays, or in one bi-dimensional array, and count the observations:
>
> a_clean=array([4,4,5,4,4,4])
> b_clean=array([3,5,4,4,3,4])
>
> and obtain
> (4,3,2)
> (4,5,1)
> (5,4,1)
> (4,4,2)
>
> I solved it with two loops, but of course there will be a faster solution. Any idea? What about using np.unique for one bi-dimensional array?

According to the docs (http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html) np.unique will flatten the input to a 1D array. However, perhaps something like the following lines will help you:

In []: lot= zip(a_clean, b_clean)
In []: lot
Out[]: [(4, 3), (4, 5), (5, 4), (4, 4), (4, 3), (4, 4)]
In []: [[x, lot.count(x)] for x in set(lot)]
Out[]: [[(4, 5), 1], [(5, 4), 1], [(4, 4), 2], [(4, 3), 2]]

My 2 cents,
-eat

> In bash I usually use the uniq command.
> thanks in advance
> Giuseppe
>
> --
> Giuseppe Amatulli
> Web: www.spatial-ecology.net
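For larger inputs, a vectorized sketch (my addition; the encoding trick assumes non-negative integers, and return_counts needs NumPy 1.9+): encode each pair into a single integer and let np.unique do the counting:

import numpy as np

a_clean = np.array([4, 4, 5, 4, 4, 4])
b_clean = np.array([3, 5, 4, 4, 3, 4])

# encode (a, b) -> a*base + b so each pair becomes one plain integer
base = b_clean.max() + 1
codes = a_clean * base + b_clean
uniq, counts = np.unique(codes, return_counts=True)

for code, n in zip(uniq, counts):
    print(code // base, code % base, n)
# (4, 3) 2; (4, 4) 2; (4, 5) 1; (5, 4) 1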
Re: [Numpy-discussion] Problems understanding histogram2d
Hi,

On Fri, Jul 20, 2012 at 6:42 PM, yogesh karpate <yogeshkarp...@gmail.com> wrote:

> I think since it's a joint histogram, you need to have an equal no. of data points and bins in both x and y.

Makes sense that the numbers of elements of the data points (x, y) must be equal. Perhaps documentation like http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.histogram2d.html could make this aspect clearer. Especially confusing is the signature "x : array_like, shape(N,)" and "y : array_like, shape(M,)", which may indicate that N != M could be a feasible case. It would be better to simply state that x and y must be one-dimensional and of equal length.

My 2 cents,
-eat

> On Fri, Jul 20, 2012 at 5:11 PM, Andreas Hilboll <li...@hilboll.de> wrote:
> > Hi,
> >
> > I have a problem using histogram2d:
> >
> > from numpy import linspace, histogram2d
> > bins_x = linspace(-180., 180., 360)
> > bins_y = linspace(-90., 90., 180)
> > data_x = linspace(-179.96875, 179.96875, 5760)
> > data_y = linspace(-89.96875, 89.96875, 2880)
> > histogram2d(data_x, data_y, (bins_x, bins_y))
> >
> > AttributeError: The dimension of bins must be equal to the dimension of the sample x.
> >
> > I would expect histogram2d to return a 2d array of shape (360,180), which is full of 256s. What am I missing here?
> >
> > Cheers, Andreas.
>
> --
> Regards
> Yogesh Karpate
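A hedged sketch of what the poster presumably wanted (my reconstruction of the intent, a full regular grid of samples): x and y must hold one coordinate per sample, so the two 1-D axes have to be expanded to per-cell pairs first:

import numpy as np

bins_x = np.linspace(-180., 180., 360)
bins_y = np.linspace(-90., 90., 180)
axis_x = np.linspace(-179.96875, 179.96875, 5760)
axis_y = np.linspace(-89.96875, 89.96875, 2880)

# one (x, y) coordinate pair per sample: both arguments get the same length
X, Y = np.meshgrid(axis_x, axis_y)
H, xe, ye = np.histogram2d(X.ravel(), Y.ravel(), bins=(bins_x, bins_y))
print(H.shape)  # (359, 179): n bin *edges* yield n-1 bins, not n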
Re: [Numpy-discussion] Good way to develop numpy as popular choice!
Hi,

On Fri, Jun 22, 2012 at 6:05 PM, Benjamin Root <ben.r...@ou.edu> wrote:

> On Fri, Jun 22, 2012 at 10:25 AM, Travis Oliphant <tra...@continuum.io> wrote:
> > Accessing individual elements of NumPy arrays is slower than accessing individual elements of lists --- around 2.5x-3x slower. NumPy has to do more work to figure out what kind of indexing you are trying to do because of its flexibility. It also has to create the Python object to return. In contrast, the list approach already has the Python objects created and you are just returning pointers to them, and there is much less flexibility in the kinds of indexing you can do.
> >
> > Simple timings show that a.item(i,j) is about 2x slower than list element access (but faster than a[i,j], which is about 2.5x to 3x slower). The slowness of a.item is due to the need to create the Python object to return (there are just raw bytes there), so it gives some idea of the relative cost of each part of the slowness of a[i,j]. Also, math on the array scalars returned from NumPy will be slower than math on integers and floats --- because NumPy re-uses the ufunc machinery, which is not optimized at all for scalars.
> >
> > The take-away is that NumPy is built for doing vectorized operations on bytes of data. It is not optimized for doing element-by-element individual access. The right approach there is to just use lists (or use a version specialized for the kind of data in the lists that removes the boxing and unboxing).
> >
> > Here are my timings using IPython for NumPy indexing:
> >
> > 1-D:
> > In [2]: a = arange(100)
> > In [3]: %timeit [a.item(i) for i in xrange(100)]
> > 10000 loops, best of 3: 25.6 us per loop
> > In [4]: %timeit [a[i] for i in xrange(100)]
> > 10000 loops, best of 3: 31.8 us per loop
> > In [5]: al = a.tolist()
> > In [6]: %timeit [al[i] for i in xrange(100)]
> > 100000 loops, best of 3: 10.6 us per loop
> >
> > 2-D:
> > In [7]: a = arange(100).reshape(10,10)
> > In [8]: al = a.tolist()
> > In [9]: %timeit [al[i][j] for i in xrange(10) for j in xrange(10)]
> > 100000 loops, best of 3: 18.6 us per loop
> > In [10]: %timeit [a[i,j] for i in xrange(10) for j in xrange(10)]
> > 10000 loops, best of 3: 44.4 us per loop
> > In [11]: %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)]
> > 10000 loops, best of 3: 34.2 us per loop
> >
> > -Travis
>
> However, what is the timing/memory cost of converting a large numpy array that already exists into a python list of lists? If all my processing before the munkres step is using NumPy, converting it into python lists has a cost. Also, your timings indicate only a ~2x slowdown, while the timing tests done by eat show an order-of-magnitude difference. I suspect there is great room for improvement before even starting to worry about the array access issues.

To create a list of lists from an array is quite fast, like:

In []: A= rand(500, 500)
In []: %timeit A.tolist()
100 loops, best of 3: 10.8 ms per loop

Regards,
-eat

> Cheers!
> Ben Root
Re: [Numpy-discussion] Good way to develop numpy as popular choice!
Heh,

On Thu, Jun 21, 2012 at 6:03 PM, Robert Kern <robert.k...@gmail.com> wrote:

> On Thu, Jun 21, 2012 at 3:59 PM, bob tnur <bobtnu...@gmail.com> wrote:
> > Hi all numpy fun;) This question is already posted in stackoverflow by some people, I am just thinking that numpy python will do this with a trick;) I guess numpy will be everyone's choice as its popularity increases. The question is herein:
> > http://stackoverflow.com/questions/10074270/how-can-i-find-the-minimum-number-of-lines-needed-to-cover-all-the-zeros-in-a-2
>
> My numpy solution for this is just
>
> $ pip install munkres
>
> http://pypi.python.org/pypi/munkres
>
> --
> Robert Kern

munkres seems to be a pure python implementation ;-).

FWIW, there exist pure python implementations that outperform the munkres implementation by more than 200x already with a 100x100 random cost matrix, based on the shortest-path variant of the Hungarian algorithm (more details of the algorithms can be found, for example, at http://www.assignmentproblems.com/). Given how the assignment algorithms are (typically) described, it may actually be quite a tedious job to create more performant ones utilizing numpy arrays instead of lists of lists.

My 2 cents,
-eat
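For modern readers, SciPy has since gained a built-in solver for the underlying assignment problem (scipy.optimize.linear_sum_assignment, SciPy 0.17+), so neither munkres nor a hand-rolled shortest-path variant is needed; a minimal sketch:

import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.random.rand(100, 100)
row_ind, col_ind = linear_sum_assignment(cost)  # optimal assignment
print(cost[row_ind, col_ind].sum())             # minimal total cost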
Re: [Numpy-discussion] convert any non square matrix in to square matrix using numpy
Hi,

On Mon, Jun 18, 2012 at 6:55 PM, bob tnur <bobtnu...@gmail.com> wrote:

> Hi, how can I convert (by adding zeros) any non-square numpy matrix into a square matrix using numpy? Then how to find the minimum number in each row, except the zeros added (for making it square)? ;)

Perhaps something like this:

In []: def make_square(A):
   ..:     s= A.shape
   ..:     if s[0] < s[1]:
   ..:         return r_[A, zeros((s[1]- s[0], s[1]), dtype= A.dtype)]
   ..:     return c_[A, zeros((s[0], s[0]- s[1]), dtype= A.dtype)]
   ..:

In []: A= rand(4, 2)
In []: make_square(A)
Out[]:
array([[ 0.76109774,  0.42980812,  0.        ,  0.        ],
       [ 0.11810978,  0.59622975,  0.        ,  0.        ],
       [ 0.54991376,  0.29315485,  0.        ,  0.        ],
       [ 0.78182313,  0.3828001 ,  0.        ,  0.        ]])
In []: make_square(A.T)
Out[]:
array([[ 0.76109774,  0.11810978,  0.54991376,  0.78182313],
       [ 0.42980812,  0.59622975,  0.29315485,  0.3828001 ],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ]])

will help you.

My 2 cents,
-eat
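For the second part of the question (row minima ignoring the added zeros), a hedged sketch under the assumption, valid for rand(), that all genuine entries are strictly positive:

import numpy as np

A = np.random.rand(4, 2)
S = np.c_[A, np.zeros((4, 2))]   # the squared-up matrix from above

# option 1: the original width is known, so just slice the padding off
print(S[:, :2].min(axis=1))

# option 2: mask the padding when the original shape isn't at hand
# (assumes genuine entries are never exactly zero)
masked = np.where(S == 0, np.inf, S)
print(masked.min(axis=1))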
Re: [Numpy-discussion] fast access and normalizing of ndarray slices
Hi,

On Mon, Jun 4, 2012 at 12:44 AM, srean <srean.l...@gmail.com> wrote:

> Hi Wolfgang,
>
> I think you are looking for reduceat( ), in particular add.reduceat()

Indeed OP could utilize add.reduceat(...), like:

# tst.py
import numpy as np

def reduce(data, lengths):
    ind, ends= np.r_[lengths, lengths], lengths.cumsum()
    ind[::2], ind[1::2]= ends- lengths, ends
    return np.add.reduceat(np.r_[data, 0], ind)[::2]

def normalize(data, lengths):
    return data/ np.repeat(reduce(data, lengths), lengths)

def gen(par):
    lengths= np.random.randint(*par)
    return np.random.randn(lengths.sum()), lengths

if __name__ == '__main__':
    data= np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype= float)
    lengths= np.array([2, 4, 3])
    print reduce(data, lengths)
    print normalize(data, lengths).round(2)

Resulting:

In []: %run tst
[  3.  10.   6.]
[ 0.33  0.67  0.1   0.2   0.3   0.4   0.17  0.33  0.5 ]

Fast enough:

In []: data, lengths= gen([5, 15, 5e4])
In []: data.size
Out[]: 476028
In []: %timeit normalize(data, lengths)
10 loops, best of 3: 29.4 ms per loop

My 2 cents,
-eat

> -- srean
>
> On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf <wkerzend...@gmail.com> wrote:
> > Dear all,
> >
> > I have an ndarray which consists of many arrays stacked behind each other (only conceptually, in truth it's a normal 1d float64 array). I have a second array which tells me the start of the individual data sets in the 1d float64 array and another one which tells me the length. Example:
> >
> > data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality [1,2,1,2,3,4,1,2,3, dtype=float64]
> > start_pointer = [0, 2, 6]
> > length_data = [2, 4, 3]
> >
> > I now want to normalize each of the individual data sets. I wrote a simple for loop over the start_pointer and length data, grabbed the data, normalized it and wrote it back to the big array. That's slow. Is there an elegant numpy way to do that? Do I have to go the cython way?
Re: [Numpy-discussion] why not zerodivision error?
Hi,

On Sun, May 20, 2012 at 10:21 AM, Chao YUE <chaoyue...@gmail.com> wrote:

> Dear all,
>
> could anybody give one sentence about this? Why in the loop do I not get a zerodivision error, but when I explicitly do this, I do get a zerodivision error? thanks.
>
> In [7]: for i in np.arange(-10,10):
>    ...:     print 1./i
>    ...:
> -0.1
> -0.111111111111
> -0.125
> -0.142857142857
> -0.166666666667
> -0.2
> -0.25
> -0.333333333333
> -0.5
> -1.0
> inf
> 1.0
> 0.5
> 0.333333333333
> 0.25
> 0.2
> 0.166666666667
> 0.142857142857
> 0.125
> 0.111111111111
>
> In [8]: 1/0.
> ---------------------------------------------------------------------------
> ZeroDivisionError                         Traceback (most recent call last)
> /mnt/f/data/DROUGTH/<ipython-input-8-7e0bf5b37da6> in <module>()
> ----> 1 1/0.
> ZeroDivisionError: float division by zero
>
> (In [9]: 1./0. and In [10]: 1./0 raise the same ZeroDivisionError.)

You may like to read more here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html#numpy.seterr

So, for your specific example:

In []: a= arange(-10, 10)
In []: 1./ a
Out[]:
array([-0.1       , -0.11111111, -0.125     , -0.14285714, -0.16666667,
       -0.2       , -0.25      , -0.33333333, -0.5       , -1.        ,
               inf,  1.        ,  0.5       ,  0.33333333,  0.25      ,
        0.2       ,  0.16666667,  0.14285714,  0.125     ,  0.11111111])
In []: seterr(divide= 'raise')
In []: 1./ a
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
FloatingPointError: divide by zero encountered in divide

My 2 cents,
-eat

> Chao
> --
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119, 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
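In one sentence: inside the loop the operands are numpy scalars, so division by zero follows numpy's error state (warn by default, yielding inf), whereas 1/0. is plain Python float division, which always raises. With np.errstate the behaviour can also be scoped locally instead of via the global seterr; a minimal sketch:

import numpy as np

a = np.arange(-10, 10)
print(1. / a)  # numpy: warns and produces inf at the zero

with np.errstate(divide='raise'):
    try:
        1. / a
    except FloatingPointError as e:
        print('raised:', e)
# outside the with-block, the default warn-and-inf behaviour is back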
Re: [Numpy-discussion] (no subject)
Hi,

On Fri, Apr 20, 2012 at 9:15 PM, Andre Martel <soucoupevola...@yahoo.com> wrote:

> What would be the best way to remove the maximum from a cube and collapse the remaining elements along the z-axis? For example, I want to reduce Cube to NewCube:
>
> >>> Cube
> array([[[ 13,   2,   3,  42],
>         [  5, 100,   7,   8],
>         [  9,   1,  11,  12]],
>
>        [[ 25,   4,  15,   1],
>         [ 17,  30,   9,  20],
>         [ 21,   2,  23,  24]],
>
>        [[  1,   2,  27,  28],
>         [ 29,  18,  31,  32],
>         [ -1,   3,  35,   4]]])
>
> NewCube
> array([[[ 13,   2,   3,   1],
>         [  5,  30,   7,   8],
>         [  9,   1,  11,  12]],
>
>        [[  1,   2,  15,  28],
>         [ 17,  18,   9,  20],
>         [ -1,   2,  23,   4]]])
>
> I tried with argmax() and then roll() and delete() but these all work on 1-D arrays only. Thanks.

Perhaps it would be more straightforward to process via 2D-arrays, like (with C bound to the Cube above):

In []: C_in= C.reshape(3, -1).copy()
In []: ndx= C_in.argmax(0)
In []: C_out= C_in[:2, :]
In []: C_out[:, ndx== 0]= C_in[1:, ndx== 0]
In []: C_out[1, ndx== 1]= C_in[2, ndx== 1]
In []: C_out.reshape(2, 3, 4)
Out[]:
array([[[13,  2,  3,  1],
        [ 5, 30,  7,  8],
        [ 9,  1, 11, 12]],

       [[ 1,  2, 15, 28],
        [17, 18,  9, 20],
        [-1,  2, 23,  4]]])

My 2 cents,
-eat
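A more general, order-preserving sketch (my addition, using np.moveaxis from newer NumPy) that drops the per-column maximum along any axis:

import numpy as np

def drop_max(C, axis=0):
    # move the reduction axis last so boolean selection stays contiguous
    Cm = np.moveaxis(C, axis, -1)
    keep = np.ones(Cm.shape, dtype=bool)
    # flag exactly one maximum per "column" along the (now last) axis
    idx = Cm.argmax(axis=-1)
    keep[tuple(np.indices(idx.shape)) + (idx,)] = False
    out = Cm[keep].reshape(Cm.shape[:-1] + (Cm.shape[-1] - 1,))
    return np.moveaxis(out, -1, axis)

C = np.array([[[13, 2, 3, 42], [5, 100, 7, 8], [9, 1, 11, 12]],
              [[25, 4, 15, 1], [17, 30, 9, 20], [21, 2, 23, 24]],
              [[1, 2, 27, 28], [29, 18, 31, 32], [-1, 3, 35, 4]]])
print(drop_max(C))  # reproduces NewCube above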
Re: [Numpy-discussion] problem with vectorized difference equation
Hi, On Sat, Apr 7, 2012 at 1:27 AM, Sameer Grover sameer.grove...@gmail.comwrote: On Saturday 07 April 2012 02:51 AM, Francesco Barale wrote: Hello Sameer, Thank you very much for your reply. My goal was to try to speed up the loop describing the accumulator. In the (excellent) book I was mentioning in my initial post I could find one example that seemed to match what I was trying to do. Basically, it is said that a loop of the following kind: n = size(u)-1 for i in xrange(1,n,1): u_new[i] = u[i-1] + u[i] + u[i+1] should be equivalent to: u[1:n] = u[0:n-1] + u[1:n] + u[i+1] This example is correct. What I was trying to point out was that the single expression y[1:n] = y[0:n-1] + u[1:n] will iterate over the array but will not accumulate. It will add y[0:n-1] to u[1:n] and assign the result to y[1:n]. For example, y = [1,2,3,4] u = [5,6,7,8] Then y[0:n-1] = [1,2,3] and u[1:n]=[6,7,8] The statement y[1:n] = y[0:n-1] + u[1:n] implies y[1:n] = [6+1,7+2,8+3] = [7,9,11] yielding y = [1,7,9,11] FWIIFO, if assignment in loop like this ever makes any sense (which I doubt), whereas the code: for i in range(0,n-1): y[i+1] = y[i] + u[i+1] then it can be captured in a function, like In []: def s0(y, u): ..: yn= y.copy() ..: for k in xrange(y.size- 1): ..: yn[k+ 1]= yn[k]+ u[k+ 1] ..: return yn ..: and now this function can be easily transformed to utilize cumsum, like In []: def s1(y, u): ..: un= u.copy() ..: un[0]= y[0] ..: return cumsum(un) ..: thus In []: y, u= rand(1e5), rand(1e5) In []: allclose(s0(y, u), s1(y, u)) Out[]: True and definitely this transformation will outperform a plain python loop In []: timeit s0(y, u) 10 loops, best of 3: 122 ms per loop In []: timeit s1(y, u) 100 loops, best of 3: 2.16 ms per loop In []: 122/ 2.16 Out[]: 56.48148148148148 My 2 cents, -eat will accumulate and give y = [1,7,14,22] Sameer Am I missing something? Regards, Francesco Sameer Grover wrote: On Saturday 07 April 2012 12:14 AM, francesco82 wrote: Hello everyone, After reading the very good post http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html and the book by H. P. Langtangen 'Python scripting for computational science' I was trying to speed up the execution of a loop on numpy arrays being used to describe a simple difference equation. The actual code I am working on involves some more complicated equations, but I am having the same exact behavior as described below. To test the improvement in speed I wrote the following in vect_try.py: #!/usr/bin/python import numpy as np import matplotlib.pyplot as plt dt = 0.02 #time step time = np.arange(0,2,dt) #time array u = np.sin(2*np.pi*time) #input signal array def vect_int(u,y): #vectorized function n = u.size y[1:n] = y[0:n-1] + u[1:n] return y def sc_int(u,y): #scalar function y = y + u return y def calc_vect(u, func=vect_int): out = np.zeros(u.size) for i in xrange(u.size): out = func(u,out) return out def calc_sc(u, func=sc_int): out = np.zeros(u.size) for i in xrange(u.size-1): out[i+1] = sc_int(u[i+1],out[i]) return out To verify the execution time I've used the timeit function in Ipython: import vect_try as vt timeit vt.calc_vect(vt.u) -- 1000 loops, best of 3: 494 us per loop timeit vt.calc_sc(vt.u) --1 loops, best of 3: 92.8 us per loop As you can see the scalar implementation looping one item at the time (calc_sc) is 494/92.8~5.3 times faster than the vectorized one (calc_vect). 
My problem consists in the fact that I need to iterate the execution of calc_vect in order for it to operate on all the elements of the input array. If I execute calc_vect only once, it will only operate on the first slice of the vectors, leaving the rest untouched. My understanding was that the vector expression y[1:n] = y[0:n-1] + u[1:n] was supposed to iterate over the whole array, but that's not happening for me. Can anyone tell me what I am doing wrong? Thanks! Francesco 1. By vectorizing, we mean replacing a loop with a single expression. In your program, both the scalar and vector implementations (calc_vect and calc_sc) have a loop each. This isn't going to make anything faster. The vectorized implementation is just a convoluted way of achieving the same result and is slower. 2. The expression y[1:n] = y[0:n-1] + u[1:n] is /not/ equivalent to the following loop for i in range(0,n-1): y[i+1] = y[i] + u[i+1] It is equivalent to something like z = np.zeros(n-1) for i in range(0,n-1): z[i] = y[i] + u[i+1] y[1:n] = z i.e., the RHS is computed in totality and then assigned to the LHS. This is how array operations work even when the same array appears on both sides of the assignment.
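To make the accumulator point concrete, a self-contained sketch of the loop and its cumsum equivalent (names s0/s1 as in the reply above):

import numpy as np

def s0(y, u):
    # plain loop accumulator: y[i+1] = y[i] + u[i+1]
    yn = y.copy()
    for k in range(y.size - 1):
        yn[k + 1] = yn[k] + u[k + 1]
    return yn

def s1(y, u):
    # vectorized equivalent: seed the first element, then cumulative sum
    un = u.copy()
    un[0] = y[0]
    return np.cumsum(un)

y, u = np.random.rand(100000), np.random.rand(100000)
assert np.allclose(s0(y, u), s1(y, u))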
Re: [Numpy-discussion] Numpy Memory Error with corrcoef
Hi, On Tue, Mar 27, 2012 at 12:12 PM, Nicole Stoffels nicole.stoff...@forwind.de wrote: ** Dear all, I get the following memory error while running my program: *Traceback (most recent call last): File /home/nistl/Software/Netzbetreiber/FLOW/src/MemoryError_Debug.py, line 9, in module correlation = corrcoef(data_records) File /usr/lib/python2.7/dist-packages/numpy/lib/function_base.py, line 1992, in corrcoef c = cov(x, y, rowvar, bias) File /usr/lib/python2.7/dist-packages/numpy/lib/function_base.py, line 1973, in cov return (dot(X, X.T.conj()) / fact).squeeze() MemoryError* Here an easy example how to reproduce the error: *#!/usr/bin/env python2.7 from pybinsel import open from numpy import * if __name__ == '__main__': data_records = random.random((459375, 24)) correlation = corrcoef(data_records) *My real data has the same dimension. Is this a size problem of the array or did I simply make a mistake in the application of corrcoef? I hope that you can help me! Thanks! As other ones has explained this approach yields an enormous matrix. However, if I have understood your problem correctly you could implement a helper class to iterate over all of your observations. Something like along the lines (although it will take hours? with your data size) to iterate over all correlations: A helper class for correlations between observations. import numpy as np class Correlations(object): def __init__(self, data): self.m= data.shape[0] # compatible with corrcoef self.scale= self.m- 1 self.data= data- np.mean(data, 1)[:, None] # but you may actually need to scale and translate data # more application speficic manner self.var= (self.data** 2.).sum(1)/ self.scale def obs_kth(self, k): c= np.dot(self.data, self.data[k])/ self.scale return c/ (self.var[k]* self.var)** .5 def obs_iterate(self): for k in xrange(self.m): yield self.obs_kth(k) if __name__ == '__main__': data= np.random.randn(5, 3) print np.corrcoef(data).round(3) print c= Correlations(data) print np.array([p for p in c.obs_iterate()]).round(3) My 2 cents, -eat Best regards, Nicole Stoffels -- Dipl.-Met. Nicole Stoffels Wind Power Forecasting and Simulation ForWind - Center for Wind Energy Research Institute of Physics Carl von Ossietzky University Oldenburg Ammerländer Heerstr. 136 D-26129 Oldenburg Tel: +49(0)441 798 - 5079 Fax: +49(0)441 798 - 5099 Web : www.ForWind.de Email: nicole.stoff...@forwind.de ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
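In the same spirit, a sketch that bounds memory by producing the correlation rows block by block (corr_rows_chunked is a hypothetical helper and the block size a tunable assumption):

import numpy as np

def corr_rows_chunked(data, chunk=1000):
    # rows of corrcoef(data), yielded in blocks instead of
    # materializing the full m-by-m correlation matrix at once
    d = data - data.mean(axis=1)[:, None]
    norm = np.sqrt((d ** 2).sum(axis=1))
    for start in range(0, d.shape[0], chunk):
        block = np.dot(d[start:start + chunk], d.T)
        yield block / np.outer(norm[start:start + chunk], norm)

Each yielded block is the corresponding chunk of rows of np.corrcoef(data).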
Re: [Numpy-discussion] Want to eliminate direct for-loop
Hi, On Sat, Feb 11, 2012 at 10:56 PM, Dinesh B Vadhia dineshbvad...@hotmail.com wrote: ** Could the following be written without the direct for-loop? import numpy # numpy vector r of any data type and length, eg. r = numpy.ones(25, dtype='int') # s is a list of values (of any data type), eg. s = [47, 27, 67] # c is a list of (variable length) lists where the sub-list elements are index values of r and len(s) = len(c), eg. c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21 ]] # for each element in each sub-list c, add corresponding s value to the index value in r, eg. for i, j in enumerate(c): r[j] += s[i] So, we get: r[[3, 6, 9]] += s[0] = 1 + 47 = 48 r[[6, 11, 19, 24]] += s[1] = 1 + 27 = 28 r[[4, 9, 11, 21]] += s[2] = 1 + 67 = 68 ie. r = array([ 1, 1, 1, 95, 68, 1, 122, 1, 1, 162, 1, 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) Thank-you! Could you describe more detailed manner about why you want to get rid of that loop? Performance wise? If so, do you have profiled what's the bottleneck? Please provide also a more detailed description of your problem, since now your current spec seems to yield: r= array([ 1, 1, 1, 48, 68, 1, 75, 1, 1, 115, 1, 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) My 2 cents, -eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
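For the record, later NumPy releases grew an unbuffered in-place operation, np.add.at, which does accumulate over repeated indices; a sketch with the data above:

import numpy as np

r = np.ones(25, dtype=int)
s = [47, 27, 67]
c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21]]

idx = np.concatenate(c)
vals = np.repeat(s, [len(ci) for ci in c])
np.add.at(r, idx, vals)   # repeated indices (6, 9, 11) all accumulate

This yields r[6] == 1 + 47 + 27 and r[9] == 1 + 47 + 67, i.e. the accumulating behaviour that plain fancy-indexed += does not provide.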
Re: [Numpy-discussion] numpy.arange() error?
Hi, On Thu, Feb 9, 2012 at 9:47 PM, Eric Firing efir...@hawaii.edu wrote: On 02/09/2012 09:20 AM, Drew Frank wrote: Eric Firingefiringat hawaii.edu writes: On 02/08/2012 09:31 PM, teomat wrote: Hi, Am I wrong or the numpy.arange() function is not correct 100%? Try to do this: In [7]: len(np.arange(3.1, 4.9, 0.1)) Out[7]: 18 In [8]: len(np.arange(8.1, 9.9, 0.1)) Out[8]: 19 I would expect the same result for each command. Not after more experience with the wonders of floating point! Nice-looking decimal numbers often have long, drawn-out, inexact floating point (base 2) representations. That leads to exactly this sort of problem. numpy.linspace is provided to help get around some of these surprises; or you can use an integer sequence and then scale and shift it. Eric All the best I also found this surprising -- not because I lack experience with floating point, but because I do have experience with MATLAB. In MATLAB, the corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit tolerance parameter used in the implmentation ( http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96 ). Of course, NumPy is not MATLAB :). That said, I prefer the MATLAB behavior in this case -- even though it has a bit of a magic feel to it, I find it hard to imagine code that operates correctly given the Python semantics and incorrectly under MATLAB's. Thoughts? You raise a good point. Neither arange nor linspace provides a close equivalent to the nice behavior of the Matlab colon, even though that is often what one really wants. Adding this, either via an arange kwarg, a linspace kwarg, or a new function, seems like a good idea. Maybe this issue is raised also earlier, but wouldn't it be more consistent to let arange operate only with integers (like Python's range) and let linspace handle the floats as well? My 2 cents, eat Eric ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
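A sketch of the integer-based recipe suggested above: derive the count from integers, then scale and shift (or hand linspace an explicit count):

import numpy as np

# compute the number of steps first, so no float-step surprises remain
n = round((4.9 - 3.1) / 0.1)        # 18 steps
x = 3.1 + 0.1 * np.arange(n + 1)    # 19 points, endpoint included
# equivalently: np.linspace(3.1, 4.9, n + 1)
len(x)                              # 19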
Re: [Numpy-discussion] avoiding loops when downsampling arrays
Hi This is elegant and very fast as well! On Tue, Feb 7, 2012 at 2:57 PM, Sturla Molden stu...@molden.no wrote: On 06.02.2012 22:27, Sturla Molden wrote: # Make a 4D view of this data, such that b[i,j] # is a 2D block with shape (4,4) (e.g. b[0,0] is # the same as a[:4, :4]). b = as_strided(a, shape=(a.shape[0]/4, a.shape[1]/4, 4, 4), strides=(4*a.strides[0], 4*a.strides[1], a.strides[0], a.strides[1])) Yes :-) Being used to Fortran (and also MATLAB), this is the kind of mapping it never occurs to me to think about. What else but NumPy is flexible enough to do this? :-) Actually, using as_strided is not needed. We can just reshape like this: (m,n) -> (m//4, 4, n//4, 4) and then use np.any along the two length-4 dimensions. m,n = data.shape cond = lamda x : (x >= t1) & (x <= t2) I guess you meant here cond = lambda x: (x >= t1) & (x <= t2) x = cond(data).reshape((m//4, 4, n//4, 4)) found = np.any(np.any(x, axis=1), axis=2) Regards, eat Sturla
Re: [Numpy-discussion] avoiding loops when downsampling arrays
Hi, On Mon, Feb 6, 2012 at 9:16 PM, Moroney, Catherine M (388D) catherine.m.moro...@jpl.nasa.gov wrote: Hello, I have to write a code to downsample an array in a specific way, and I am hoping that somebody can tell me how to do this without the nested do-loops. Here is the problem statement: Segment a (MxN) array into 4x4 squares and set a flag if any of the pixels in that 4x4 square meet a certain condition. Here is the code that I want to rewrite avoiding loops:

shape_out = (data_in.shape[0]/4, data_in.shape[1]/4)
found = numpy.zeros(shape_out).astype(numpy.bool)
for i in xrange(0, shape_out[0]):
    for j in xrange(0, shape_out[1]):
        excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4]
        mask = numpy.where((excerpt >= t1) & (excerpt <= t2), True, False)
        if (numpy.any(mask)):
            found[i,j] = True

Thank you for any hints and education! Catherine

Following Warren's answer closely, a slight demonstration with code like this:

import numpy as np

def ds_0(data_in, t1= 1, t2= 4):
    shape_out= (data_in.shape[0]/ 4, data_in.shape[1]/ 4)
    found= np.zeros(shape_out).astype(np.bool)
    for i in xrange(0, shape_out[0]):
        for j in xrange(0, shape_out[1]):
            excerpt= data_in[i* 4: (i+ 1)* 4, j* 4: (j+ 1)* 4]
            mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False)
            if (np.any(mask)):
                found[i, j]= True
    return found

# with stride_tricks you may cook up something like this:
from numpy.lib.stride_tricks import as_strided as ast

def _ss(dt, ds, s):
    return {'shape': (ds[0]/ s[0], ds[1]/ s[1])+ s,
            'strides': (s[0]* dt[0], s[1]* dt[1])+ dt}

def _view(D, shape= (4, 4)):
    return ast(D, **_ss(D.strides, D.shape, shape))

def ds_1(data_in, t1= 1, t2= 4):
    excerpt= _view(data_in)
    mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False)
    return mask.sum(2).sum(2).astype(np.bool)

if __name__ == '__main__':
    from numpy.random import randint
    r= randint(777, size= (64, 288)); print r
    print np.allclose(ds_0(r), ds_1(r))

and when run, it will yield like:

In []: run dsa
[[ 60 470 521 ..., 147 435 295]
 [246 127 662 ..., 718 525 256]
 [354 384 205 ..., 225 364 239]
 ...,
 [277 428 201 ..., 460 282 433]
 [ 27 407 130 ..., 245 346 309]
 [649 157 153 ..., 316 613 570]]
True

and compared performance-wise:

In []: %timeit ds_0(r)
10 loops, best of 3: 56.3 ms per loop
In []: %timeit ds_1(r)
100 loops, best of 3: 2.17 ms per loop

My 2 cents, eat
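With newer NumPy, where reductions accept axis tuples, the reshape route needs no stride tricks at all; a sketch assuming both dimensions are divisible by 4:

import numpy as np

def ds_2(data_in, t1=1, t2=4):
    m, n = data_in.shape
    blocks = ((data_in >= t1) & (data_in <= t2)).reshape(m // 4, 4, n // 4, 4)
    return blocks.any(axis=(1, 3))   # any hit within each 4x4 block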
Re: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug?
On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Jan 28, 2012 at 11:15 AM, eat e.antero.ta...@gmail.com wrote: Hi, Short demonstration of the issue: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: from numpy.polynomial import Polynomial as Poly In []: def p_tst(c): ..: p= Poly(c) ..: r= p.roots() ..: return sort(abs(p(r))) ..: Now I would expect a result more like: In []: p_tst(randn(123))[-3:] Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) be the case, but actually most result seems to be more like: In []: p_tst(randn(123))[-3:] Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) In []: p_tst(randn(123))[-3:] Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) In []: p_tst(randn(123))[-3:] Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) In []: p_tst(randn(123))[-3:] Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) In []: p_tst(randn(123))[-3:] Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) So, does this phenomena imply that - I'm testing with too high order polynomials (if so, does there exists a definite upper limit of polynomial order I'll not face this issue) or - it's just the 'nature' of computations with float values (if so, probably I should be able to tackle this regardless of the polynomial order) or - it's a nasty bug in class Polynomial It's a defect. You will get all the roots and the number will equal the degree. I haven't decided what the best way to deal with this is, but my thoughts have trended towards specifying an interval with the default being the domain. If you have other thoughts I'd be glad for the feedback. For the problem at hand, note first that you are specifying the coefficients, not the roots as was the case with poly1d. Second, as a rule of thumb, plain old polynomials will generally only be good for degree 22 due to being numerically ill conditioned. If you are really looking to use high degrees, Chebyshev or Legendre will work better, although you will probably need to explicitly specify the domain. If you want to specify the polynomial using roots, do Poly.fromroots(...). Third, for the high degrees you are probably screwed anyway for degree 123, since the accuracy of the root finding will be limited, especially for roots that can cluster, and any root that falls even a little bit outside the interval [-1,1] (the default domain) is going to evaluate to a big number simply because the polynomial is going to h*ll at a rate you wouldn't believe ;) For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, things look good for degree 50, get a bit loose at degree 75 but can be fixed up with one iteration of Newton, and blow up at degree 100. I think that's pretty good, actually, doing better would require a lot more work. There are some zero finding algorithms out there that might do better if someone wants to give it a shot. 
In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) In [21]: sort(abs(p(p.roots()))) Out[21]: array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, 2.85411753e-21, 2.85411753e-21]) Thanks for the very informative feedback. I'll study those orthogonal polynomials in more detail. Regards, - eat Chuck
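For reference, the degree-50 experiment above as a self-contained sketch:

import numpy as np
from numpy.polynomial import Chebyshev as Cheb

p = Cheb.fromroots(np.linspace(-1, 1, 50))   # evenly spaced roots in [-1, 1]
r = p.roots()
print(np.sort(np.abs(p(r)))[-3:])            # residuals stay at rounding level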
Re: [Numpy-discussion] bug in numpy.mean() ?
Hi, Oddly, numpy 1.6 seems to behave in a more consistent manner: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: d= np.load('data.npy') In []: d.dtype Out[]: dtype('float32') In []: d.mean() Out[]: 3045.74718 In []: d.mean(dtype= np.float32) Out[]: 3045.74718 In []: d.mean(dtype= np.float64) Out[]: 3045.747251076416 In []: (d- d.min()).mean()+ d.min() Out[]: 3045.7472508750002 In []: d.mean(axis= 0).mean() Out[]: 3045.74724 In []: d.mean(axis= 1).mean() Out[]: 3045.74724 Or do the results of the calculations depend more on the platform? My 2 cents, eat
Re: [Numpy-discussion] bug in numpy.mean() ?
Hi On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote: ** I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version: In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) In [23]: a.mean() Out[23]: 4034.16357421875 In [24]: np.version.full_version Out[24]: '2.0.0.dev-55472ca' But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: a = np.ones((1024,1024),dtype=np.float32) a.mean() 4000.0 np.version.full_version '1.6.1' This indeed looks very nasty, regardless of whether it is a version or platform related problem. -eat On Tue, 2012-01-24 at 17:12 -0600, eat wrote: Hi, Oddly, but numpy 1.6 seems to behave more consistent manner: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: d= np.load('data.npy') In []: d.dtype Out[]: dtype('float32') In []: d.mean() Out[]: 3045.74718 In []: d.mean(dtype= np.float32) Out[]: 3045.74718 In []: d.mean(dtype= np.float64) Out[]: 3045.747251076416 In []: (d- d.min()).mean()+ d.min() Out[]: 3045.7472508750002 In []: d.mean(axis= 0).mean() Out[]: 3045.74724 In []: d.mean(axis= 1).mean() Out[]: 3045.74724 Or does the results of calculations depend more on the platform? My 2 cents, eat -- -- Kathleen M. Tacina NASA Glenn Research Center MS 5-10 21000 Brookpark Road Cleveland, OH 44135 Telephone: (216) 433-6660 Fax: (216) 433-5802 -- ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
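For reference, the fix implicit in the sessions above is to request a wider accumulator; a minimal sketch (newer NumPy also sums pairwise, which hides the drift in the default case):

import numpy as np

a = np.full((1024, 1024), 4000, dtype=np.float32)
a.mean(dtype=np.float64)   # accumulate in double precision: exactly 4000.0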
[Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind
Hi, Especially when the keyword return_index of np.unique(.) is specified to be True, would it in general also be reasonable to be able to specify the sorting algorithm of the underlying np.argsort(.)? The rationale is that (for at least some of my cases) higher level algorithms seem to become overly complicated unless I'm able to request a stable sorting order from np.unique(.) (like np.unique(., return_index= True, kind= 'mergesort')). (FWIW, I apparently do have a working local hack for this kind of functionality, but without extensive testing of 'all' corner cases). Thanks, eat
Re: [Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind
Hi, On Tue, Dec 20, 2011 at 2:33 AM, josef.p...@gmail.com wrote: On Mon, Dec 19, 2011 at 6:27 PM, eat e.antero.ta...@gmail.com wrote: Hi, Especially when the keyword return_index of np.unique(.) is specified to be True, would it in general also be reasonable to be able to specify the sorting algorithm of the underlying np.argsort(.)? The rationale is that (for at least some of my cases) higher level algorithms seems to be too over complicated unless I'm not able to request a stable sorting order from np.unique(.) (like np.unique(., return_index= True, kind= 'mergesort'). (FWIW, I apparently do have a working local hack for this kind of functionality, but without extensive testing of 'all' corner cases). Just to understand this: Is the return_index currently always the first occurrence or random? No, for current implementation it's not always the first occurrence returned. AFAIK, the only stable algorithm to provide this is ' mergesort' and that's why I'll like to have a keyword 'kind' to propagate down to then internals. I haven't found a use for return_index yet (but use return_inverse a lot), but if we are guaranteed to get the first instance, then this could be very useful. I think that 'return_inverse' will suffer of the same nondeterministic behavior as well. Thanks, eat Josef Thanks, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind
Hi, On Tue, Dec 20, 2011 at 3:41 AM, josef.p...@gmail.com wrote: On Mon, Dec 19, 2011 at 8:16 PM, eat e.antero.ta...@gmail.com wrote: Hi, On Tue, Dec 20, 2011 at 2:33 AM, josef.p...@gmail.com wrote: On Mon, Dec 19, 2011 at 6:27 PM, eat e.antero.ta...@gmail.com wrote: Hi, Especially when the keyword return_index of np.unique(.) is specified to be True, would it in general also be reasonable to be able to specify the sorting algorithm of the underlying np.argsort(.)? The rationale is that (for at least some of my cases) higher level algorithms seems to be too over complicated unless I'm not able to request a stable sorting order from np.unique(.) (like np.unique(., return_index= True, kind= 'mergesort'). (FWIW, I apparently do have a working local hack for this kind of functionality, but without extensive testing of 'all' corner cases). Just to understand this: Is the return_index currently always the first occurrence or random? No, for current implementation it's not always the first occurrence returned. AFAIK, the only stable algorithm to provide this is ' mergesort' and that's why I'll like to have a keyword 'kind' to propagate down to then internals. Thanks, then I'm all in favor of mergesort. I haven't found a use for return_index yet (but use return_inverse a lot), but if we are guaranteed to get the first instance, then this could be very useful. I think that 'return_inverse' will suffer of the same nondeterministic behavior as well. I don't think so, because there is a unique mapping from observations to unique items. But (source code of 1.6.1) indicates that keywords 'return_inverse' are 'return_index' are related, indeed. Just my 2 cents eat Josef Thanks, eat Josef Thanks, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
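Meanwhile, the first-occurrence behaviour the keyword would give can be sketched on top of the pieces that already exist (unique_first_index is a hypothetical helper; the stable mergesort guarantees the earliest index survives):

import numpy as np

def unique_first_index(a):
    order = np.argsort(a, kind='mergesort')    # stable sort
    sa = a[order]
    flag = np.concatenate(([True], sa[1:] != sa[:-1]))
    return sa[flag], order[flag]               # values, first-occurrence indices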
Re: [Numpy-discussion] build numpy matrix out of smaller matrix
Hi, On Thu, Dec 1, 2011 at 6:52 PM, jonasr jonas.rueb...@web.de wrote: Hi, is there any possibility to define a numpy matrix, via a smaller given matrix, i.e. in matlab i can do this like a=[1 2 ; 3 4 ] A=[a a ; a a ] so that i finally get A=[ [1,2,1,2] [3,4,3,4] [1,2,1,2] [3,4,3,4]] i tried different things on numpy which didn't work any ideas ? Perhaps something like this: In []: a= np.array([[1, 2], [3, 4]]) In []: np.c_[[a, a], [a, a]] Out[]: array([[[1, 2, 1, 2], [3, 4, 3, 4]], [[1, 2, 1, 2], [3, 4, 3, 4]]]) Regards, eat thank you -- View this message in context: http://old.nabble.com/build-numpy-matrix-out-of-smaller-matrix-tp32896004p32896004.html Sent from the Numpy-discussion mailing list archive at Nabble.com. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] build numpy matrix out of smaller matrix
Oops, slightly incorrect answer, but anyway my intention was more along the lines: In []: a= np.array([[1, 2], [3, 4]]) In []: np.c_[[a, a], [a, a]].reshape(4, 4) Out[]: array([[1, 2, 1, 2], [3, 4, 3, 4], [1, 2, 1, 2], [3, 4, 3, 4]]) On Thu, Dec 1, 2011 at 8:16 PM, josef.p...@gmail.com wrote: On Thu, Dec 1, 2011 at 12:26 PM, Benjamin Root ben.r...@ou.edu wrote: On Thu, Dec 1, 2011 at 10:52 AM, jonasr jonas.rueb...@web.de wrote: Hi, is there any possibility to define a numpy matrix, via a smaller given matrix, i.e. in matlab i can do this like a=[1 2 ; 3 4 ] A=[a a ; a a ] so that i finally get A=[ [1,2,1,2] [3,4,3,4] [1,2,1,2] [3,4,3,4]] i tried different things on numpy which didn't work any ideas ? thank you numpy.tile() might be what you are looking for. or which is my favorite tile and repeat replacement a= np.array([[1, 2], [3, 4]]) np.kron(np.ones((2,2)), a) array([[ 1., 2., 1., 2.], [ 3., 4., 3., 4.], [ 1., 2., 1., 2.], [ 3., 4., 3., 4.]]) np.kron(a, np.ones((2,2))) array([[ 1., 1., 2., 2.], [ 1., 1., 2., 2.], [ 3., 3., 4., 4.], [ 3., 3., 4., 4.]]) But, of'course this is way more generic (and preferable) approach to utilize. eat Josef Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
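For reference, np.tile covers this layout directly, and newer NumPy releases add np.block for the MATLAB-like nested syntax:

import numpy as np

a = np.array([[1, 2], [3, 4]])
np.tile(a, (2, 2))           # [[1,2,1,2],[3,4,3,4],[1,2,1,2],[3,4,3,4]]
np.block([[a, a], [a, a]])   # same result, MATLAB-style nesting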
Re: [Numpy-discussion] Example Usage of Neighborhood Iterator in Cython
Hi, On Mon, Oct 17, 2011 at 9:19 PM, T J tjhn...@gmail.com wrote: I recently put together a Cython example which uses the neighborhood iterator. It was trickier than I thought it would be, so I thought I'd share it with the community. The function takes a 1-dimensional array and returns a 2-dimensional array of neighborhoods in the original array. This is somewhat similar to the functionality provided by segment_axis (http://projects.scipy.org/numpy/ticket/901), but I believe this is slightly different in that the neighborhood can extend to the left of the current index (assuming circular boundaries). Keep in mind that this is just an example, and normal usage probably is not concerned with creating a new array. External link: http://codepad.org/czRIzXQl
--
import numpy as np
cimport numpy as np

cdef extern from "numpy/arrayobject.h":

    ctypedef extern class numpy.flatiter [object PyArrayIterObject]:
        cdef int nd_m1
        cdef np.npy_intp index, size
        cdef np.ndarray ao
        cdef char *dataptr

    # This isn't exposed to the Python API.
    # So we can't use the same approach we used to define flatiter
    ctypedef struct PyArrayNeighborhoodIterObject:
        int nd_m1
        np.npy_intp index, size
        np.PyArrayObject *ao    # note the change from np.ndarray
        char *dataptr

    object PyArray_NeighborhoodIterNew(flatiter it, np.npy_intp* bounds,
                                       int mode, np.ndarray fill_value)
    int PyArrayNeighborhoodIter_Next(PyArrayNeighborhoodIterObject *it)
    int PyArrayNeighborhoodIter_Reset(PyArrayNeighborhoodIterObject *it)

    object PyArray_IterNew(object arr)
    void PyArray_ITER_NEXT(flatiter it)
    np.npy_intp PyArray_SIZE(np.ndarray arr)

    cdef enum:
        NPY_NEIGHBORHOOD_ITER_ZERO_PADDING,
        NPY_NEIGHBORHOOD_ITER_ONE_PADDING,
        NPY_NEIGHBORHOOD_ITER_CONSTANT_PADDING,
        NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING,
        NPY_NEIGHBORHOOD_ITER_MIRROR_PADDING

np.import_array()

def windowed(np.ndarray[np.int_t, ndim=1] arr, bounds):
    cdef flatiter iterx = <flatiter>PyArray_IterNew(<object>arr)
    cdef np.npy_intp size = PyArray_SIZE(arr)
    cdef np.npy_intp* boundsPtr = [bounds[0], bounds[1]]
    cdef int hoodSize = bounds[1] - bounds[0] + 1

    # Create the Python object and keep a reference to it
    cdef object niterx_ = PyArray_NeighborhoodIterNew(iterx, boundsPtr,
        NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING, None)
    cdef PyArrayNeighborhoodIterObject *niterx = \
        <PyArrayNeighborhoodIterObject *>niterx_

    cdef int i, j
    cdef np.ndarray[np.int_t, ndim=2] hoods
    hoods = np.empty((arr.shape[0], hoodSize), dtype=np.int)
    for i in range(iterx.size):
        for j in range(niterx.size):
            hoods[i, j] = (<np.int_t *>niterx.dataptr)[0]
            PyArrayNeighborhoodIter_Next(niterx)
        PyArray_ITER_NEXT(iterx)
        PyArrayNeighborhoodIter_Reset(niterx)
    return hoods

def test():
    x = np.arange(10)
    print x
    print
    print windowed(x, [-1, 3])
    print
    print windowed(x, [-2, 2])
--
If you run test(), this is what you should see:

[0 1 2 3 4 5 6 7 8 9]

[[9 0 1 2 3]
 [0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]
 [6 7 8 9 0]
 [7 8 9 0 1]
 [8 9 0 1 2]]

[[8 9 0 1 2]
 [9 0 1 2 3]
 [0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]
 [6 7 8 9 0]
 [7 8 9 0 1]]

windowed(x, [0, 2]) is almost like segment_axis(x, 3, 2, end='wrap').
Just wondering what are the main benefits, of your approach, comparing to simple: In []: a= arange(5) In []: n= 10 In []: b= arange(n)[:, None] In []: mod(a+ roll(b, 1), n) Out[]: array([[9, 0, 1, 2, 3], [0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8], [5, 6, 7, 8, 9], [6, 7, 8, 9, 0], [7, 8, 9, 0, 1], [8, 9, 0, 1, 2]]) In []: mod(a+ roll(b, 2), n) Out[]: array([[8, 9, 0, 1, 2], [9, 0, 1, 2, 3], [0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8], [5, 6, 7, 8, 9], [6, 7, 8, 9, 0], [7, 8, 9, 0, 1]]) Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
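For comparison, a pure-NumPy sketch of the same circular windows with as_strided (windowed_np is a hypothetical helper; it assumes |bounds[0]| and bounds[1] are smaller than len(a)):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowed_np(a, bounds):
    lo, hi = bounds
    n, w = a.size, hi - lo + 1
    ext = np.concatenate((a, a, a))   # enough circular context on both sides
    s = ext.strides[0]
    # window i covers circular indices i+lo .. i+hi; start in the middle copy
    out = as_strided(ext[n + lo:], shape=(n, w), strides=(s, s))
    return out.copy()                 # detach the result from the strided view

# windowed_np(np.arange(10), [-1, 3]) matches the Cython output above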
Re: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array.
Hi On Mon, Aug 1, 2011 at 3:14 PM, Jeffrey Spencer jeffspenc...@gmail.comwrote: Depends where it is contained but another option is and I find it to typically be faster: B = zeros(A.shape) maximum(A,B,A) Since maximum(.) can handle broadcasting maximum(A, 0, A) will be even faster. -eat On 08/01/2011 07:31 PM, dileep kunjaai wrote: Dear sir, How can we fill a particular value in the place of number satisfying certain condition by another number in an array. Example: A=[[[ 9.42233087e-42 - 4.71116544e-42 0.e+00 ..., 1.48303127e+01 1.31524124e+01 1.14745111e+01] [ 3.91788793e+00 1.95894396e+00 0.e+00 ..., 1.78252487e+01 1.28667984e+01 7.90834856e+00] [ 7.83592510e+00 -3.91796255e+00 0.e+00 ..., 2.08202991e+01 1.25811749e+01 4.34205008e+00] ..., [ -8.51249974e-03 7.00901222e+00 -1.40095119e+01 ..., 0.e+00 0.e+00 0.e+00] [ 4.26390441e-03 3.51080871e+00 -7.01735353e+00 ..., 0.e+00 0.e+00 0.e+00] [ 0.e+00 0.e+00 0.e+00 ..., 0.e+00 0.e+00 0.e+00]] [[ 9.42233087e-42 -4.71116544e-42 0.e+00 ..., 8.48242474e+00 7.97146845e+00 7.46051216e+00] [ 5.16325808e+00 2.58162904e+00 0.e+00 ..., 8.47719383e+00 8.28024673e+00 8.08330059e+00] [ 1.03267126e+01 5.16335630e+00 0.e+00 ..., 8.47196198e+00 8.58903694e+00 8.70611191e+00] ..., [ 0.e+00 2.74500012e-01 5.4925e-01 ..., 0.e+00 0.e+00 0.e+00] [ 0.e+00 1.37496844e-01 -2.74993688e-01 ..., 0.e+00 0.e+00 0.e+00] [ 0.e+00 0.e+00 0.e+00 ..., 0.e+00 0.e+00 0.e+00]] [[ 9.42233087e-42 4.71116544e-42 0.e+00 ..., 1.18437748e+01 9.72778034e+00 7.61178637e+00] [ 2.96431869e-01 1.48215935e-01 0.e+00 ..., 1.64031239e+01 1.32768812e+01 1.01506386e+01] [ 5.92875004e-01 2.96437502e-01 0.e+00 ..., 2.09626484e+01 1.68261185e+01 1.26895866e+01] ..., [ 1.78188753e+00 -8.90943766e-01 0.e+00 ..., 0.e+00 1.2755e-03 2.5509e-03] [ 9.34620261e-01 -4.67310131e-01 0.e+00 ..., 0.e+00 6.38646539e-04 1.27729308e-03] [ 8.4339e-02 4.21500020e-02 0.e+00 ..., 0.e+00 0.e+00 0.e+00]]] A contain some negative value i want to change the negative numbers to '0'. I used 'masked_where', command but I failed. Please help me -- DILEEPKUMAR. R J R F, IIT DELHI ___ NumPy-Discussion mailing listNumPy-Discussion@scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
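The generic pattern for 'replace every element meeting a condition', as a sketch with illustrative data:

import numpy as np

A = np.array([[9.4e-42, -4.7e-42], [3.9, -3.9]])
B = np.where(A < 0, 0.0, A)   # non-mutating variant
A[A < 0] = 0.0                # in-place boolean-mask assignment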
Re: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class)
Hi, On Sun, Jul 31, 2011 at 7:36 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Jul 31, 2011 at 12:50 AM, Hans Meine me...@informatik.uni-hamburg.de wrote: Am 29.07.2011 um 20:23 schrieb Nathaniel Smith: Even so, surely this behavior should be consistent between base class ndarrays and subclasses? If returning 0d arrays is a good idea, then we should do it everywhere. If it's a bad idea, then we shouldn't do it at all...? Very well put. That's exactly the reason why I am insisting on this discussion, and why I believe that the behavior change is not intentional. Otherwise, ndarray and matrix should behave like my subclass. (BTW: I did not check masked_array yet.) (In reality, it sounds like this might be some mishap in the __array_wrap__ mechanism?) That's exactly my guess. (That could also explain why Mark did not see anything obvious in the code.) Maybe. There isn't a problem for plain old zero dimensional arrays. In [1]: a = array(1) In [2]: a.dtype Out[2]: dtype('int64') In [3]: reshape(a, (1,1), order='f') Out[3]: array([[1]]) FWIW: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: a= array(1) In []: a.reshape((1, 1), order= 'F').flags Out[]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In []: a.reshape((1, 1), order= 'C').flags Out[]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False Seems to be slightly inconsistent, but does it really matter? -eat This on Linux 64 with latest master. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Array vectorization in numpy
Hi, On Wed, Jul 20, 2011 at 2:42 AM, Chad Netzer chad.net...@gmail.com wrote: On Tue, Jul 19, 2011 at 6:10 PM, Pauli Virtanen p...@iki.fi wrote: k = m - 0.5 does here the same thing as k = np.empty_like(m) np.subtract(m, 0.5, out=k) The memory allocation (empty_like and the subsequent deallocation) costs essentially nothing, and there are no temporaries or copying in `subtract`. As verification: import timeit import numpy as np t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.53904647827148433 t=timeit.Timer('k = np.empty_like(m);np.subtract(m, 0.5, out=k)', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.54006035327911373 The trivial difference is expected as extra python parsing overhead, I think. Which leads me to apologize, since in my previous post I clearly meant to type m -= 0.5, not m =- 0.5, which is *quite* a different operation... Carlos, and Lutz, please take heed. :) In fact, as Lutz pointed out, that example was not at all what I intended to show anyway. So, just to demonstrate how it was wrong Perhaps slightly OT, but here is something very odd going on. I would expect the performance to be in totally different ballpark. t=timeit.Timer('m =- 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.058299207687377931 More like: In []: %timeit m =- .5 1000 loops, best of 3: 35 ns per loop -eat t=timeit.Timer('m -= 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.28192551136016847 t=timeit.Timer('np.subtract(m, 0.5, m)', setup='import numpy as np;m = np.ones([8092,8092],float)') np.mean(t.repeat(repeat=10, number=1)) 0.27014491558074949 t=timeit.Timer('np.subtract(m, 0.5, k)', setup='import numpy as np;m = np.ones([8092,8092],float); k = np.empty_like(m)') np.mean(t.repeat(repeat=10, number=1)) 0.54962997436523442 Perhaps the difference in the last two simply comes down to cache effects (having to iterate over two different large memory blocks, rather than one)? -Chad ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
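The 35 ns figure gives the game away; a small sketch making the rebinding explicit (array kept smaller than in the thread, same effect):

import numpy as np

m = np.ones((1024, 1024))
m =- 0.5          # parsed as m = -0.5: rebinds the name, touches no array
print(type(m))    # <class 'float'>, so no array operation ever ran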
Re: [Numpy-discussion] SVD does not converge
Hi, On Tue, Jun 28, 2011 at 7:43 PM, Lou Pecora lou_boog2...@yahoo.com wrote: -- *From:* santhu kumar mesan...@gmail.com *To:* numpy-discussion@scipy.org *Sent:* Tue, June 28, 2011 11:56:48 AM *Subject:* [Numpy-discussion] SVD does not converge Hello, I have a 380X5 matrix and when I am calculating pseudo-inverse of the matrix using pinv(numpy.linalg) I get the following error message: raise LinAlgError, 'SVD did not converge' numpy.linalg.linalg.LinAlgError: SVD did not converge I have looked in the list that it is a recurring issue but I was unable to find any solution. Can somebody please guide me on how to fix that issue? Thanks Santhosh == I had a similar problem (although I wasn't looking for the pseudo inverse). I found that squaring the matrix fixed the problem. But I'm guessing in your situation that would mean a 380 x 380 matrix (I hope I'm thinking about your case correctly). But it's worth trying since it's easy to do. With my rusty linear algebra: if one chooses to proceed with this 'squaring' avenue, wouldn't it then be more economical to base the calculations on a square 5x5 matrix? Something like: A_pinv= dot(A, pinv(dot(A.T, A))).T Instead of a 380x380 based matrix: A_pinv= dot(pinv(dot(A, A.T)), A).T My two cents - eat -- Lou Pecora, my views are my own. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
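A sketch of the 5x5 route, with the caveat that forming A.T A squares the condition number, so it only pays off when A is well conditioned and has full column rank:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((380, 5))
# pseudo-inverse via the small (5x5) normal-equations matrix
A_pinv = np.dot(A, np.linalg.pinv(np.dot(A.T, A))).T   # shape (5, 380)
assert np.allclose(A_pinv, np.linalg.pinv(A))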
Re: [Numpy-discussion] missing data discussion round 2
Hi, On Wed, Jun 29, 2011 at 1:40 AM, Jason Grout jason-s...@creativetrax.comwrote: On 6/28/11 5:20 PM, Matthew Brett wrote: Hi, On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smithn...@pobox.com wrote: ... (You might think, what difference does it make if you *can* unmask an item? Us missing data folks could just ignore this feature. But: whatever we end up implementing is something that I will have to explain over and over to different people, most of them not particularly sophisticated programmers. And there's just no sensible way to explain this idea that if you store some particular value, then it replaces the old value, but if you store NA, then the old value is still there. Ouch - yes. No question, that is difficult to explain. Well, I think the explanation might go like this: Ah, yes, well, that's because in fact numpy records missing values by using a 'mask'. So when you say `a[3] = np.NA', what you mean is, 'a._mask = np.ones(a.shape, np.dtype(bool); a._mask[3] = False` Is that fair? Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys the idea that the entry is still there, but we're just ignoring it. Of course, that goes against common convention, but it might be easier to explain. Somehow very similar approach how I always have treated the NaNs. (Thus postponing all the real (slightly dirty) work on to the imputation procedures). For me it has been sufficient to ignore what's the actual cause of NaNs. But I believe there exists plenty other much more sophisticated situations where this kind of simple treatment is not sufficient, at all. Anyway, even in the future it should still be possible to play nicely with these kind of simple scenarios. - eat Thanks, Jason ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] missing data discussion round 2
Hi, On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe mwwi...@gmail.com wrote: First I'd like to thank everyone for all the feedback you're providing, clearly this is an important topic to many people, and the discussion has helped clarify the ideas for me. I've renamed and updated the NEP, then placed it into the master NumPy repository so it has a more permanent home here: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst In the NEP, I've tried to address everything that was raised in the original thread and in Nathaniel's followup 'Concepts' thread. To deal with the issue of whether a mask is True or False for a missing value, I've removed the 'mask' attribute entirely, except for ufunc-like functions np.ismissing and np.isavail which return the two styles of masks. Here's a high level summary of how I'm thinking of the topic, and what I will implement: *Missing Data Abstraction* There appear to be two useful ways to think about missing data that are worth supporting. 1) Unknown yet existing data 2) Data that doesn't exist In 1), an NA value causes outputs to become NA except in a small number of exceptions such as boolean logic, and in 2), operations treat the data as if there were a smaller array without the NA values. *Temporarily Ignoring Data* * * In some cases, it is useful to flag data as NA temporarily, possibly in several different ways, for particular calculations or testing out different ways of throwing away outliers. This is independent of the missing data abstraction, still requiring a choice of 1) or 2) above. *Implementation Techniques* * * There are two mechanisms generally used to implement missing data abstractions, * * 1) An NA bit pattern 2) A mask I've described a design in the NEP which can include both techniques using the same interface. The mask approach is strictly more general than the NA bit pattern approach, except for a few things like the idea of supporting the dtype 'NA[f8,InfNan]' which you can read about in the NEP. My intention is to implement the mask-based design, and possibly also implement the NA bit pattern design, but if anything gets cut it will be the NA bit patterns. Thanks again for all your input so far, and thanks in advance for your suggestions for improving this new revision of the NEP. A very impressive PEP indeed. However, how would corner cases, like a = np.array([np.NA, np.NA], dtype='f8', masked=True) np.mean(a, skipna=True) np.mean(a) be handled? My concern here is that there always seems to be such corner cases which can only be handled with specific context knowledge. Thus producing 100% generic code to handle 'missing data' is not doable. Thanks, - eat -Mark ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] missing data discussion round 2
On Mon, Jun 27, 2011 at 8:53 PM, Mark Wiebe mwwi...@gmail.com wrote: On Mon, Jun 27, 2011 at 12:44 PM, eat e.antero.ta...@gmail.com wrote: Hi, On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe mwwi...@gmail.com wrote: First I'd like to thank everyone for all the feedback you're providing, clearly this is an important topic to many people, and the discussion has helped clarify the ideas for me. I've renamed and updated the NEP, then placed it into the master NumPy repository so it has a more permanent home here: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst In the NEP, I've tried to address everything that was raised in the original thread and in Nathaniel's followup 'Concepts' thread. To deal with the issue of whether a mask is True or False for a missing value, I've removed the 'mask' attribute entirely, except for ufunc-like functions np.ismissing and np.isavail which return the two styles of masks. Here's a high level summary of how I'm thinking of the topic, and what I will implement: *Missing Data Abstraction* There appear to be two useful ways to think about missing data that are worth supporting. 1) Unknown yet existing data 2) Data that doesn't exist In 1), an NA value causes outputs to become NA except in a small number of exceptions such as boolean logic, and in 2), operations treat the data as if there were a smaller array without the NA values. *Temporarily Ignoring Data* * * In some cases, it is useful to flag data as NA temporarily, possibly in several different ways, for particular calculations or testing out different ways of throwing away outliers. This is independent of the missing data abstraction, still requiring a choice of 1) or 2) above. *Implementation Techniques* * * There are two mechanisms generally used to implement missing data abstractions, * * 1) An NA bit pattern 2) A mask I've described a design in the NEP which can include both techniques using the same interface. The mask approach is strictly more general than the NA bit pattern approach, except for a few things like the idea of supporting the dtype 'NA[f8,InfNan]' which you can read about in the NEP. My intention is to implement the mask-based design, and possibly also implement the NA bit pattern design, but if anything gets cut it will be the NA bit patterns. Thanks again for all your input so far, and thanks in advance for your suggestions for improving this new revision of the NEP. A very impressive PEP indeed. Hi, However, how would corner cases, like a = np.array([np.NA, np.NA], dtype='f8', masked=True) np.mean(a, skipna=True) This should be equivalent to removing all the NA values, then calling mean, like this: b = np.array([], dtype='f8') np.mean(b) /home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2374: RuntimeWarning: invalid value encountered in double_scalars return mean(axis, dtype, out) nan np.mean(a) This would return NA, since NA values are sitting in positions that would affect the output result. OK. be handled? My concern here is that there always seems to be such corner cases which can only be handled with specific context knowledge. Thus producing 100% generic code to handle 'missing data' is not doable. Working out the corner cases for the functions that are already in numpy seems tractable to me, how to or whether to support missing data is something the author of each new function will have to consider when missing data support is in NumPy, but I don't think we can do more than provide the mechanisms for people to use. Sure. 
I'll ride along with this and wait until I have something tangible to outperform the 'traditional' NaN handling. - eat
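For the record, the 'traditional' NaN treatment alluded to here, in later NumPy spelling (np.nanmean arrived in NumPy 1.8):

import numpy as np

a = np.array([np.nan, np.nan])
np.mean(a)      # propagates: nan, the analogue of returning NA
np.nanmean(a)   # skipna analogue: every value missing, warns and gives nan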
Re: [Numpy-discussion] argmax for top N elements
Hi, On Wed, Jun 22, 2011 at 6:30 PM, Alex Flint alex.fl...@gmail.com wrote: Is it possible to use argmax or something similar to find the locations of the largest N elements in a matrix? How bout argsort(.)?, Like: In []: a= arange(9) In []: a.argsort()[::-1][:3] Out[]: array([8, 7, 6]) My 2 cents, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
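For large inputs, later NumPy versions add argpartition, which avoids the full sort; a sketch for the top-N locations in a matrix:

import numpy as np

a = np.random.rand(100, 100)
N = 3
flat = a.ravel()
idx = np.argpartition(flat, -N)[-N:]          # top N, in arbitrary order
idx = idx[np.argsort(flat[idx])[::-1]]        # order them, largest first
rows, cols = np.unravel_index(idx, a.shape)   # back to 2-D locations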
Re: [Numpy-discussion] 3d ODR
Hi, On Thu, Jun 16, 2011 at 2:28 PM, Christian K. ckk...@hoc.net wrote: Hi, I need to fit a 3d surface to a point cloud. This sounds like a job for 3d orthogonal distance regression. Does anybody know of an implementation? How bout http://docs.scipy.org/doc/scipy/reference/odr.html My 2 cents, eat Regards, Christian K.
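A minimal sketch of 3-d orthogonal distance regression with scipy.odr, fitting an implicit sphere to a noisy point cloud (the quadric, the synthetic data and the starting values are illustrative assumptions):

import numpy as np
from scipy import odr

rng = np.random.default_rng(0)
pts = rng.standard_normal((3, 200))
pts /= np.sqrt((pts ** 2).sum(0))               # points on the unit sphere
pts += 0.01 * rng.standard_normal(pts.shape)    # measurement noise

def sphere(beta, xyz):
    # implicit model: f(beta, xyz) = 0 on the fitted surface
    x0, y0, z0, r = beta
    return ((xyz[0] - x0) ** 2 + (xyz[1] - y0) ** 2
            + (xyz[2] - z0) ** 2 - r ** 2)

model = odr.Model(sphere, implicit=True)
fit = odr.ODR(odr.Data(pts, y=1), model, beta0=[0.1, 0.1, 0.1, 0.5]).run()
print(fit.beta)   # ~ [0, 0, 0, 1]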
Re: [Numpy-discussion] finding elements that match any in a set
On Sun, May 29, 2011 at 10:58 PM, Chris Barker chris.bar...@noaa.govwrote: On 5/28/2011 3:40 PM, Robert wrote: (myarray in mylist) turns into mylist.__contains__(myarray). Only the list object is ever checked for this method. There is no paired method myarray.__rcontains__(mylist) so there is nothing that numpy can override to make this operation do anything different from what lists normally do, however, numpy arrays should be able to override in be defining their own.__contains__ method, so you could do: something in an_array and get a useful, vectorized result. So I thought I'd see what currently happens when I try that: In [24]: a Out[24]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) In [25]: 3 in a Out[25]: True So the simple case works just like a list. But what If I want what the OP wants: In [26]: b Out[26]: array([3, 6, 4]) In [27]: b in a Out[27]: False OK, so the full b array is not in a, and it doesn't vectorize it, either. But: In [29]: a Out[29]: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]]) In [30]: b in a Out[30]: True HUH? I'm not sure by what definition we would say that b is contained in a. but maybe.. In [41]: b Out[41]: array([ 4, 2, 345]) In [42]: b in a Out[42]: False so it's are all of the elements in b in a somewhere? but only for 2-d arrays? So what does it mean? FWIW, a short prelude on the theme seems quite promising, indeed: In []: A Out[]: array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) In []: [0, 1, 2] in A Out[]: True In []: [0, 3, 6] in A Out[]: True In []: [0, 4, 8] in A Out[]: True In []: [8, 4, 0] in A Out[]: True In []: [2, 4, 6] in A Out[]: True In []: [6, 4, 2] in A Out[]: True In []: [3, 1, 5] in A Out[]: True In [1061]: [3, 1, 4] in A Out[1061]: True But In []: [1, 2, 3] in A Out[]: False In []: [3, 2, 1] in A Out[]: True So, obviously the logic behind __contains__ is not so very straightforward. Perhaps just a bug? Regards, eat The docstring is not helpful: In [58]: np.ndarray.__contains__? Type: wrapper_descriptor Base Class: type 'wrapper_descriptor' String Form:slot wrapper '__contains__' of 'numpy.ndarray' objects Namespace: Interactive Docstring: x.__contains__(y) == y in x If nothing useful, maybe it could provide a vectorized version of in for this sort of use case. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
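Meanwhile, the vectorized membership test the thread is really after already exists as np.in1d (renamed np.isin in later releases):

import numpy as np

a = np.arange(12)
b = np.array([3, 6, 4])
np.in1d(b, a)   # element-wise: array([ True,  True,  True])
np.in1d(a, b)   # which entries of a occur somewhere in b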
Re: [Numpy-discussion] Need to eliminate a nested loop
Hi, On Wed, May 11, 2011 at 8:28 AM, Elfnor elf...@gmail.com wrote: Hi The following code produces the desired result but has a slow triple loop iterating over the matrix multiplication. I'm sure it can be eliminated with a neat indexing trick but I can't figure out how. Any suggestions please? - import numpy #define domain of function x = numpy.linspace(-5,5,64) y = numpy.linspace(-5,5,64) z = numpy.linspace(-5,5,64) #calculate f at each point in domain a = 5.0 b = 3.0 c = 2.0 #ellipsoid E = numpy.array([[1/a**2, 0 , 0 , 0 ], [ 0 ,1/b**2 , 0 , 0 ], [ 0 , 0 ,1/c**2, 0 ], [ 0 , 0 , 0 , -1 ]]) f = numpy.zeros((x.size,y.size,z.size)) for i,xi in enumerate(x): for j,yj in enumerate(y): for k,zk in enumerate(z): X = numpy.array([xi,yj,zk,1]) f[i,j,k] = numpy.dot(numpy.dot(X.T,E),X) --- Something like this: n= 64 u= np.linspace(-5, 5, n)[None, :] u0= u.repeat(n** 2)[None, :] u1= u.repeat(n)[None, :].repeat(n, axis= 0).reshape(1, -1) u2= u.repeat(n** 2, axis= 0).reshape(1, -1) U= np.r_[u0, u1, u2, np.ones((1, n** 3))] f= (np.dot(E, U)* U).sum(0).reshape(n, n, n) Regards,eat Thanks Eleanor -- View this message in context: http://old.nabble.com/Need-to-eliminate-a-nested-loop-tp31591457p31591457.html Sent from the Numpy-discussion mailing list archive at Nabble.com. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
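With newer NumPy the same evaluation can be phrased directly as a quadratic form over the whole grid; a sketch using meshgrid, stack and einsum:

import numpy as np

x = y = z = np.linspace(-5, 5, 64)
a, b, c = 5.0, 3.0, 2.0
E = np.diag([1 / a ** 2, 1 / b ** 2, 1 / c ** 2, -1.0])   # the ellipsoid

X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
U = np.stack([X, Y, Z, np.ones_like(X)])            # homogeneous coordinates
f = np.einsum('i...,ij,j...->...', U, E, U)         # X^T E X at every grid point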
Re: [Numpy-discussion] np.histogram on arrays.
Hi, On Wed, Mar 30, 2011 at 1:44 PM, Éric Depagne e...@depagne.org wrote: Well I guess, for a slight performance improvement, you could create your own streamlined histogrammer. But, in order to better grasp your situation it would be beneficial to know how the counts and bounds are used later on. Just wondering if this kind massive histogramming could be somehow avoided totally. Indeed. Here's what I do. My images come from CCD, and as such, the zero level in the image is not the true zero level, but is the true zero + the background noise of each pixels. By doing the histogram, I plan on detecting what is the most common value per row. Once I have the most common value, I can derive the interval where most of the values are (the index of the largest occurence is easily obtained by sorting the counts, and I take a slice [index_max_count,index_max_count+1] in the second array given by the histogram). Then, I take the mean value of this interval and I assume it is the value of the bias for my row. I do this procedure both on the row and columns as a sanity check. And I know this procedure will not work if on any row/column there is a lot of signal and very little bias. I'll fix that afterwards ;-) Perhaps something along these lines will help you: from numpy import histogram, linspace, r_ def hist2(a, nob): bins= linspace(a.min(), a.max(), nob+ 1) d= linspace(0, bins[-1]* a.shape[0], a.shape[0])[:, None] b= (a+ d).ravel() bbins= (bins[:-1]+ d).ravel() bbins= r_[bbins, bbins[-1]+ 1] counts, _= histogram(b, bbins) return counts.reshape(-1, nob), bins It has two disadvantages 1) needs more memory and 2) global bins (which although should be quite straightforward to enhance if needed). Regards, eat Éric. Regards, eat Un clavier azerty en vaut deux -- Éric Depagnee...@depagne.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
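A variant of the same offset idea that sidesteps assumptions about the sign of the data: digitize against the global edges once, offset the integer bin index per row, and count with a single bincount (a sketch, assuming a NumPy with bincount's minlength argument, 1.6+):

import numpy as np

def row_hist(a, nbins):
    # one histogram per row, shared global bins
    edges = np.linspace(a.min(), a.max(), nbins + 1)
    idx = np.digitize(a.ravel(), edges) - 1
    idx = np.clip(idx, 0, nbins - 1)  # fold a.max() into the last bin
    idx += np.repeat(np.arange(a.shape[0]), a.shape[1]) * nbins
    counts = np.bincount(idx, minlength=a.shape[0] * nbins)
    return counts.reshape(a.shape[0], nbins), edges

The column version is row_hist(data.T, nbins).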
Re: [Numpy-discussion] np.histogram on arrays.
Hi, On Wed, Mar 30, 2011 at 10:04 AM, Éric Depagne e...@depagne.org wrote: Hi. Sorry for not having been clearer. I'll explain a little bit. I have 4k x 4k images that I want to analyse. I turn them into numpy arrays so I have 4k x 4k np.array. My analysis starts with determining the bias level. To do that, I compute for each line, and then for each row, an histogram. So I compute 8000 histograms. Here is the code I've used sofar: for i in range(self.data.shape[0]): #Compute an histogram along the columns # Gets counts and bounds self.countsC[i], self.boundsC[i] = np.histogram(data[i], bins=self.bins) for i in range(self.data.shape[1]): # Do the same, along the rows. self.countsR[i], self.boundsR[i] = np.histogram(data[:,i], bins=self.bins) And data.shape is (4000,4000). If histogram had an axis parameter, I could avoid the loop and I guess it would be faster. Well I guess, for a slight performance improvement, you could create your own streamlined histogrammer. But, in order to better grasp your situation it would be beneficial to know how the counts and bounds are used later on. Just wondering if this kind massive histogramming could be somehow avoided totally. Regards, eat Éric. So it seems that you give your array directly to histogramdd (asking a 4000D histogram!). Surely that's not what you are trying to achieve. Can you elaborate more on your objectives? Perhaps some code (slow but working) to demonstrate the point. Regards, eat Un clavier azerty en vaut deux -- Éric Depagnee...@depagne.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] random number generation
Hi, On Tue, Mar 29, 2011 at 12:00 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote: If I want to generate a string of random bits with equal probability I run random.randint(0,2,size). What if I want a specific proportion of bits? In other words, each bit is 1 with probability p>1/2 and 0 with probability q=1-p? Would it be sufficient to: In []: bs= ones(1e6, dtype= int) In []: bs[randint(0, 1e6, 1e5)]= 0 In []: bs.sum()/ 1e6 Out[]: 0.904706 Regards, eat thanks ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] random number generation
On Tue, Mar 29, 2011 at 1:29 PM, eat e.antero.ta...@gmail.com wrote: Hi, On Tue, Mar 29, 2011 at 12:00 PM, Alex Ter-Sarkissov ater1...@gmail.com wrote: If I want to generate a string of random bits with equal probability I run random.randint(0,2,size). What if I want a specific proportion of bits? In other words, each bit is 1 with probability p>1/2 and 0 with probability q=1-p? Would it be sufficient to: In []: bs= ones(1e6, dtype= int) In []: bs[randint(0, 1e6, 1e5)]= 0 In []: bs.sum()/ 1e6 Out[]: 0.904706 Or: In []: bs= ones(1e6, dtype= int) In []: bs[rand(1e6) < 1./ 10]= 0 In []: bs.sum()/ 1e6 Out[]: 0.89983 Regards, eat thanks ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
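A note on the two sketches: the randint version zeroes duplicate positions, so the expected fraction of ones is (1 - 1/1e6)**1e5, about exp(-0.1) = 0.9048, which is exactly where the 0.904706 above lands; the rand() < p version is unbiased, hence 0.89983. numpy.random also has the Bernoulli draw built in:

import numpy as np

p = 0.9  # desired probability of a 1
bs = np.random.binomial(1, p, size=10**6)  # one Bernoulli(p) draw per bit
print(bs.mean())  # ~0.9
# equivalent: (np.random.rand(10**6) < p).astype(int)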
Re: [Numpy-discussion] np.histogram on arrays.
Hi, On Tue, Mar 29, 2011 at 4:29 PM, Éric Depagne e...@depagne.org wrote: Hi all. Sorry if this question has already been asked. I've searched the archive, but could not find anything related, so here is my question. I'm using np.histogram on a 4000x4000 array, each with 200 bins. I do that on both dimensions, meaning I compute 8000 histograms. It takes around 5 seconds (which is of course quite fast). I was wondering why np.histogram does not accept an axis parameter so that it could work directly on the array without me having to write a loop. Or maybe did I miss some parameters using np.histogram. FWIW, have you considered to use http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html#numpy.histogramdd Regards, eat Thanks. Éric. Un clavier azerty en vaut deux -- Éric Depagnee...@depagne.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] np.histogram on arrays.
Hi, On Tue, Mar 29, 2011 at 5:13 PM, Éric Depagne e...@depagne.org wrote: FWIW, have you considered to use http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html# numpy.histogramdd Regards, eat I tried, but I get a /usr/lib/pymodules/python2.6/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights) 370 # Reshape is used so that overlarge arrays 371 # will raise an error. -- 372 hist = zeros(nbin, float).reshape(-1) 373 374 # Compute the sample indices in the flattened histogram matrix. ValueError: sequence too large; must be smaller than 32 so I suspect my array is too big for histogramdd So it seems that you give your array directly to histogramdd (asking a 4000D histogram!). Surely that's not what you are trying to achieve. Can you elaborate more on your objectives? Perhaps some code (slow but working) to demonstrate the point. Regards, eat Éric. -- Un clavier azerty en vaut deux -- Éric Depagnee...@depagne.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output
Hi 2011/3/24 Dmitrey tm...@ukr.net from numpy import inf, array inf*0 nan (ok) array(inf) * 0.0 StdErr: Warning: invalid value encountered in multiply nan My cycled calculations yields this thousands times slowing computations and making text output completely non-readable. Would old= seterr(invalid= 'ignore') be sufficient for you? Regards, eat from numpy import __version__ __version__ '2.0.0.dev-1fe8136' D. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
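For reference, the suggestion in context: seterr returns the previous settings, so they can be restored once the noisy block is done.

import numpy as np

old = np.seterr(invalid='ignore')  # silence 'invalid value' warnings
np.array(np.inf) * 0.0             # nan, with nothing printed to stderr
np.seterr(**old)                   # restore the previous behaviour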
Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output
Hi On Thu, Mar 24, 2011 at 4:17 PM, Skipper Seabold jsseab...@gmail.comwrote: On Thu, Mar 24, 2011 at 9:45 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: 2011/3/24 Dmitrey tm...@ukr.net: Hi 2011/3/24 Dmitrey tm...@ukr.net from numpy import inf, array inf*0 nan (ok) array(inf) * 0.0 StdErr: Warning: invalid value encountered in multiply nan My cycled calculations yields this thousands times slowing computations and making text output completely non-readable. Would old= seterr(invalid= 'ignore') be sufficient for you? yes for me, but I'm not sure for all those users who use my soft. Maybe it will hide some bugs in their objective functions and nonlinear constraints in numerical optimization and nonlinear equation systems. If we do what you request in your message subject your users will have the same issue. You can wrap (parts of) your code in something like: olderr = seterr(invalid= 'ignore') do stuff seterr(**olderr) Also, as Robert pointed out to me before np.errstate is a context-manager for ignoring these warnings. I often wrap optimization code with it. I didn't realize that its context-manager, but that's really create! (Perhaps documents could reflect that.) Regards, eat In [3]: np.array(np.inf)*0. Warning: invalid value encountered in multiply Out[3]: nan In [4]: with np.errstate(all='ignore'): ...: np.array(np.inf)*0. ...: ...: Out[4]: nan In [5]: np.array(np.inf)*0. Warning: invalid value encountered in multiply Out[5]: nan Skipper ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] avoid a line...
Hi, On Thu, Mar 17, 2011 at 9:36 AM, dileep kunjaai dileepkunj...@gmail.comwrote: Dear sir, I am try to read a file of the following format, I want to avoid the first line and read the remaining as 'float' . Please help me... RainWindTempPrSal 0.11.10.020.2 0.2 0.50. 0. 0.4 0.8 0.55.51.50.5 1.5 3.50.51.55.0 2.6 5.14.13.22.3 1.5 4.40.91.52.2.3 You may use loadtxt(), with skiprows argument. (See more on http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html). Regards, eat -- DILEEPKUMAR. R J R F, IIT DELHI ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
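A sketch of the suggestion ('rain.txt' is a hypothetical file holding the table shown, header line included):

import numpy as np

# skiprows=1 jumps over the 'Rain Wind Temp Pr Sal' header
data = np.loadtxt('rain.txt', skiprows=1, dtype=float)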
Re: [Numpy-discussion] Norm of array of vectors
Hi, On Thu, Mar 17, 2011 at 10:44 AM, Andrey N. Sobolev inco...@list.ru wrote: Dear all, Sorry if that's a noob question, but anyway. I have several thousands of vectors stacked in 2d array. I'd like to get new array containing Euclidean norms of these vectors and get the vector with minimal norm. Is there more efficient way to do this than argmin(array([sqrt(dot(x,x)) for x in vec_array]))? Try argmin(sum(vec_array** 2, 0)** 0.5) Regards, eat Thanks in advance. Andrey. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
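One caveat on the reply above: the axis must match how the vectors are stacked. With the vectors as rows of vec_array, as the list comprehension in the question implies, the sum runs over axis=1; and since sqrt is monotonic it can be dropped under argmin. A sketch:

import numpy as np

vec_array = np.random.randn(5000, 3)     # vectors stacked as rows
i = (vec_array**2).sum(axis=1).argmin()  # sqrt not needed for the argmin
vmin = vec_array[i]                      # vector with the minimal norm
norms = np.sqrt((vec_array**2).sum(axis=1))  # the norms themselves, if wanted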
Re: [Numpy-discussion] view 1d array as overlapping segments?
Hi, On Mon, Mar 7, 2011 at 5:01 PM, Neal Becker ndbeck...@gmail.com wrote: reshape can view a 1d array as non-overlapping segments. Is there a convenient way to view a 1d array as a 2d array of overlapping segments? nonoverlapping: l: segment length k: overlap u is the 1d array v is a 2d array v[i] = u[l*i:(l+1)*i] overlapping: v[i] = u[l*i:(l+1)*i+k] In case actually v[i]= u[i* l: (i+ 1)* l+ k], then this may be useful from numpy.lib.stride_tricks import as_strided as ast def os(a, l, k= 0): shape, strides= (a.shape[0]- l+ 1, l+ k), a.strides* 2 return ast(a, shape= shape, strides= strides) if __name__ == '__main__': import numpy as np a= np.arange(7, dtype= np.int8) print os(a, 3) # [[0 1 2] # [1 2 3] # [2 3 4] # [3 4 5] # [4 5 6]] print os(a, 3, 2) # [[ 0 1 2 3 4] # [ 1 2 3 4 5] # [ 2 3 4 5 6] # [ 3 4 5 6 0] # last item garbage # [ 4 5 6 0 34]] # 2 last items garbage My two cents, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
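A defensively-shaped variant for the corrected definition v[i]= u[i*l: (i+1)*l+k], i.e. windows of length l+k starting at multiples of l (note os() above slides by one element instead), sized so that no window runs past the end and no garbage rows appear (a sketch):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def os_safe(a, l, k=0):
    # windows start at multiples of l and span l+k elements;
    # keep only those that fit entirely inside a
    nwin = (a.size - (l + k)) // l + 1
    s = a.strides[0]
    return as_strided(a, shape=(nwin, l + k), strides=(l * s, s))

print(os_safe(np.arange(10), 3, 2))
# [[0 1 2 3 4]
#  [3 4 5 6 7]]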
Re: [Numpy-discussion] Taking a large number of dot products at once
Hi, On Fri, Mar 4, 2011 at 8:19 AM, Daniel Hyams dhy...@gmail.com wrote: This is probably so easy, I'm embarrassed to ask it...but I've been casting around trying things to no avail for the last hour and a half, so here goes... I have a lot of dot products to take. The length-3 vectors that I want to dot are stacked in a 2D array like this: U = [u1 u2 u3] and V = [v1 v2 v3] So both of these arrays, are, say, 3x100 each. I just want to take the dot product of each of the corresponding vectors, so that the result is [u1.v1 u2.v2 u3.v3 ] which would be a 1x100 array in this case. Which function do I need to use? I thought tensordot() was the one, but I couldn't make it workpure user error I'm sure. No function needed for this case, just: In []: x= rand(3, 7) In []: y= rand(3, 7) In []: d= (x* y).sum(0) In [490]: d Out[490]: array([ 1.25404683, 0.19113117, 1.37267133, 0.74219888, 1.55296562, 0.15264303, 0.72039922]) In [493]: dot(x[:, 0].T, y[:, 0]) Out[493]: 1.2540468282421895 Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
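For the record, einsum (NumPy 1.6 and later) spells the same columnwise dot directly:

import numpy as np

x = np.random.rand(3, 100)
y = np.random.rand(3, 100)
d = np.einsum('ij,ij->j', x, y)  # same as (x * y).sum(0)
assert np.allclose(d, (x * y).sum(0))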
Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?
Hi, On Tue, Feb 15, 2011 at 5:50 PM, Sebastian Haase seb.ha...@gmail.comwrote: Hi, I assume that someone here could maybe help me, and I'm hoping it's not too much off topic. I have 2 arrays of 2d point coordinates and would like to calculate all pairwise distances as fast as possible. Going from Python/Numpy to a (Swigged) C extension already gave me a 55x speedup. (.9ms vs. 50ms for arrays of length 329 and 340). With my very modest machine (Intel Dual CPU E2200, 2.2 Ghz) utilizing scipy.spatial.distance.pdist will take some 1.5 ms for such arrays. Even the simple linear algebra based full matrix calculations can be done less than 5 ms. My two cents, eat I'm using gcc on Linux. Now I'm wondering if I could go even faster !? My hope that the compiler might automagically do some SSE2 optimization got disappointed. Since I have a 4 core CPU I thought OpenMP might be an option; I never used that, and after some playing around I managed to get (only) 50% slowdown(!) :-( My code in short is this: (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i) ---Ccode -- void dists2d( double *a_ps, int nx1, int na, double *b_ps, int nx2, int nb, double *dist, int nx3, int ny3) throw (char*) { if(nx1 != 2) throw (char*) a must be of shape (n,2); if(nx2 != 2) throw (char*) b must be of shape (n,2); if(nx3 != nb || ny3 != na)throw (char*) c must be of shape (na,nb); double *dist_; int i, j; #pragma omp parallel private(dist_, j, i) { #pragma omp for nowait for(i=0;ina;i++) { //num_threads=omp_get_num_threads(); -- 4 dist_ = dist+i*nb; // dists_ is only introduced for OpenMP for(j=0;jnb;j++) { *dist_++ = sqrt( sq(a_ps[i*nx1] - b_ps[j*nx2]) + sq(a_ps[i*nx1+1] - b_ps[j*nx2+1]) ); } } } } ---/Ccode -- There is probably a simple mistake in this code - as I said I never used OpenMP before. It should be not too difficult to use OpenMP correctly here or - maybe better - is there a simple SSE(2,3,4) version that might be even better than OpenMP... !? I supposed, that I did not get the #pragma omp lines right - any idea ? Or is it in general not possible to speed this kind of code up using OpenMP !? Thanks, Sebastian Haase ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
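The two library routes eat alludes to, side by side (a sketch; the expansion route does more arithmetic but in BLAS-friendly memory order, which is the recurring point in this thread):

import numpy as np
from scipy.spatial.distance import cdist

a = np.random.rand(329, 2)
b = np.random.rand(340, 2)

d1 = cdist(a, b)  # optimized pairwise Euclidean distances

# 'full matrix' route: ||a - b||**2 = ||a||**2 + ||b||**2 - 2 a.b
sq = (a**2).sum(1)[:, None] + (b**2).sum(1) - 2 * np.dot(a, b.T)
d2 = np.sqrt(np.maximum(sq, 0))  # clip tiny negatives from round-off
assert np.allclose(d1, d2)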
Re: [Numpy-discussion] odd performance of sum?
Hi Sturla, On Sat, Feb 12, 2011 at 5:38 PM, Sturla Molden stu...@molden.no wrote: Den 10.02.2011 16:29, skrev eat: One would expect sum to outperform dot with a clear marginal. Does there exixts any 'tricks' to increase the performance of sum? First of all, thanks for you still replying. Well, I'm still a little bit unsure how I should proceed with this discussion... I may have used bad wordings and created unneccessary mayhyem with my original question (:. Trust me, I'm only trying to discuss here with a positive criticism in my mind. Now, I'm not pretending to know what kind of a person a 'typical' numpy user is. But I'm assuming that there just exists more than me with roughly similar questions in their (our) minds and who wish to utilize numpy more 'pythonic; all batteries included' way. Ocassionally I (we) may ask really stupid questions, but please beare with us. Said that, I'm still very confident that (from a users point of view) there's some real substance on the issue I addressed. I see that others have ansvered already. The ufunc np.sum is not going going to beat np.dot. You are racing the heavy machinery of NumPy (array iterators, type chekcs, bound checks, etc.) against level-3 BLAS routine DGEMM, the most heavily optimized numerical kernel ever written. Fair enough. Also beware that computation is much cheaper than memory access. Sure, that's exactly where I expected the performance boost to emerge. Although DGEMM does more arithmetics, and even is O(N3) in that respect, it is always faster except for very sparse arrays. If you need fast loops, you can always write your own Fortran or C, and even insert OpenMP pragmas. That's a very important potential, but surely not all numpy users are expected to master that ;-) But don't expect that to beat optimized high-level BLAS kernels by any margin. The first chapters of Numerical Methods in Fortran 90 might be worth reading. It deals with several of these issues, including dimensional expansion, which is important for writing fast numerical code -- but not intuitively obvious. I expect this to be faster because it does less work is a fundamental misconception in numerical computing. Whatever cause less traffic on the memory BUS (the real bottleneck) will almost always be faster, regardless of the amount of work done by the CPU. And I'm totally aware of it, and actually it was exactly the original intended logic of my question: how bout if the sum could follow the steps of dot; then, since less instructions it must be bounded below of the execution time of dot. But as R. Kern gently pointed out allready it's not fruitfull enough avenue to proceed. And I'm able to live with that. Regards, eat A good advice is to use high-level BLAS whenever you can. The only exception, as mentioned, is when matrices get very sparse. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] odd performance of sum?
Hi, Observing the following performance: In []: m= 1e5 In []: n= 1e2 In []: o= ones(n) In []: M= randn(m, n) In []: timeit M.sum(1) 10 loops, best of 3: 38.3 ms per loop In []: timeit dot(M, o) 10 loops, best of 3: 21.1 ms per loop In []: m= 1e2 In []: n= 1e5 In []: o= ones(n) In []: M= randn(m, n) In []: timeit M.sum(1) 100 loops, best of 3: 18.3 ms per loop In []: timeit dot(M, o) 10 loops, best of 3: 21.2 ms per loop One would expect sum to outperform dot by a clear margin. Do there exist any 'tricks' to increase the performance of sum? In []: sys.version Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' # installed binaries from http://python.org/ In []: np.version.version Out[]: '1.5.1' # installed binaries from http://scipy.org/ Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Thanks Chuck, for replying. But don't you still feel very odd that dot outperforms sum in your machine? Just to get it simply; why sum can't outperform dot? Whatever architecture (computer, cache) you have, it don't make any sense at all that when performing significantly less instructions, you'll reach to spend more time ;-). Regards, eat On Thu, Feb 10, 2011 at 7:10 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Feb 10, 2011 at 8:29 AM, eat e.antero.ta...@gmail.com wrote: Hi, Observing following performance: In []: m= 1e5 In []: n= 1e2 In []: o= ones(n) In []: M= randn(m, n) In []: timeit M.sum(1) 10 loops, best of 3: 38.3 ms per loop In []: timeit dot(M, o) 10 loops, best of 3: 21.1 ms per loop In []: m= 1e2 In []: n= 1e5 In []: o= ones(n) In []: M= randn(m, n) In []: timeit M.sum(1) 100 loops, best of 3: 18.3 ms per loop In []: timeit dot(M, o) 10 loops, best of 3: 21.2 ms per loop One would expect sum to outperform dot with a clear marginal. Does there exixts any 'tricks' to increase the performance of sum? I'm not surprised, much depends on the version of ATLAS or MKL you are linked to. If you aren't linked to either and just using numpy's version then the results are a bit strange. With numpy development I get In [1]: m= 1e5 In [2]: n= 1e2 In [3]: o= ones(n) In [4]: M= randn(m, n) In [5]: timeit M.sum(1) 100 loops, best of 3: 19.2 ms per loop In [6]: timeit dot(M, o) 100 loops, best of 3: 15 ms per loop In [7]: m= 1e2 In [8]: n= 1e5 In [9]: o= ones(n) In [10]: M= randn(m, n) In [11]: timeit M.sum(1) 100 loops, best of 3: 17.4 ms per loop In [12]: timeit dot(M, o) 100 loops, best of 3: 14.2 ms per loop Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Hi Robert, On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 11:53, eat e.antero.ta...@gmail.com wrote: Thanks Chuck, for replying. But don't you still feel very odd that dot outperforms sum in your machine? Just to get it simply; why sum can't outperform dot? Whatever architecture (computer, cache) you have, it don't make any sense at all that when performing significantly less instructions, you'll reach to spend more time ;-). These days, the determining factor is less often instruction count than memory latency, and the optimized BLAS implementations of dot() heavily optimize the memory access patterns. Can't we have this as well with simple sum? Additionally, the number of instructions in your dot() probably isn't that many more than the sum(). The sum() is pretty dumb But does it need to be? and just does a linear accumulation using the ufunc reduce mechanism, so (m*n-1) ADDs plus quite a few instructions for traversing the array in a generic manner. With fused multiply-adds, being able to assume contiguous data and ignore the numpy iterator overhead, and applying divide-and-conquer kernels to arrange sums, the optimized dot() implementations could have a comparable instruction count. Couldn't sum benefit with similar logic? If you were willing to spend that amount of developer time and code complexity to make platform-specific backends to sum() Actually I would, but I'm not competent at all in that detailed level (:, But I'm willing to spend more on my own time for example for testing, debugging, analysing various improvements and suggestions if such emerge. , you could make it go really fast, too. Typically, it's not all that important to make it worthwhile, though. One thing that might be worthwhile is to make implementations of sum() and cumsum() that avoid the ufunc machinery and do their iterations more quickly, at least for some common combinations of dtype and contiguity. Well I'm allready perplexd before reaching that 'ufunc machinery', it's actually anyway trivial (for us more mortal ;-) to figure out what's happening with sum on fromnumeric.py! Regards, eat -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Hi Pauli, On Thu, Feb 10, 2011 at 8:31 PM, Pauli Virtanen p...@iki.fi wrote: Thu, 10 Feb 2011 12:16:12 -0600, Robert Kern wrote: [clip] One thing that might be worthwhile is to make implementations of sum() and cumsum() that avoid the ufunc machinery and do their iterations more quickly, at least for some common combinations of dtype and contiguity. I wonder what is the balance between the iterator overhead and the time taken in the reduction inner loop. This should be straightforward to benchmark. Apparently, some overhead decreased with the new iterators, since current Numpy master outperforms 1.5.1 by a factor of 2 for this benchmark: In [8]: %timeit M.sum(1) # Numpy 1.5.1 10 loops, best of 3: 85 ms per loop In [8]: %timeit M.sum(1) # Numpy master 10 loops, best of 3: 49.5 ms per loop I don't think this is explainable by the new memory layout optimizations, since M is C-contiguous. Perhaps there would be room for more optimization, even within the ufunc framework? I hope so. Please suggest if there's anything that I can do to further advance this. (My C skills are allready bit rusty, but at any higher level I'll try my best to contribute). Thanks, eat -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Hi Robert, On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 14:29, eat e.antero.ta...@gmail.com wrote: Hi Robert, On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 11:53, eat e.antero.ta...@gmail.com wrote: Thanks Chuck, for replying. But don't you still feel very odd that dot outperforms sum in your machine? Just to get it simply; why sum can't outperform dot? Whatever architecture (computer, cache) you have, it don't make any sense at all that when performing significantly less instructions, you'll reach to spend more time ;-). These days, the determining factor is less often instruction count than memory latency, and the optimized BLAS implementations of dot() heavily optimize the memory access patterns. Can't we have this as well with simple sum? It's technically feasible to accomplish, but as I mention later, it entails quite a large cost. Those optimized BLASes represent many man-years of effort Yes I acknowledge this. But didn't they then ignore them something simpler, like sum (but which actually could benefit exactly similiar optimizations). and cause substantial headaches for people building and installing numpy. I appreciate this. No doubt at all. However, they are frequently worth it because those operations are often bottlenecks in whole applications. sum(), even in its stupidest implementation, rarely is. In the places where it is a significant bottleneck, an ad hoc implementation in C or Cython or even FORTRAN for just that application is pretty easy to write. But here I have to disagree; I'll think that at least I (if not even the majority of numpy users) don't like (nor I'm be capable/ or have enough time/ resources) go to dwell such details. I'm sorry but I'll have to restate that it's quite reasonable to expect that sum outperforms dot in any case. Lets now to start make such movements, which enables sum to outperform dot. You can gain speed by specializing to just your use case, e.g. contiguous data, summing down to one number, or summing along one axis of only 2D data, etc. There's usually no reason to try to generalize that implementation to put it back into numpy. Yes, I would really like to specialize into my case, but 'without going out the python realm.' Thanks, eat Additionally, the number of instructions in your dot() probably isn't that many more than the sum(). The sum() is pretty dumb But does it need to be? As I also allude to later in my email, no, but there are still costs involved. and just does a linear accumulation using the ufunc reduce mechanism, so (m*n-1) ADDs plus quite a few instructions for traversing the array in a generic manner. With fused multiply-adds, being able to assume contiguous data and ignore the numpy iterator overhead, and applying divide-and-conquer kernels to arrange sums, the optimized dot() implementations could have a comparable instruction count. Couldn't sum benefit with similar logic? Etc. I'm not going to keep repeating myself. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] odd performance of sum?
Hi, On Fri, Feb 11, 2011 at 12:08 AM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 15:32, eat e.antero.ta...@gmail.com wrote: Hi Robert, On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 14:29, eat e.antero.ta...@gmail.com wrote: Hi Robert, On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern robert.k...@gmail.com wrote: On Thu, Feb 10, 2011 at 11:53, eat e.antero.ta...@gmail.com wrote: Thanks Chuck, for replying. But don't you still feel very odd that dot outperforms sum in your machine? Just to get it simply; why sum can't outperform dot? Whatever architecture (computer, cache) you have, it don't make any sense at all that when performing significantly less instructions, you'll reach to spend more time ;-). These days, the determining factor is less often instruction count than memory latency, and the optimized BLAS implementations of dot() heavily optimize the memory access patterns. Can't we have this as well with simple sum? It's technically feasible to accomplish, but as I mention later, it entails quite a large cost. Those optimized BLASes represent many man-years of effort Yes I acknowledge this. But didn't they then ignore them something simpler, like sum (but which actually could benefit exactly similiar optimizations). Let's set aside the fact that the people who optimized the implementation of dot() (the authors of ATLAS or the MKL or whichever optimized BLAS library you linked to) are different from those who implemented sum() (the numpy devs). Let me repeat a reason why one would put a lot of effort into optimizing dot() but not sum(): However, they are frequently worth it because those operations are often bottlenecks in whole applications. sum(), even in its stupidest implementation, rarely is. I don't know if I'm just not communicating very clearly, or if you just reply to individual statements before reading the whole email. You are communicating very well. It's me who's not communicating so well. and cause substantial headaches for people building and installing numpy. I appreciate this. No doubt at all. However, they are frequently worth it because those operations are often bottlenecks in whole applications. sum(), even in its stupidest implementation, rarely is. In the places where it is a significant bottleneck, an ad hoc implementation in C or Cython or even FORTRAN for just that application is pretty easy to write. But here I have to disagree; I'll think that at least I (if not even the majority of numpy users) don't like (nor I'm be capable/ or have enough time/ resources) go to dwell such details. And you think we have the time and resources to do it for you? My intention was not to suggest anything like that. I'm sorry but I'll have to restate that it's quite reasonable to expect that sum outperforms dot in any case. You don't optimize a function just because you are capable of it. You optimize a function because it is taking up a significant portion of total runtime in your real application. Anything else is a waste of time. Lets now to start make such movements, which enables sum to outperform dot. Sorry, you don't get to volunteer anyone's time or set anyone's priorities but your own. There are some sensible things one could do to optimize sums or even general ufunc reductions, as outlined my Mark, Pauli and myself, but please stop using the accelerated-BLAS dot() as your benchmark. It's a completely inappropriate comparison. OK. 
Lets compare sum to itself: In []: M= randn(1e5, 1e2) In []: timeit M.sum(0) 10 loops, best of 3: 169 ms per loop In []: timeit M.sum(1) 10 loops, best of 3: 37.5 ms per loop In []: timeit M.sum() 10 loops, best of 3: 18.1 ms per loop In []: timeit sum(M.sum(0)) 10 loops, best of 3: 169 ms per loop In []: timeit sum(M.sum(1)) 10 loops, best of 3: 37.7 ms per loop In []: timeit M.T.sum(0) 10 loops, best of 3: 37.7 ms per loop In []: timeit M.T.sum(1) 10 loops, best of 3: 171 ms per loop In []: timeit M.T.sum() 1 loops, best of 3: 288 ms per loop In []: timeit sum(M.T.sum(0)) 10 loops, best of 3: 37.7 ms per loop In []: timeit sum(M.T.sum(1)) 10 loops, best of 3: 170 ms per loop In []: 288./ 18.1 Out[]: 15.91160220994475 You can gain speed by specializing to just your use case, e.g. contiguous data, summing down to one number, or summing along one axis of only 2D data, etc. There's usually no reason to try to generalize that implementation to put it back into numpy. Yes, I would really like to specialize into my case, but 'without going out the python realm.' The Bottleneck project is a good place for such things. It's a nice middle-ground for somewhat-specialized routines that are still pretty common but not general enough to be in numpy yet. http
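The 16x spread in the timings above tracks memory access order rather than instruction count: for a C-contiguous M, M.sum() and M.sum(1) stream through memory while M.T.sum() walks it with a large stride. Paying for one contiguous copy restores the speed (a sketch):

import numpy as np

M = np.random.randn(100000, 100)
Mt = np.ascontiguousarray(M.T)  # copy once; Mt.sum() then streams memory
# compare: timeit M.T.sum()  vs  timeit Mt.sum()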
Re: [Numpy-discussion] Vectorize or rewrite function to work with array inputs?
Hi David, (I think the discussion has diverged, but never mind ;-) I didn't treat far as a vector (yet, but for sure it's doable). Anyway, before going into more details I'll suggest you'll study my new attachment. As far as I can see, you just have 'simple' polynomial evaluations, which I have refactored in such a way that you just need to provide the coefficient arrays. There is no need to write the polynomial evaluation explicitly in the code. My opinion is that the computational side (if your functions really are like shown so far) will be pretty straightforward. Then it boils down to how to handle the coefficient arrays reasonably (perhaps some kind of lightweight 'database' for them ;-). Please feel free to provide any more information. Regards, eat On Tue, Feb 1, 2011 at 10:20 PM, dpar...@chromalloy.com wrote: I'm not sure I need to dive into cython or C for this - performance is not an issue for my problem - I just want a flexible function that will accept scalars or arrays. Both Sebastian's and eat's suggestions show using indexing to handle the conditional statements in the original function. The problem I'm having implementing this is in getting the input arguments and outputs to a common array size. Here's how I can do this but it seems ugly: # t and far are function arguments which may be scalars or arrays # ag is the output array # need to make everything array with common length t = np.array(t, ndmin=1) # Convert t to an array far = np.array(far, ndmin=1) # Convert far to an array ag = t*far*np.nan # Make an output array of the proper length using broadcasting rules t = np.zeros_like(ag)+t # Expand t to the length of the output array far = np.zeros_like(ag)+far # Expand far to the length of the output array Now with all arrays the same length I can use indexing with logical statements: ag[far<0.005] = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - 4.428098e-14 * t ** 4. + \ 1.569889e-10 * t ** 3. - 0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652 The resulting code looks like this: import numpy as np def air_gamma_dp(t, far=0.0): Specific heat ratio (gamma) of Air/JP8 t - static temperature, Rankine [far] - fuel air ratio [- defaults to 0.0 (dry air)] air_gamma - specific heat ratio t = np.array(t, ndmin=1) far = np.array(far, ndmin=1) ag = t*far*np.nan t = np.zeros_like(ag)+t far = np.zeros_like(ag)+far far[(far<0.) | (far>0.069)] = np.nan t[(t < 379.) | (t > 4731.)] = np.nan ag[(far<0.005)] = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - 4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - 0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652 t[(t < 699.) | (t > 4731.)] = np.nan a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + 3.103507e-21 * far - 3.391308e-22 a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. - 5.469399e-17 * far + 6.058125e-18 a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + 3.865306e-13 * far - 4.302534e-14 a3 = -0.0001700602 * far ** 3. + 0.6593809 * far ** 2. - 0.1392629 * far + 1.520583e-10 a2 = 0.3431136 * far ** 3. - 0.1248285 * far ** 2. + 0.02688007 * far - 0.002651616 a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - 0.002676877 * far + 0.0001580424 a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + 0.3573201 * far + 1.372714 ag[far>=0.005] = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + a3 * t ** 3. + a2 * t ** 2. + a1 * t + a0 return ag I was hoping there was a more elegant way to do this.
David Parker Chromalloy - TDAG From: John Salvatier jsalv...@u.washington.edu To: Discussion of Numerical Python numpy-discussion@scipy.org Date: 02/01/2011 02:29 PM Subject: Re: [Numpy-discussion] Vectorize or rewrite function to work with array inputs? Sent by: numpy-discussion-boun...@scipy.org -- Have you thought about using cython to work with the numpy C-API (http://wiki.cython.org/tutorials/numpy#UsingtheNumpyCAPI)? This will be fast and simple (you can mix and match Python and Cython). As for your specific issue: you can simply cast all the inputs to numpy arrays (using asarray, http://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html) to deal with scalars. This will make sure they get broadcast correctly. On Tue, Feb 1, 2011 at 11:22 AM, dpar...@chromalloy.com wrote: Thanks for the advice. Using Sebastian's advice I was able to write a version that worked when the input arguments are both arrays with the same length. The code provided by eat works when t is an array
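Picking up the shape-handling question: broadcast_arrays does the common-shape expansion in one call (a sketch; dry_coeffs is a placeholder for the seven dry-air polynomial coefficients quoted above, highest power first, and the far >= 0.005 branch would fill its own mask the same way):

import numpy as np

def air_gamma_v(t, far, dry_coeffs):
    # dry_coeffs: placeholder argument, not part of the original code
    t, far = np.broadcast_arrays(np.atleast_1d(np.asarray(t, float)),
                                 np.atleast_1d(np.asarray(far, float)))
    ag = np.empty(t.shape)
    ag.fill(np.nan)
    dry = (far >= 0.) & (far < 0.005) & (t >= 379.) & (t <= 4731.)
    ag[dry] = np.polyval(dry_coeffs, t[dry])
    # ... the far >= 0.005 branch fills its own mask analogously ...
    return ag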
Re: [Numpy-discussion] Vectorize or rewrite function to work with array inputs?
Hi, On Mon, Jan 31, 2011 at 5:15 PM, dpar...@chromalloy.com wrote: I have several functions like the example below that I would like to make compatible with array inputs. The problem is the conditional statements give a *ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()*. I can use numpy.vectorize, but if possible I'd prefer to rewrite the function. Does anyone have any advice on the best way to modify the code to accept array inputs? Thanks in advance for any assistance. If I understood your question correctly, then air_gamma could be coded as: def air_gamma_0(t, far=0.0): Specific heat ratio (gamma) of Air/JP8 t - static temperature, Rankine [far] - fuel air ratio [- defaults to 0.0 (dry air)] air_gamma - specific heat ratio if far < 0.: return NAN elif far < 0.005: ag= air_gamma_1(t) ag[np.logical_or(t < 379., t > 4731.)]= NAN return ag elif far < 0.069: ag= air_gamma_2(t, far) ag[np.logical_or(t < 699., t > 4731.)]= NAN return ag else: return NAN Rest of the code is in the attachment. My two cents, eat NAN = float('nan') def air_gamma(t, far=0.0): Specific heat ratio (gamma) of Air/JP8 t - static temperature, Rankine [far] - fuel air ratio [- defaults to 0.0 (dry air)] air_gamma - specific heat ratio if far < 0.: return NAN elif far < 0.005: if t < 379. or t > 4731.: return NAN else: air_gamma = -3.472487e-22 * t ** 6. + 6.218811e-18 * t ** 5. - 4.428098e-14 * t ** 4. + 1.569889e-10 * t ** 3. - 0.002753524 * t ** 2. + 0.0001684666 * t + 1.368652 elif far < 0.069: if t < 699. or t > 4731.: return NAN else: a6 = 4.114808e-20 * far ** 3. - 1.644588e-20 * far ** 2. + 3.103507e-21 * far - 3.391308e-22 a5 = -6.819015e-16 * far ** 3. + 2.773945e-16 * far ** 2. - 5.469399e-17 * far + 6.058125e-18 a4 = 4.684637e-12 * far ** 3. - 1.887227e-12 * far ** 2. + 3.865306e-13 * far - 4.302534e-14 a3 = -0.0001700602 * far ** 3. + 0.6593809 * far ** 2. - 0.1392629 * far + 1.520583e-10 a2 = 0.3431136 * far ** 3. - 0.1248285 * far ** 2. + 0.02688007 * far - 0.002651616 a1 = -0.03792449 * far ** 3. + 0.01261025 * far ** 2. - 0.002676877 * far + 0.0001580424 a0 = 13.65379 * far ** 3. - 3.311225 * far ** 2. + 0.3573201 * far + 1.372714 air_gamma = a6 * t ** 6. + a5 * t ** 5. + a4 * t ** 4. + a3 * t ** 3. + a2 * t ** 2. + a1 * t + a0 elif far >= 0.069: return NAN else: return NAN return air_gamma David Parker Chromalloy - TDAG ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion air_gamma.py Description: Binary data ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Help in speeding up accumulation in a matrix
Hi, On Sat, Jan 29, 2011 at 11:01 PM, Nicolas SCHEFFER scheffer.nico...@gmail.com wrote: Hi all, First email to the list for me, I just want to say how grateful I am to have python+numpy+ipython etc... for my day to day needs. Great combination of software. Anyway, I've been having this bottleneck in one my algorithms that has been bugging me for quite a while. The objective is to speed this part up. I've been doing tons of optimization and parallel processing around that piece of code to get a decent run time. The problem is easy. You want to accumulate in a matrix, a weighted sum of other matrices. Let's call this function scale_and_add: def scale_and_add_re(R,w,Ms): (nb_add,mdim,j)=np.shape(Ms) for i in range(nb_add): R+=w[i]*Ms[i] return R This 'for' loop bugs me since I know this will slow things down. But the dimension of these things are so large that any attempt to vectorize this is slower and takes too much memory. I typically work with 1000 weights and matrices, matrices of dimension of several hundred. My current config is: In [2]: %timeit scale_and_add_re(R,w,Ms) 1 loops, best of 3: 392 ms per loop In [3]: w.shape Out[3]: (2000,) In [4]: Ms.shape Out[4]: (2000, 200, 200) I'd like to be able to double these dimensions. How this array Ms is created? Do you really need to have it in the memory as whole? Assuming it's created by (200, 200) 'chunks' at a time, then you could accumulate that right away to R. It still involves Python looping but that's not so much overhead. My 2 cents eat For instance I could use broadcasting by using a dot product %timeit dot(Ms.T,w) 1 loops, best of 3: 1.77 s per loop But this is i) slower ii) takes too much memory (btw, I'd really need an inplace dot-product in numpy to avoid the copy in memory, like the blas call in scipy.linalg. But that's for an other thread...) The matrices are squared and symmetric. I should be able to get something out of this, but I never found anything related to this in Numpy. I also tried a Cython reimplementation %timeit scale_and_add_reg(L1,w,Ms) 1 loops, best of 3: 393 ms per loop It brought nothing in speed. Here's the code @cython.boundscheck(False) def scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): return _scale_and_add_reg(R,w,Ms) @cython.boundscheck(False) cdef int _scale_and_add_reg(np.ndarray[float, ndim=2] R, np.ndarray[float, ndim=1] w, np.ndarray[float, ndim=3] Ms): cdef unsigned int mdim cdef int nb_add (nb_add,mdim,j)=np.shape(Ms) cdef unsigned int i for i from 0 = i nb_add: R+=w[i]*Ms[i] #for j in range(mdim): #for k in range(mdim): #R[j][k]+=w[i]*Ms[i][j][k] return 0 So here's my question if someone has time to answer it: Did I try anything possible? Should I move along and deal with this bottleneck? Or is there something else I didn't think of? Thanks for reading, keep up the great work! -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
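Two more candidates worth timing against the loop (a sketch with scaled-down shapes; tensordot sums straight over the first axis, avoiding the transposed copy that dot(Ms.T, w) implies, and chunking bounds the working set as suggested above):

import numpy as np

nb, m = 200, 50  # scaled down from (2000, 200, 200) just for the demo
w = np.random.rand(nb)
Ms = np.random.rand(nb, m, m)

R = np.tensordot(w, Ms, axes=(0, 0))  # sum_i w[i] * Ms[i]

# chunked accumulation, if Ms is produced or consumed piecewise:
R2 = np.zeros((m, m))
for idx in np.array_split(np.arange(nb), 16):
    R2 += np.tensordot(w[idx], Ms[idx], axes=(0, 0))
assert np.allclose(R, R2)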
[Numpy-discussion] tril, triu, document/implementation conflict
Hi, I just noticed a document/implementation conflict with tril and triu. According to the tril documentation it should return an array of the same shape and data-type as the one it is called with. But this is not the case, at least with dtype bool. The input shape is referred to as (M, N) in tril and triu, but as (N, M) in tri. Inconsistent? Also I'm not very happy with the performance; at least dtype bool can be accelerated as follows. In []: M= ones((2000, 3000), dtype= bool) In []: timeit triu(M) 10 loops, best of 3: 173 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 107 ms per loop In []: M= asarray(M, dtype= int) In []: timeit triu(M) 10 loops, best of 3: 160 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 163 ms per loop In []: M= asarray(M, dtype= float) In []: timeit triu(M) 10 loops, best of 3: 195 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 157 ms per loop I have attached a crude 'fix' in case someone is interested. Regards, eat twodim_base_fix.py Description: Binary data ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
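The idea of the attached fix, in brief (a sketch of the approach, not the attachment itself): build the triangle mask directly and combine it without a float round-trip, which is what preserves the bool dtype and most of the speed.

import numpy as np

def triu_(m, k=0):
    # the upper triangle is where j - i >= k
    mask = (np.arange(m.shape[1]) - np.arange(m.shape[0])[:, None]) >= k
    if m.dtype == bool:
        return m & mask  # stays bool, no intermediate float array
    out = m.copy()
    out[~mask] = 0
    return out

assert (triu_(np.ones((4, 5), bool)) == np.triu(np.ones((4, 5), bool))).all()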
Re: [Numpy-discussion] tril, triu, document/implementation conflict
Hi, On Wed, Jan 26, 2011 at 2:35 PM, josef.p...@gmail.com wrote: On Wed, Jan 26, 2011 at 7:22 AM, eat e.antero.ta...@gmail.com wrote: Hi, I just noticed a document/ implementation conflict with tril and triu. According tril documentation it should return of same shape and data-type as called. But this is not the case at least with dtype bool. The input shape is referred as (M, N) in tril and triu, but as (N, M) in tri. Inconsistent? Any comments about the names for rows and cols. I prefer (M, N). Also I'm not very happy with the performance, at least dtype bool can be accelerated as follows. In []: M= ones((2000, 3000), dtype= bool) In []: timeit triu(M) 10 loops, best of 3: 173 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 107 ms per loop In []: M= asarray(M, dtype= int) In []: timeit triu(M) 10 loops, best of 3: 160 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 163 ms per loop In []: M= asarray(M, dtype= float) In []: timeit triu(M) 10 loops, best of 3: 195 ms per loop In []: timeit triu_(M) 10 loops, best of 3: 157 ms per loop I have attached a crude 'fix' incase someone is interested. You could open a ticket for this. just one comment: I don't think this is readable, especially if we only look at the source of the function with np.source out= mul(ge(so(ar(m.shape[0]), ar(m.shape[1])), -k), m) from np.source(np.tri) with numpy 1.5.1 m = greater_equal(subtract.outer(arange(N), arange(M)),-k) I agree, thats why I called it crude. Before opening a ticket I'll try to figure out if there exists somewhere in numpy .astype functionality, but not copying if allready proper dtype. Also I'm afraid that I can't produce sufficient testing. Regards, eat Josef Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
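The convert-only-if-needed piece exists as asarray with a dtype, and newer NumPy (1.7+) also spells it through astype's copy flag (sketches):

import numpy as np

m = np.ones((3, 3), dtype=bool)
a = np.asarray(m, dtype=bool)   # returns m itself: dtype already matches
b = m.astype(bool, copy=False)  # same, on NumPy 1.7+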
[Numpy-discussion] How to improve performance of slow tri*_indices calculations?
Hi, Running on: In []: np.__version__ Out[]: '1.5.1' In []: sys.version Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' For the reference: In []: X= randn(10, 125) In []: timeit dot(X.T, X) 1 loops, best of 3: 170 us per loop In []: X= randn(10, 250) In []: timeit dot(X.T, X) 1000 loops, best of 3: 671 us per loop In []: X= randn(10, 500) In []: timeit dot(X.T, X) 100 loops, best of 3: 5.15 ms per loop In []: X= randn(10, 1000) In []: timeit dot(X.T, X) 100 loops, best of 3: 20 ms per loop In []: X= randn(10, 2000) In []: timeit dot(X.T, X) 10 loops, best of 3: 80.7 ms per loop Performance of triu_indices: In []: timeit triu_indices(125) 1000 loops, best of 3: 662 us per loop In []: timeit triu_indices(250) 100 loops, best of 3: 2.55 ms per loop In []: timeit triu_indices(500) 100 loops, best of 3: 15 ms per loop In []: timeit triu_indices(1000) 10 loops, best of 3: 59.8 ms per loop In []: timeit triu_indices(2000) 1 loops, best of 3: 239 ms per loop So the tri*_indices calculations seems to be unreasonable slow compared to for example calculations of inner products. Now, just to compare for a very naive implementation of triu indices. In []: def iut(n): ..: r= np.empty(n* (n+ 1)/ 2, dtype= int) ..: c= r.copy() ..: a= np.arange(n) ..: m= 0 ..: for i in xrange(n): ..: ni= n- i ..: mni= m+ ni ..: r[m: mni]= i ..: c[m: mni]= a[i: n] ..: m+= ni ..: return (r, c) ..: Are we really calculating the same thing? In []: triu_indices(5) Out[]: (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) In []: iut(5) Out[]: (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) Seems so, and then its performance: In []: timeit iut(125) 1000 loops, best of 3: 992 us per loop In []: timeit iut(250) 100 loops, best of 3: 2.03 ms per loop In []: timeit iut(500) 100 loops, best of 3: 5.3 ms per loop In []: timeit iut(1000) 100 loops, best of 3: 13.9 ms per loop In []: timeit iut(2000) 10 loops, best of 3: 39.8 ms per loop Even the naive implementation is very slow, but allready outperforms triu_indices, when n is 250! So finally my question is how one could substantially improve the performance of indices calculations? Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
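For comparison, the upper-triangle indices admit a closed form that removes even iut's Python loop (a sketch; the column formula subtracts the flat offset of row r, which is r*(2n - r + 1)/2):

import numpy as np

def iut_v(n):
    r = np.repeat(np.arange(n), np.arange(n, 0, -1))  # row of each entry
    k = np.arange(n * (n + 1) // 2)                   # flat position
    c = k - r * (2 * n - r + 1) // 2 + r              # column of each entry
    return r, c

# same ordering as triu_indices(5) shown above
r, c = iut_v(5)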
Re: [Numpy-discussion] Crosstabulation
Ionut Sandric sandricionut at yahoo.com writes: Thank you Zack: By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy Thanks's again Ionut - Original Message - From: Zachary Pincus zachary.pincus at yale.edu To: Discussion of Numerical Python numpy-discussion at scipy.org Sent: Wednesday, July 14, 2010 9:42:49 PM GMT +02:00 Athens, Beirut, Bucharest, Istanbul Subject: Re: [Numpy-discussion] Crosstabulation Hi Ionut, Check out the tabular package: http://parsemydata.com/tabular/index.html It seems to be basically what you want... it does pivot tables (aka crosstabulation), it's built on top of numpy, and has simple data IO tools. Also check out this discussion on pivot tables from the numpy list a while ago: http://mail.scipy.org/pipermail/numpy-discussion/2007-August/028739.html One question -- what do you mean by raster data? In my arena, that usually means images... and I'm not sure what a crosstabulation on image data would mean! Zach On Jul 14, 2010, at 2:28 PM, Ionut Sandric wrote: Sorry, the first email was sent before to finish it... Hi: I have two raster data and I would like to do a crosstabulation between them and export the results to a table in a text file. Is it possible to do it with NumPy? Does someone have an example? Thank you, Ionut ___ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Hi, You may be able to adapt this simple script to your case. import numpy as np # generate some test data def gen(b, e, m, n): return np.arange(b, e+ 1), np.random.randint(b, e+ 1, (m, n)) m, n= 15, 15 c1, d1= gen(0, 3, m, n); print d1 c2, d2= gen(3, 5, m, n); print d2 # perform actual x-tabulation xtab= np.zeros((len(c1), len(c2)), np.int) for i in xrange(len(c1)): tmp= d2[c1[i]== d1] for j in xrange(len(c2)): xtab[i, j]= np.sum(c2[j]== tmp) print xtab, np.sum(xtab)== np.prod(d1.shape) Anyway it's straightforward to extend it to nd x-tabulations ;-). My 2 cents, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
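The double loop collapses into one bincount by encoding each cell's class pair as a single integer (a sketch reusing c1, c2, d1, d2 from the script above; assumes NumPy 1.6+ for bincount's minlength):

import numpy as np

i = np.searchsorted(c1, d1.ravel())  # class index of each cell in raster 1
j = np.searchsorted(c2, d2.ravel())  # ... and in raster 2
xtab = np.bincount(i * len(c2) + j,
                   minlength=len(c1) * len(c2)).reshape(len(c1), len(c2))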
Re: [Numpy-discussion] Find indices of largest elements
Nikolaus Rath Nikolaus at rath.org writes: [snip] Not quite, because I'm interested in the n largest values over all elements, not the largest element in each row or column. But Keith's solution seems to work fine, even though I'm still struggling to understand what's going on there. My bad. I just concentrated on your example, not the actual question. However, what's wrong with your above approach [ np.unravel_index(i, x.shape) for i in idx[-3:] ] ? Especially if your n largest elements are just a small fraction of all elements. # Note also the differences a= np.asarray([[1, 8, 2], [2, 3, 3], [3, 4, 1]]) n= 3 # between print [np.unravel_index(ind, a.shape) for ind in np.argsort(a.ravel())[-n:]] # and print [np.where(val== a) for val in np.sort(a.ravel())[-n:]] Regards, eat Best, -Nikolaus ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
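The array form of the same approach, without the per-element list comprehension (a sketch; unravel_index accepts index arrays on reasonably recent NumPy):

import numpy as np

x = np.asarray([[1, 8, 2], [2, 3, 3], [3, 4, 1]])
n = 3
flat = np.argsort(x, axis=None)[-n:]  # flat indices of the n largest
rows, cols = np.unravel_index(flat, x.shape)
# on NumPy 1.8+, np.argpartition(x, -n, axis=None)[-n:] avoids the full sort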
Re: [Numpy-discussion] Find indices of largest elements
Nikolaus Rath Nikolaus at rath.org writes: Hello, How do I best find out the indices of the largest x elements in an array? Example: a = [ [1,8,2], [2,1,3] ] magic_function(a, 2) == [ (0,1), (1,2) ] Since the largest 2 elements are at positions (0,1) and (1,2). Best, -Niko Hi, Just a= np.asarray([[1, 8, 2], [2, 1, 3]]) print np.where((a.T== a.max(axis= 1)).T) However, if any row contains more than 1 max entity, above will fail. Please let me to know if that's relevant for you. -eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
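The tie-safe variant of the reply above, giving exactly one index per row even when a row repeats its maximum:

import numpy as np

a = np.asarray([[1, 8, 2], [2, 1, 3]])
rows = np.arange(a.shape[0])
cols = a.argmax(axis=1)       # first maximal entry of each row
print(list(zip(rows, cols)))  # [(0, 1), (1, 2)], as in the question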
Re: [Numpy-discussion] Getting index of array after applying cond
Shailendra shailendra.vikas at gmail.com writes: I forgot to mention that I wanted this to work for general shape. So I modified it a little bit x = array([[1,2,3,4,5], [6,7,8,7,6], [1,2,3,4,5]]) cond = (x > 5) loc= where(cond) arg_max=argmax(x[cond]) x[tuple([e[arg_max] for e in loc])] 8 But what happens if your x is for example x = array([[1,2,3,4,5], [6,8,8,8,6], [1,2,3,4,5]]) x[tuple([e[arg_max] for e in loc])] would still yield 8, which may or may not be an acceptable answer. Basically what I mean is: why bother with the argmax at all, if your only interest is x[cond].max()? Just my 2 cents. Regards, eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
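The index bookkeeping also goes through flatnonzero, which keeps the general-shape case to three lines (a sketch; with ties this picks the first maximal element, as argmax does):

import numpy as np

x = np.asarray([[1, 2, 3, 4, 5], [6, 7, 8, 7, 6], [1, 2, 3, 4, 5]])
cond = x > 5
flat = np.flatnonzero(cond)          # flat indices of the candidates
k = flat[x.ravel()[flat].argmax()]   # flat index of the max under cond
print(np.unravel_index(k, x.shape))  # (1, 2)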
Re: [Numpy-discussion] Getting index of array after applying cond
Shailendra shailendra.vikas at gmail.com writes: Well, this is just a toy problem. argmax represent a method which will give me a index in x[cond] . And for the case of multiple value my requirement is fine with getting any max index. OK. My concern seems to be needless then. BTW, does this current thread relate anyway to the earlier one 'Match two arrays'? If so, would you like to elaborate more about your 'real' problem? Regards, eat Thanks, Shailendra ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] "Match" two arrays
Shailendra shailendra.vikas at gmail.com writes: Hi All, I want to make a function which should be like this code cordinates1=(x1,y1) # x1 and y1 are x-cord and y-cord of a large number of points cordinates2=(x2,y2) # similar to cordinates1 indices1,indices2= match_cordinates(cordinates1,cordinates2) code (x1[indices1],y1[indices1]) matches (x2[indices2],y2[indices2]) where the definition of match is such that: If A is the closest point to B and the distance between A and B is less than delta then it is a match. If A is the closest point to B and the distance between A and B is more than delta then there is no match. Every point has either 1 match (closest point) or none This logic is problematic in the general case. See below. You may need to be able to handle several pairs of 'closest points'! Also, the sizes of cordinates1 and cordinates2 are quite large and outer should not be used. I can think of only C style code to achieve this. Can anyone suggest a pythonic way of doing this? Thanks, Shailendra This is a straightforward implementation as a starting point. eat code import numpy as np def dist(p1, p2): return np.sqrt(np.sum((p1- p2)** 2, 0)) def cdist(p1, p2, trh): Expects 2d arrays p1 and p2, with compatible first dimensions and a threshold. Returns indices of points close to each other -ind[:, 0], array of p1 indices -ind[:, 1], array of p2 indices -ambi, list of list of ambiguous situations (where more than 1 pair of points are 'equally close') The indices are arranged such that dist(p1[:, ind[k, 0]], p2[:, ind[k, 1]]) < trh is true for all k. ind= [] ambi= [] for k in range(p2.shape[1]): d= dist(p1, p2[:, None, k]) i= np.where(d < trh)[0] if 0 < len(i): m= np.where(d[i]== d[i].min())[0] # problematic i= i[m].tolist() ind.append([i[0], k]) if 1 < len(m): ambi.append([ind[-1], i]) return np.array(ind), ambi if __name__ == '__main__': n= 10 trh= 2e-1 p1= np.round(np.random.rand(2, n), 1) p2= np.round(p1+ 1e-1* np.random.randn(2, n), 1) ind, ambi= cdist(p1, p2, trh) print 'points close to each other:' if 0 < len(ind): print 'p1:' print p1[:, ind[:, 0]], ind[:, 0] print 'p2:' print p2[:, ind[:, 1]], ind[:, 1] print 'valid:' print dist(p1[:, ind[:, 0]], p2[:, ind[:, 1]]) < trh print 'with ambiguous situation(s):' if ambi: print ambi else: print 'None' else: print 'None' import timeit n= 1e2 trh= 2e-1 rep= 5 p1= np.random.rand(2, 1e3* n) p2= np.random.randn(2, n) def perf(): ind, ambi= cdist(p1, p2, trh) print 'performance:' t= np.array(timeit.repeat(perf, repeat= rep, number= 1))/ rep print 'min: ', t.min(), 'mean: ', t.mean(), 'max: ', t.max() code ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
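For large point sets the O(n*m) scan above gives way to a KD-tree query (a sketch using scipy.spatial; delta is the match threshold, and misses come back with infinite distance). Note this returns each point's nearest neighbour within delta; enforcing one-to-one matches and resolving the ambiguous ties discussed above still needs extra bookkeeping.

import numpy as np
from scipy.spatial import cKDTree

x1, y1 = np.random.rand(2, 100000)  # large point set
x2, y2 = np.random.rand(2, 1000)
delta = 1e-3

tree = cKDTree(np.column_stack((x1, y1)))
d, i = tree.query(np.column_stack((x2, y2)), distance_upper_bound=delta)
hit = np.isfinite(d)  # points of set 2 that found a match within delta
indices1, indices2 = i[hit], np.flatnonzero(hit)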
Re: [Numpy-discussion] "Match" two arrays
Oops. Wrongly timed. t= np.array(timeit.repeat(perf, repeat= rep, number= 1))/ rep should be t= np.array(timeit.repeat(perf, repeat= rep, number= 1)) eat ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion