Re: [Numpy-discussion] the difference between + and np.add?
On 11/23/12 8:00 PM, Chris Barker - NOAA Federal wrote:
> On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted franc...@continuum.io wrote:
>> As Nathaniel said, there is not a difference in terms of *what* is computed. However, the methods that you suggested actually differ on *how* they are computed, and that has dramatic effects on the time used. For example:
>>
>> In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]
>>
>> In []: %time arr1 + arr2 + arr3 + arr4 + arr5
>> CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
>> Wall time: 0.15 s
>>
>> There are also ways to minimize the size of temporaries, and numexpr is one of the simplest:
>
> but you can also use np.add (and friends) to reduce the number of temporaries. It can make a difference:
>
> In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
>    ....:     result = arr1 + arr2
>    ....:     np.add(result, arr3, out=result)
>    ....:     np.add(result, arr4, out=result)
>    ....:     np.add(result, arr5, out=result)
>    ....:     return result
>
> In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5
> 1 loops, best of 3: 528 ms per loop
>
> In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
> 1 loops, best of 3: 293 ms per loop
>
> (don't have numexpr on this machine for a comparison)

Yes, you are right.
However, numexpr can still beat this:

In [8]: timeit arr1 + arr2 + arr3 + arr4 + arr5
10 loops, best of 3: 138 ms per loop

In [9]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
10 loops, best of 3: 74.3 ms per loop

In [10]: timeit ne.evaluate("arr1 + arr2 + arr3 + arr4 + arr5")
10 loops, best of 3: 20.8 ms per loop

The reason is that numexpr is multithreaded (using 6 cores above), and for memory-bound problems like this one, fetching data in different threads is more efficient than using a single thread:

In [12]: timeit arr1.copy()
10 loops, best of 3: 41 ms per loop

In [13]: ne.set_num_threads(1)
Out[13]: 6

In [14]: timeit ne.evaluate("arr1")
10 loops, best of 3: 30.7 ms per loop

In [15]: ne.set_num_threads(6)
Out[15]: 1

In [16]: timeit ne.evaluate("arr1")
100 loops, best of 3: 13.4 ms per loop

I.e., the joy of multi-threading is that it not only buys you CPU speed, but can also bring your data from memory faster. So yeah, modern applications *do* need multi-threading to get good performance.

--
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
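A minimal sketch of the temporaries point above, assuming only NumPy. The function name `add_arrays_inplace` is hypothetical: it generalizes `add_5_arrays` to any number of inputs. Chained `+` may allocate a fresh array for every intermediate sum, while the `out=` form allocates once and accumulates into that buffer; both add in the same order, so the values are identical and only memory traffic differs.

```python
import numpy as np


def add_arrays_inplace(first, second, *rest):
    """Elementwise sum of arrays using a single result buffer.

    Only `first + second` allocates; every later addition writes into
    that buffer through the ufunc's `out` argument.
    """
    result = first + second              # the only allocation
    for arr in rest:
        np.add(result, arr, out=result)  # accumulate in place, no temporary
    return result


arrays = [np.arange(1e6) for _ in range(5)]

# Each `+` here may allocate a new array; only the last one is kept.
chained = arrays[0] + arrays[1] + arrays[2] + arrays[3] + arrays[4]
inplace = add_arrays_inplace(*arrays)

print(np.array_equal(chained, inplace))  # True
```

The in-place version trades readability for fewer allocations; numexpr, as Francesc notes, goes further by also spreading the work over several threads.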
[Numpy-discussion] Conditional update of recarray field
Hi,

I am trying to update values in a single field of a numpy record array based on a condition defined in another array. I found that the result depends on the order in which I apply the boolean index and the field name. For example:

cond = np.zeros(5, dtype=bool)
cond[2:] = True
X = np.rec.fromarrays([np.arange(5)], names='a')
X[cond]['a'] = -1
print(X)

returns [(0,) (1,) (2,) (3,) (4,)] (the values were not updated), whereas

X['a'][cond] = -1
print(X)

returns [(0,) (1,) (-1,) (-1,) (-1,)] (it worked this time).

I find this behaviour very confusing. Is it expected? Would it be possible to emit a warning message in the case of faulty assignments?

Bartosz
Re: [Numpy-discussion] Conditional update of recarray field
On 11/28/12 1:47 PM, Bartosz wrote:
> X[cond]['a'] = -1
> [...] (the values were not updated)
> X['a'][cond] = -1
> [...] (it worked this time).
>
> I find this behaviour very confusing. Is it expected?

Yes, it is. In the first idiom, X[cond] is a fancy indexing operation and the result is not a view, so what you are doing is basically modifying the temporary object that results from the indexing. In the second idiom, X['a'] is returning a *view* of the original object, so this is why it works.

> Would it be possible to emit a warning message in the case of faulty assignments?

The only solution that I can see for this is for fancy indexing to return a view rather than a different object, but NumPy containers are not prepared for this.

--
Francesc Alted
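To see the view/copy distinction Francesc describes, one can ask NumPy directly whether each intermediate shares memory with `X`. This sketch reuses Bartosz's setup, writing `bool` instead of the `np.bool` alias (removed in later NumPy releases), and relies on `np.shares_memory`, which exists in NumPy 1.11 and later.

```python
import numpy as np

cond = np.zeros(5, dtype=bool)
cond[2:] = True
X = np.rec.fromarrays([np.arange(5)], names='a')

fancy = X[cond]   # boolean (fancy) indexing -> a new array (copy)
field = X['a']    # field access -> a view onto X's memory

print(np.shares_memory(X, fancy))  # False
print(np.shares_memory(X, field))  # True

# Writing into the copy leaves X untouched ...
X[cond]['a'] = -1
print(X['a'])  # [0 1 2 3 4]

# ... while writing through the view updates X in place.
X['a'][cond] = -1
print(X['a'])  # [ 0  1 -1 -1 -1]
```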
Re: [Numpy-discussion] Conditional update of recarray field
Thanks for the answer, Francesc. I understand now that fancy indexing returns a copy of a recarray. Is this also true for standard ndarrays? If so, I do not understand why X['a'][cond] = -1 should work.

Cheers,
Bartosz

On Wed 28 Nov 2012 03:05:37 PM CET, Francesc Alted wrote:
> In the second idiom, X['a'] is returning a *view* of the original object, so this is why it works.
Re: [Numpy-discussion] Conditional update of recarray field
Hey Bartosz,

On 11/28/12 3:26 PM, Bartosz wrote:
> Is this also true for standard ndarrays? If so, I do not understand why X['a'][cond] = -1 should work.

Yes, that's a good question. In this case there is no intermediate copy: X['a'] returns a view, and the boolean array `cond` is then passed to the __setitem__() of that view, so the assignment lands in the original data. This is why it works. The first idiom chains fancy indexing with another indexing operation, and NumPy needs to create a temporary to execute the fancy indexing, so the second (assignment) operation acts on a copy, not a view.

And yes, fancy indexing returning a copy is standard for all ndarrays.

Hope it is clearer now (although admittedly it is a bit strange at first sight),

--
Francesc Alted
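Francesc's explanation can be made concrete by spelling out the calls Python makes for each statement. This sketch uses a plain 1-d ndarray (a hypothetical `a`) to show that the rule is not specific to record arrays: only the *last* indexing step of an assignment becomes `__setitem__`; every earlier step is a `__getitem__`, whose result is a copy for fancy indexing and a view for basic slicing.

```python
import numpy as np

a = np.arange(5)
cond = a >= 2                          # [False False  True  True  True]

# `a[cond][0] = 99` is executed as:
tmp = a.__getitem__(cond)              # fancy indexing -> copy
tmp.__setitem__(0, 99)                 # modifies the copy only
print(a)                               # [0 1 2 3 4]

# `a[2:][0] = 99` is executed as:
tmp = a.__getitem__(slice(2, None))    # basic slicing -> view
tmp.__setitem__(0, 99)                 # writes through to `a`
print(a)                               # [ 0  1 99  3  4]

# `a[cond] = -1` is a single __setitem__ on `a` itself, so it always works.
a[cond] = -1
print(a)                               # [ 0  1 -1 -1 -1]
```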
Re: [Numpy-discussion] Conditional update of recarray field
I got it. Thanks! Now I see why this is non-trivial to fix. However, it might also be a source of very-hard-to-find bugs. It might be worth discussing this non-intuitive example in the documentation.

Cheers,
Bartosz
Re: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar
On Wed, 2012-11-28 at 11:11 -0500, Skipper Seabold wrote:
> On Tue, Nov 27, 2012 at 11:16 AM, Sebastian Berg sebast...@sipsolutions.net wrote:
>> On Mon, 2012-11-26 at 13:54 -0500, Skipper Seabold wrote:
>>> I discovered this because scipy.optimize.fmin_powell appears to squeeze 1d argmin to 0d unlike the other optimizers, but that's a different story. I would expect the 0d array to behave like the 1d array, not the 2d as it does below. Thoughts? Maybe it's too big of a pain to change this behavior if indeed it's not desired, but I found it to be unexpected.
>>
>> I don't quite understand why it is unexpected. A 1-d array is considered a vector, a 0-d array is a scalar.
>
> When you put it like this I guess it makes sense. I don't encounter 0d arrays often and never think of a 0d array as truly a scalar like np.array(1.).item(). See below for my intuition.

I think you should see them as scalars, though, for mathematical operations. The differences are fine in any case, and numpy typically silently converts scalars -> 0d arrays on function calls and back again to return scalars.

>> snip
>
> Maybe I'm misunderstanding. How do you mean there is no broadcasting?

Broadcasting adds dimensions to the start. To handle a vector like a matrix product in dot, you do not always add the dimension at the start. For matrix.vector, the vector (N,) is much like (N,1). Also, the result of dot is not necessarily 2-d, which it would have to be by your reasoning if you think about what happens in broadcasting terms.

> They're clearly not conformable. Is vector.scalar specially defined (I have no idea)? I recall arguing once and submitting a patch such that np.linalg.det(5) and np.linalg.inv(5) should be well-defined and work, but the counter-argument was that a scalar is not the same as a scalar matrix. This seems to be an exception.

I do not see an exception: in all cases there is no implicit (broadcasting-like) adding of extra dimensions (leading to an error in most linear algebra functions if the input is not 2-d), which is good since explicit is better than implicit.

> Here, I guess, following that counterargument, I'd expected the scalar to fail in dot. I certainly don't expect a (N,2).scalar -> (N,2). Or

If you say dot is strictly a matrix product, yes (though it should also throw errors for vectors then). I think it is simply trying to be more like the dot that I would write down on paper, and thus special-cases vectors and scalars; this generalization only replaces what should otherwise be an error in a matrix product! Maybe a strict matrix product would make sense too, but the dot function behavior cannot be changed in any case, so it's pointless to argue about it. Just make sure your arrays are 2-d (or matrices) if you want a matrix product, which will give the behavior you expect in a much more controlled fashion anyway.

> I'd expect it to follow the rules of matrix notation and be treated like the 1d scalar vector so that (N,1).scalar -> (N,). To my mind, this follows more closely the expectation that (J,K).(M,N) -> (J,N), i.e., the second dimension of the result is the same as the second dimension of whatever is post-multiplying, where the first dimension is inferred if necessary (or should fail if non-existent). So my expectations are (were):
>
> (N,).() -> (N,)
> (N,1).() -> (N,)
> (N,1).(1,) -> (N,)
> (N,1).(1,1) -> (N,1)
> (N,2).() -> Error
>
> Skipper
>
>>> [~]
>>> [279]: arr = np.random.random((25,2))
>>>
>>> [~/]
>>> [280]: np.dot(arr.squeeze(), np.array(2.)).shape
>>> [280]: (25, 2)
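The shape rules under discussion can be checked directly. This sketch, assuming N = 25 as in Skipper's session, runs `np.dot` on each case from his list of expectations; a 0-d operand is treated as plain scalar multiplication, so the other operand's shape is kept unchanged.

```python
import numpy as np

N = 25
scalar = np.array(2.)  # 0-d array

cases = {
    "(N,).()":     (np.ones(N), scalar),
    "(N,1).()":    (np.ones((N, 1)), scalar),
    "(N,1).(1,)":  (np.ones((N, 1)), np.ones(1)),
    "(N,1).(1,1)": (np.ones((N, 1)), np.ones((1, 1))),
    "(N,2).()":    (np.ones((N, 2)), scalar),
}

for label, (left, right) in cases.items():
    print(label, "->", np.dot(left, right).shape)
```

Compared with the expectations listed above, two cases differ: (N,1).() keeps the shape (N,1) rather than becoming (N,), and (N,2).() returns an (N,2) array instead of raising.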
Re: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar
On Wed, Nov 28, 2012 at 12:31 PM, Sebastian Berg sebast...@sipsolutions.net wrote:
> Maybe a strict matrix product would make sense too, but the dot function behavior cannot be changed in any case, so it's pointless to argue about it. Just make sure your arrays are 2-d (or matrices) if you want a matrix product, which will give the behavior you expect in a much more controlled fashion anyway.

I'm not arguing anything. I was just stating why I was surprised and was looking for guidance to update my expectations, which you've provided. Thanks. Ensuring the input dimensions is my solution.

Skipper
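One way to follow Sebastian's advice is a small guard that refuses anything but 2-d operands before calling `np.dot`. The helper name `matrix_product` is hypothetical. For comparison, `np.matmul` (the `@` operator), added in NumPy releases after this thread (1.10), already rejects 0-d operands with a `ValueError`, although unlike this helper it still accepts 1-d arrays.

```python
import numpy as np


def matrix_product(a, b):
    """Strict matrix product: both operands must be 2-d.

    Misaligned 2-d shapes are left to np.dot, which raises ValueError itself.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError(
            f"expected 2-d operands, got shapes {a.shape} and {b.shape}")
    return np.dot(a, b)


arr = np.random.random((25, 2))

print(matrix_product(arr, np.ones((2, 1))).shape)  # (25, 1)

try:
    matrix_product(arr, np.array(2.))  # 0-d operand: rejected by the guard
except ValueError as exc:
    print("rejected:", exc)

try:
    arr @ np.array(2.)                 # matmul also rejects 0-d operands
except ValueError as exc:
    print("matmul rejected:", exc)
```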
[Numpy-discussion] Windows installation problem
I have tried to install the 1.6.2 win32 superpack on my Windows 7 Pro (64-bit) system, which has ActiveState ActivePython 2.7.2.5 (64-bit) installed. However, I get an error that Python 2.7 is required and can't be found in the registry.

I only need numpy because it is a prerequisite for another package, and numpy is the only prerequisite that won't install. The others are specific to Python 2.7 and they install; some are Win64 and some are Win32.

Is there a workaround for this? I have no facilities available to build numpy.

Regards,
Jim
Re: [Numpy-discussion] Windows installation problem
On Wed, Nov 28, 2012 at 10:16 PM, Jim O'Brien j...@jgssebl.net wrote:
> I have tried to install the 1.6.2 win32 superpack on my Windows 7 Pro (64 bit) system which has ActiveState ActivePython 2.7.2.5 (64 bit) installed. However, I get an error that Python 2.7 is required and can't be found in the Registry.
> [...]
> Is there a work around for this?

You need to use a 64-bit numpy if you have a 64-bit Python. You can find one at http://www.lfd.uci.edu/~gohlke/pythonlibs/.

Ralf
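Because the installer has to match the interpreter's architecture, it helps to confirm which Python is actually running. A minimal sketch using only the standard library: a C pointer (`struct.calcsize("P")`) is 8 bytes on a 64-bit build and 4 bytes on a 32-bit one. This reports the bitness of the interpreter, not of the operating system, and `sys.executable` shows which installation is in use.

```python
import platform
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on how the running interpreter was built."""
    return struct.calcsize("P") * 8


print(sys.version)
print(sys.executable)              # path of the interpreter actually running
print(python_bitness(), "bit")
print(platform.architecture()[0])  # e.g. '64bit'
```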
Re: [Numpy-discussion] Windows installation problem
Ralf,

Thanks. I downloaded the 1.6.2 release for win64 and tried to install it. I am still being told that it requires Python 2.7 and that it was not found in the registry. I know I have Python 2.7, as other packages find it just fine. Is there a way to get around the check that is done by the installer?

Regards,
Jim

From: numpy-discussion-boun...@scipy.org [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Ralf Gommers
Sent: Wed, Nov 28, 2012 2:32 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Windows installation problem

> You need to use 64-bit numpy if you have 64-bit Python. You can find one at http://www.lfd.uci.edu/~gohlke/pythonlibs/.
Re: [Numpy-discussion] Windows installation problem
Forget the last post. I was on the wrong machine! The 64-bit release installed fine.

Regards,
Jim
[Numpy-discussion] Simple Loadtxt question
I have a file with thousands of lines like this:

Signal was returned in 204 microseconds
Signal was returned in 184 microseconds
Signal was returned in 199 microseconds
Signal was returned in 4274 microseconds
Signal was returned in 202 microseconds
Signal was returned in 189 microseconds

I try to read it like this:

data = np.loadtxt('dummy.data', dtype={'names': ('label', 'times', 'musec'), 'formats': ('|S23', 'i8', '|S13')})

It fails, I think, because it wants a string format and field for each of the words 'Signal', 'was', 'returned', etc. Can I make it treat the whole string before the number as one string, one field? All I really care about is the numbers anyway.

Any advice appreciated.
Re: [Numpy-discussion] Simple Loadtxt question
On 29.11.2012, at 1:21AM, Robert Love wrote:
> I have a file with thousands of lines like this:
> Signal was returned in 204 microseconds
> [...]
> Can I make it treat that whole string before the number as one string, one field? All I really care about is the numbers anyway.

Then how about

np.loadtxt('dummy.data', usecols=(4, ))

Cheers,
Derek
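Derek's suggestion works because `loadtxt` splits each line on whitespace, so the number is the fifth token (index 4). A self-contained sketch, using an in-memory `io.StringIO` in place of the file `dummy.data`, and `dtype=int` (an addition here) since the values are whole microseconds:

```python
import io

import numpy as np

text = """Signal was returned in 204 microseconds
Signal was returned in 184 microseconds
Signal was returned in 199 microseconds
Signal was returned in 4274 microseconds
Signal was returned in 202 microseconds
Signal was returned in 189 microseconds
"""

# Tokens per line: Signal(0) was(1) returned(2) in(3) <number>(4) microseconds(5)
times = np.loadtxt(io.StringIO(text), usecols=(4,), dtype=int)
print(times)        # [ 204  184  199 4274  202  189]
print(times.max())  # 4274
```

Without `dtype=int`, `loadtxt` returns a float array with the same values.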