Re: [Numpy-discussion] Topical documentation
2008/6/26 Stéfan van der Walt [EMAIL PROTECTED]:

> Hi all,
>
> We are busy writing several documents that address general NumPy
> topics, such as indexing, broadcasting, testing, etc. I would like for
> those documents to be available inside of NumPy, so that they could be
> accessed as docstrings: help(np.doc.indexing) or
>
> In [15]: np.doc.indexing?
>
> If I had to implement it right now, I'd probably
> - Have a directory full of topic.txt files.
> - Upon importing the doc sub-module, set up doc.topic and add the
>   docstring accordingly.
>
> Does anyone have comments or suggestions regarding such a framework?

I think making them available inside np.doc is a good idea. Should
np.doc.indexing contain anything but the single docstring?

It's also worth making some of the examples work as doctests, both to
get better testing - some of the indexing operations don't work, or at
least, don't do anything I can understand - and to keep the examples in
sync with the code. To make this happen, might it not be better to
create doc/indexing.py, containing __doc__=...?

The topical documentation would presumably also be ReST. Any further
conventions it should follow?

Anne
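To make the directory-of-text-files idea concrete, here is a minimal
sketch of what a doc/__init__.py could do (the module-synthesis
approach below is my illustration, not an actual NumPy implementation):

import os
import types

_here = os.path.dirname(__file__)

# create one stub module per topic.txt file, so that
# help(np.doc.indexing) and "np.doc.indexing?" show the file's contents
for _fname in sorted(os.listdir(_here)):
    if _fname.endswith('.txt'):
        _topic = _fname[:-4]
        _mod = types.ModuleType(_topic)
        _mod.__doc__ = open(os.path.join(_here, _fname)).read()
        globals()[_topic] = _mod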
Re: [Numpy-discussion] ndarray methods vs numpy module functions
2008/6/24 Bob Dowling [EMAIL PROTECTED]:

>> There is not supposed to be a one-to-one correspondence between the
>> functions in numpy and the methods on an ndarray. There is some
>> duplication between the two, but that is not a reason to make more
>> duplication.
>
> I would make a plea for consistency, to start with.
>
> Those of us who write in an OO style are required to switch backwards
> and forwards between OO and not-OO, or to abandon OO altogether in our
> NumPy code. Neither is an attractive option.
>
> The reason I tripped over this is that I am currently writing a course
> which introduces students to NumPy. I am going to be asked this
> question from the audience. As yet I don't have any answer except
> history.

As a rule, I personally avoid methods whenever reasonable. The primary
reason is that the function versions generally work fine on lists,
giving my functions some extra genericity for free. I generally make an
exception only for attributes - X.shape, for example, rather than
np.shape(X).

It's true, it's a mess, but I'd just set down some simple rules that
work, and mention that some functions exist in other forms. There's
something to be said for rationalizing numpy's rules - or at least
writing them down! - but there's no need to use every version of every
function. And I believe they're all accessible as module-level
functions.

Anne
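A quick illustration of the genericity argument - the function forms
accept anything array-like, while the method forms require an actual
ndarray (a small demonstration of my own, not from the thread):

import numpy as np

seq = [1.0, 2.0, 4.0]

np.mean(seq)             # works: the function coerces the list to an array
np.shape(seq)            # (3,)

# seq.mean()             # AttributeError: plain lists have no such method
np.asarray(seq).mean()   # the method form needs an ndarray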
Re: [Numpy-discussion] Code samples in docstrings mistaken as doctests
2008/6/23 Alan McIntyre [EMAIL PROTECTED]:

> On Mon, Jun 23, 2008 at 3:21 PM, Stéfan van der Walt [EMAIL PROTECTED] wrote:
>> Another alternative is to replace +SKIP with something like +IGNORE.
>> That way, the statement is still executed, we just don't care about
>> its outcome. If we skip the line entirely, it often affects the rest
>> of the tests later on.
>
> Ugh. That just seems like a lot of unreadable ugliness to me. If this
> comment magic is the only way to make that stuff execute properly
> under doctest, I think I'd rather just skip it in favor of clean,
> uncluttered, non-doctestable code samples in the docstrings. If the
> code that's currently in docstrings needs to be retained as test code,
> I'll gladly take the time to put it into a test_ module where it
> doesn't get in the way of documentation. I'll defer to the consensus,
> though.

I think doctests are valuable: it's very hard for the documentation to
get out of sync with the code, and it makes it very easy to write
tests, particularly in light of the wiki documentation framework. But I
think encrusting examples with weird comments will be a pain for
documentors and off-putting to users.

Perhaps doctests can be positively marked, in some relatively
unobtrusive way?

Anne
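For readers unfamiliar with the "comment magic" under discussion: the
stock doctest module's directive syntax looks like the following (+SKIP
is a real standard directive; +IGNORE was only a proposal at the time,
and the printed values are made up):

def example():
    """
    >>> import numpy as np
    >>> np.random.rand(2)        # doctest: +SKIP
    array([ 0.61879477,  0.59162363])
    """

The directive comment is exactly the clutter being objected to: it has
to appear on every statement whose output cannot be checked verbatim.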
Re: [Numpy-discussion] Code samples in docstrings mistaken as doctests
2008/6/23 Michael McNeil Forbes [EMAIL PROTECTED]:

> On 23 Jun 2008, at 12:37 PM, Alan McIntyre wrote:
>> Ugh. That just seems like a lot of unreadable ugliness to me. If this
>> comment magic is the only way to make that stuff execute properly
>> under doctest, I think I'd rather just skip it in favor of clean,
>> uncluttered, non-doctestable code samples in the docstrings.
>
> Another perspective: doctests ensure that documentation stays up to
> date (if the behaviour or interface changes, then tests will fail,
> indicating that the documentation also needs to be updated). Thus, one
> can argue that all examples should also be doctests. This generally
> makes things a little more ugly, but much less ambiguous.

This is a bit awkward. How do you give an example for a random-number
generator? Even if you are willing to include a seed in each statement,
misleading users into thinking it's necessary, the value returned for a
given seed is not necessarily part of the interface a random-number
generator agrees to support.

I do agree that as many examples as possible should be doctests, but I
don't think we should restrict the examples we are allowed to give to
only those that can be made to serve as doctests.

Anne
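Anne's objection, made concrete: a doctest for a random-number
generator forces an example like the one below, where the seed line
exists only to make the test pass, and the printed values are an
implementation detail rather than a promise (the values shown are what
the Mersenne Twister happens to produce for this seed; treat them as
illustrative):

>>> import numpy as np
>>> np.random.seed(1234)      # needed only so the doctest output matches
>>> np.random.rand(3)
array([ 0.19151945,  0.62210877,  0.43772774])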
Re: [Numpy-discussion] Code samples in docstrings mistaken as doctests
2008/6/23 Stéfan van der Walt [EMAIL PROTECTED]:

> 2008/6/24 Stéfan van der Walt [EMAIL PROTECTED]:
>> It should be fairly easy to execute the example code, just to make
>> sure it runs. We can always work out a scheme to test its validity
>> later.
>
> Mike Hansen just explained to me that the Sage doctest system sets the
> random seed before executing each test. If we address
>
> a) Random variables
> b) Plotting representations and
> c) Endianness
>
> we're probably halfway there.

I agree (though I have reservations about how they are to be
addressed). But in the current setting, halfway there is still a
problem - it seems to me we need, now and later, a way to deal with
generic examples that are not doctests. There may not be many of them,
and most may be dealt with by falling into categories a, b, and c
above, but it is important that we not make it difficult to write new
examples even if they can't readily be made into doctests.

In particular, we don't want some documentor saying "well, I'd like to
write an example, but I don't remember the arcane syntax to prevent
this failing a doctest, so I'm not going to bother."

Anne
Re: [Numpy-discussion] Code samples in docstrings mistaken as doctests
2008/6/23 Michael Abshoff [EMAIL PROTECTED]:

> Charles R Harris wrote:
>> On Mon, Jun 23, 2008 at 5:58 PM, Michael Abshoff [EMAIL PROTECTED] wrote:
>>> Stéfan van der Walt wrote:
>>>> 2008/6/24 Stéfan van der Walt [EMAIL PROTECTED]:
>>>>> It should be fairly easy to execute the example code, just to make
>>>>> sure it runs. We can always work out a scheme to test its validity
>>>>> later.
>>>>
>>>> Mike Hansen just explained to me that the Sage doctest system sets
>>>> the random seed before executing each test. If we address
>>>> a) Random variables
>>>
>>> Hi, we have some small extensions to the doctesting framework that
>>> allow us to mark doctests as #random so that the result is not
>>> checked. Carl Witty wrote some code that makes the random number
>>> generator in a lot of the Sage components behave consistently on all
>>> supported platforms.
>>
>> Hi, but there is more than one possible random number generator. If
>> you do that you are tied into one kind of generator and one kind of
>> initialization implementation.
>>
>> Chuck
>
> Correct, but so far Carl has hooked into six out of the many random
> number generators in the various components of Sage. This way we can
> set a global seed and also more easily reproduce issues with
> algorithms where randomness plays a role, without being forced to be
> on the same platform. There are still doctests in Sage where the
> randomness comes from sources not in randgen (Carl's code), but sooner
> or later we will get around to all of them.

Doesn't this mean you can't change your implementation of random number
generators (for example choosing a different implementation of
generation of normally-distributed random numbers, or replacing the
Mersenne Twister) without causing countless doctests to fail
meaninglessly?

Anne
Re: [Numpy-discussion] On my Cython/NumPy project
2008/6/21 Dag Sverre Seljebotn [EMAIL PROTECTED]:

> General feedback is welcome; in particular, I need more opinions about
> what syntax people would like. We seem unable to find something that
> we really like; this is the current best candidate (cdef is the way
> you declare types on variables in Cython):
>
> cdef int i = 4, j = 6
> cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
> arr[i, j] = 1
>
> ...
>
> Some more important points:
> - There will likely be a compiler flag (and perhaps per-block or
>   per-variable pragmas) on whether bounds-checking is done or not.
> - It doesn't look like negative indices will be supported in this
>   mode, as it adds another branch in potentially very tight loops.
>   Opinions on this are welcome though. In safe mode, bounds checking
>   will catch it, while in unsafe mode it will at best segfault and at
>   worst corrupt data. The negative indices thing is potentially
>   confusing to new users. Pyrex uses a different syntax (arr{i, j} for
>   efficient array lookups) partly to make this fact very explicit.
>   Thoughts?

I am very worried about the negative numbers issue. It's the sort of
thing that will readily lead to errors, and that produces a significant
difference between cython and python. I understand the performance
issues that motivate it, but cython really needs to be easy to use or
we might as well just use C.

As I understand it, the ultimate goal should be for users to be able to
compile arbitrary python code for tiny speed improvements, then add
type annotations for key variables to obtain large speed improvements.
Forbidding negative indices now prevents that goal.

My suggestion is this: allow negative indices, accepting the cost in
tight loops. (If bounds checking is enabled, the cost will be
negligible anyway.) Provide a #pragma allowing the user to assert that
a certain piece of code uses no negative indices. Ultimately, the
cython compiler will often be able to deduce that negative indices will
not be used within a particular loop and supply the pragma itself, but
in the meantime this solution allows people to use the
extremely-convenient negative indices without causing mysterious
problems, and it sets things up for the future. (Other compiler-based
cleverness can deal with the most common case by converting explicit
negative indices.)

Anne
Re: [Numpy-discussion] On my Cython/NumPy project
2008/6/21 Robert Kern [EMAIL PROTECTED]:

> On Sat, Jun 21, 2008 at 17:08, Anne Archibald [EMAIL PROTECTED] wrote:
>> My suggestion is this: allow negative indices, accepting the cost in
>> tight loops. (If bounds checking is enabled, the cost will be
>> negligible anyway.) Provide a #pragma allowing the user to assert
>> that a certain piece of code uses no negative indices.
>
> Instead of a #pragma, you could rely on the type of the index. If it
> is unsigned, you can do the fast path; if it is signed, you need to
> check for and handle potential negatives.

Cute! And then it's easy to make "for i in range(n)" produce unsigned
results with no effort on the part of the user.

Anne
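A sketch of the type-driven dispatch Robert describes, written in the
same hypothetical cdef syntax quoted earlier in the thread (this
mirrors the proposal under discussion, not a feature that existed at
the time):

cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
cdef int i            # signed: compiler must branch on i < 0 and wrap
cdef unsigned int u   # unsigned: cannot be negative

arr[i, 0]   # emits the wraparound check
arr[u, 0]   # always takes the fast path, a bare load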
Re: [Numpy-discussion] Strange behavior for argsort() and take()
2008/6/18 Charles R Harris [EMAIL PROTECTED]:

> On Wed, Jun 18, 2008 at 11:48 AM, Anne Archibald [EMAIL PROTECTED] wrote:
>> 2008/6/18 Stéfan van der Walt [EMAIL PROTECTED]:
>>> 2008/6/18 Anne Archibald [EMAIL PROTECTED]:
>>>> In [7]: x.take(x.argsort())
>>>> Out[7]: array([ 0. , 0.1, 0.2, 0.3])
>>>>
>>>> If you would like to think of it more mathematically, when you feed
>>>> np.argsort() an array that represents a permutation of the numbers
>>>> 0,1,...,n-1, you get back the inverse permutation. When you pass a
>>>> permutation as the argument to x.take(), you apply the permutation
>>>> to x. (You can also invert a permutation by feeding it as indices
>>>> to arange(n)). I have been tempted to write some support functions
>>>> for manipulating permutations, but I'm not sure how generally
>>>> useful they would be.
>>>
>>> Do we have an easy way of grabbing the elements out of an array that
>>> correspond to argmax(x, axis)? `take` won't do the job; there `axis`
>>> has a different meaning.
>>
>> Well, here's a quick hack:
>>
>> def take_axis(X, ix, axis):
>>     XX = np.rollaxis(X, axis)
>>     s = XX.shape
>>     return XX[(ix,) + tuple(np.indices(s[1:]))]
>>
>> And a version that works for argsort()ing:
>>
>> def take_axis_argsort(X, ix, axis):
>>     XX = np.rollaxis(X, axis)
>>     ixix = np.rollaxis(ix, axis)
>>     s = XX.shape
>>     return np.rollaxis(XX[(ixix,) + tuple(np.indices(s)[1:])], 0, axis+1)
>>
>> I'm always a little bit leery of using indices(), since it can
>> produce very large arrays:
>
> And fancy indexing too, as it tends to be slow. The axis=None case
> also needs to be handled. I think this is best done in C where the
> code stacks for argsort and arg{max,min} are good starting points. The
> possibility of different integer types for the index array adds a bit
> of complication.

Actually, apart from the sizes of the index arrays, I don't really
think fancy indexing can be beat. I suspect that one could write
tuple_indices that used broadcasting to get the index arrays, if you
were willing to return a tuple. That would bring these up to nearly C
speed, I think.

Of course they're not general routines, but I think with a bit of work
they might serve.

Anne
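A quick check of how the two helpers quoted above behave (a usage
sketch of my own, with take_axis and take_axis_argsort as defined in
the thread):

import numpy as np

X = np.random.rand(3, 4)

# grab the elements corresponding to argmax(X, axis=1)
ix = np.argmax(X, axis=1)
np.allclose(take_axis(X, ix, 1), X.max(axis=1))                   # True

# apply an argsort along an axis
order = np.argsort(X, axis=1)
np.allclose(take_axis_argsort(X, order, 1), np.sort(X, axis=1))   # True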
Re: [Numpy-discussion] Find groups of true values or sequence of values
2008/6/18 bevan [EMAIL PROTECTED]:

> Hello, I am looking for some pointers that will hopefully get me
> around an issue I have hit.
>
> I have a timeseries of river flow and would like to carry out some
> analysis on the recession periods. That is anytime the values are
> decreasing. I would like to restrict (mask) my data to any values that
> are in a continuous sequence of 3 or more (in the example below), that
> are decreasing in value. Hopefully this example helps:
>
> import numpy as np
> Flow = np.array([15.4,20.5,19.4,18.7,18.6,35.5,34.8,25.1,26.7])
> FlowDiff = np.diff(Flow)
> boolFlowdiff = FlowDiff > 0
> MaskFlow = np.ma.masked_array(Flow[1:], boolFlowdiff)
> print MaskFlow
> [-- 19.4 18.7 18.6 -- 34.8 25.1 --]
>
> The output I would like is
> [-- 19.4 18.7 18.6 -- -- -- --]
> where the second group is blanked because the sequence only has 2
> members.

I would tackle this in steps: find the decreasing pairs, then find
places where two of them occur in a row, then construct your mask.

Finding decreases:

d = np.diff(X) < 0

Finding two decreases in a row:

t = d[1:] & d[:-1]

Creating the right mask:

m = np.zeros(n, dtype=np.bool)
m[:n-2] |= t
m[1:n-1] |= t
m[2:n] |= t

For finding longer runs you would want other tricks.

Anne
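Putting Anne's three steps together into a runnable sketch (note that
under this counting 35.5 > 34.8 > 25.1 also qualifies as a run of three
decreasing values, so the result differs slightly from the original
post, which counts runs of negative diffs rather than runs of values):

import numpy as np

Flow = np.array([15.4, 20.5, 19.4, 18.7, 18.6, 35.5, 34.8, 25.1, 26.7])
n = len(Flow)

d = np.diff(Flow) < 0     # d[k]: Flow[k] > Flow[k+1]
t = d[1:] & d[:-1]        # t[k]: Flow[k] > Flow[k+1] > Flow[k+2]

m = np.zeros(n, dtype=bool)
m[:n-2] |= t              # first member of each run of three
m[1:n-1] |= t             # middle member
m[2:n] |= t               # last member

# m is True for members of decreasing runs; np.ma masks *invalid*
# entries, so invert it
MaskFlow = np.ma.masked_array(Flow, mask=~m)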
Re: [Numpy-discussion] Strange behavior for argsort() and take()
2008/6/18 Thomas J. Duck [EMAIL PROTECTED]:

> I have found what I think is some strange behavior for argsort() and
> take(). First, here is an example that works as expected:
>
> >>> x = numpy.array([1,0,3,2])
> >>> x.argsort()
> array([1, 0, 3, 2])
>
> argsort() returns the original array, which is self-indexing for
> numbers 0 through 3...
>
> >>> x.take(x.argsort())
> array([0, 1, 2, 3])
>
> ...and take() provides the correct sort with ascending order. Now
> check out what happens when we swap the middle two numbers:
>
> >>> x = numpy.array([1,3,0,2])
> >>> x.argsort()
> array([2, 0, 3, 1])
>
> argsort() appears to have indexed the numbers in descending order
> (which was not expected)...
>
> >>> x.take(x.argsort())
> array([0, 1, 2, 3])
>
> ...but the take() operation puts them in ascending order anyway (which
> was really not expected)! Can anyone shed some light on what is
> happening here for me? I have tested this on both my OS X/Fink and
> Debian/Lenny systems with the same result.

This is the intended behaviour. The array returned by np.argsort() is
designed to be used as an index into the original array (take() does
almost the same thing as indexing). In particular, when x.argsort()
returns array([2,0,3,1]), that means that the sorted array is x[2],
x[0], x[3], x[1]. It might be clearer to sort a different array:

In [3]: x = np.array([0.1,0.3,0.0,0.2])

In [5]: x.argsort()
Out[5]: array([2, 0, 3, 1])

In [7]: x.take(x.argsort())
Out[7]: array([ 0. , 0.1, 0.2, 0.3])

If you would like to think of it more mathematically, when you feed
np.argsort() an array that represents a permutation of the numbers
0,1,...,n-1, you get back the inverse permutation. When you pass a
permutation as the argument to x.take(), you apply the permutation to
x. (You can also invert a permutation by feeding it as indices to
arange(n)).

I have been tempted to write some support functions for manipulating
permutations, but I'm not sure how generally useful they would be.

Anne
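The permutation view of argsort, spelled out (a small demonstration of
my own):

import numpy as np

perm = np.array([2, 0, 3, 1])       # a permutation of 0..3
inv = perm.argsort()                # argsort of a permutation = its inverse

np.all(perm[inv] == np.arange(4))   # True
np.all(inv[perm] == np.arange(4))   # True

# the original puzzle: x.take(x.argsort()) applies the inverse of the
# permutation encoded in x, which sorts it
x = np.array([1, 3, 0, 2])
x.take(x.argsort())                 # array([0, 1, 2, 3])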
Re: [Numpy-discussion] nose changes checked in
2008/6/17 Alan McIntyre [EMAIL PROTECTED]:

> On Tue, Jun 17, 2008 at 8:15 PM, Anne Archibald [EMAIL PROTECTED] wrote:
>> Uh, I assumed NumpyTestCase was public and used it. I'm presumably
>> not alone, so perhaps a deprecation warning would be good. What
>> backward-compatible class should I use? unittest.TestCase?
>
> Yes, unittest.TestCase seemed to be completely adequate for all the
> existing tests in NumPy. Unless you need some functionality explicitly
> implemented in NumpyTestCase (like the 'measure' or 'rundocs'
> methods), you can just replace it with TestCase. TestCase can be
> imported from numpy.testing, although (now that I think about it) it's
> probably less cryptic to just import it from unittest.
>
> For module-level doctests, you can place something like this in the
> module:
>
> from numpy.testing import *
>
> def test():
>     return rundocs()
>
> You can replace ParametricTest with generators, as described here:
> http://scipy.org/scipy/scipy/wiki/TestingGuidelines

Hmm. This won't work with the current version of numpy, will it? That
is, it needs nose. (I run much code on the work machines, which I
(thankfully) do not administer, and which are running a variety of
ancient versions of numpy (some even have only Numeric, to support
horrible quasi-in-house C extensions).) So I'd like to write my tests
in a way that they can run on both new and old versions of numpy.

> I will document all this more completely once the test setup isn't
> changing on a daily basis, honest.
>
>> I'm assuming this experience should tell me that any item bar that I
>> can get by issuing "from numpy.foo import bar" should be considered
>> the public API and therefore deprecated instead of removed? ;)

Well, probably. But more so for those that are used widely throughout
numpy itself, since many of us learn how to write code using numpy by
reading numpy source. (Yes, this means that internal conventions like
numpy.core.whatever get used by people who aren't writing numpy.)

Anne
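The generator style that replaces ParametricTest looks roughly like
this (a generic nose-style sketch of mine, not taken from the
guidelines page):

import numpy as np
from numpy.testing import assert_array_almost_equal

def test_roundtrip():
    # nose runs each yielded (function, argument) pair as its own test
    def check(dtype):
        x = np.arange(5, dtype=dtype)
        assert_array_almost_equal(x, x[::-1][::-1])
    for dtype in [np.float32, np.float64, np.int32]:
        yield check, dtype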
Re: [Numpy-discussion] Strange behavior for argsort() and take()
2008/6/18 Stéfan van der Walt [EMAIL PROTECTED]:

> 2008/6/18 Anne Archibald [EMAIL PROTECTED]:
>> In [7]: x.take(x.argsort())
>> Out[7]: array([ 0. , 0.1, 0.2, 0.3])
>>
>> If you would like to think of it more mathematically, when you feed
>> np.argsort() an array that represents a permutation of the numbers
>> 0,1,...,n-1, you get back the inverse permutation. When you pass a
>> permutation as the argument to x.take(), you apply the permutation to
>> x. (You can also invert a permutation by feeding it as indices to
>> arange(n)). I have been tempted to write some support functions for
>> manipulating permutations, but I'm not sure how generally useful they
>> would be.
>
> Do we have an easy way of grabbing the elements out of an array that
> correspond to argmax(x, axis)? `take` won't do the job; there `axis`
> has a different meaning.

Well, here's a quick hack:

def take_axis(X, ix, axis):
    XX = np.rollaxis(X, axis)
    s = XX.shape
    return XX[(ix,) + tuple(np.indices(s[1:]))]

And a version that works for argsort()ing:

def take_axis_argsort(X, ix, axis):
    XX = np.rollaxis(X, axis)
    ixix = np.rollaxis(ix, axis)
    s = XX.shape
    return np.rollaxis(XX[(ixix,) + tuple(np.indices(s)[1:])], 0, axis+1)

I'm always a little bit leery of using indices(), since it can produce
very large arrays:

In [60]: np.indices((2,3,4)).shape
Out[60]: (3, 2, 3, 4)

Anne

P.S. rollaxis() saves so very much pain; I do wonder if unrollaxis
would be useful, even though it's just

def unrollaxis(X, axis):
    return rollaxis(X, 0, axis+1)

-A
Re: [Numpy-discussion] nose changes checked in
2008/6/18 Stéfan van der Walt [EMAIL PROTECTED]:

> 2008/6/18 Anne Archibald [EMAIL PROTECTED]:
>> Well, probably. But more so for those that are used widely throughout
>> numpy itself, since many of us learn how to write code using numpy by
>> reading numpy source. (Yes, this means that internal conventions like
>> numpy.core.whatever get used by people who aren't writing numpy.)
>
> People shouldn't be using code from `numpy.core` directly. When we
> refactor code behind the scenes, we need a workspace to do it in, and
> if people start putting their hands in the engine we can't protect
> them from getting hurt. A bigger warning sign may be appropriate (or,
> to take it to the extreme, rename `core` to `_core`).

I'm not arguing users should be using numpy.core (and I don't use it),
but given the nature of numpy and its documentation, many users look
into the code, see numpy.core and copy that. I think unless we want to
actually expose sub-namespaces - which was actively discussed - we
probably should put an underscore. Not that we can without breaking the
code of all those people who are already using numpy.core.

Anne
Re: [Numpy-discussion] nose changes checked in
2008/6/17 Alan McIntyre [EMAIL PROTECTED]:

> On Tue, Jun 17, 2008 at 9:26 AM, David Huard [EMAIL PROTECTED] wrote:
>> I noticed that NumpyTest and NumpyTestCase disappeared, and now I am
>> wondering whether these classes were part of the public interface or
>> reserved for internal usage? In the former case, it might be well to
>> deprecate them before removing them.
>
> ParametricTestCase is gone too. There was at least one person using it
> that said he didn't mind porting to the nose equivalent, but I expect
> that's an indication there's more people out there using them. If
> there's a consensus that they need to go back in and get marked as
> deprecated, I'll put them back.

Uh, I assumed NumpyTestCase was public and used it. I'm presumably not
alone, so perhaps a deprecation warning would be good. What
backward-compatible class should I use? unittest.TestCase?

Thanks,
Anne
Re: [Numpy-discussion] Polyfit
2008/6/16 Chandler Latour [EMAIL PROTECTED]:

> I believe I'm bound to python. In terms of forcing the regression
> through the origin, the purpose is partly for visualization but it
> also should fit the data. It would not make sense to model the data
> with an initial value other than 0.

Polyfit is just a thin wrapper around numpy.linalg.lstsq. You can do
something like:

A = np.power.outer(xs, np.arange(1, n))
outputs = np.linalg.lstsq(A, ys)

Anne
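Expanded into a complete example: dropping the zeroth power from the
design matrix is what forces the fit through the origin (the data and
the degree below are made up for illustration; n-1 is the polynomial
degree):

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.1, 1.9, 4.2, 5.8, 8.1])

n = 3                                  # use powers x^1 .. x^(n-1): no constant term
A = np.power.outer(xs, np.arange(1, n))
coeffs = np.linalg.lstsq(A, ys)[0]     # least-squares coefficients

fitted = np.dot(A, coeffs)             # model evaluated at xs; it passes
                                       # through the origin by construction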
Re: [Numpy-discussion] Large symmetrical matrix
2008/6/10 Simon Palmer [EMAIL PROTECTED]:

> Hi, I have a problem which involves the creation of a large square
> matrix which is zero across its diagonal and symmetrical about the
> diagonal, i.e. m[i,j] = m[j,i] and m[i,i] = 0. So, in fact, it is a
> large triangular matrix. I was wondering whether there is any way of
> easily handling a matrix of this shape without either incurring a
> memory penalty or a whole whack of proprietary code?
>
> To get through this I have implemented a 1D array which has
> ((n-1)^2)/2 elements inside a wrapper class which manipulates the
> arguments of array accessors with some arithmetic to return the
> appropriate value. To be honest I'd love to throw this away, but I
> haven't yet come across a feasible alternative. Any ideas?

Well, there's a risk of shooting yourself in the foot, but here's a way
to get at the elements quickly and conveniently: store the coefficients
in a 1D array (as you do already). Then if your matrix is M by M,
element (i,j) is at position (M-2)*j+i-1, assuming i<j. Of course, you
can simply use this expression every time you want to look an element
up, as you do now, but that's no fun. So now create a view A of this
array having offset -1, shape (M,M), and strides (1,(M-2)). (This kind
of weird view can be created with the ndarray constructor.) Then A[i,j]
addresses the right element. A[j,i] is of course totally wrong. But as
long as you avoid i>j, you can use all numpy's magical indexing tricks.

Unfortunately you can't really use ufuncs, since they'll waste time
operating on the useless lower half.

Personally, I would live with the factor-of-two memory hit.

Anne
Re: [Numpy-discussion] Coverting ranks to a Gaussian
2008/6/9 Keith Goodman [EMAIL PROTECTED]:

> Does anyone have a function that converts ranks into a Gaussian? I
> have an array x:
>
> >>> import numpy as np
> >>> x = np.random.rand(5)
>
> I rank it:
>
> >>> x_ranked = x.argsort().argsort()
> >>> x_ranked
> array([3, 1, 4, 2, 0])
>
> I would like to convert the ranks to a Gaussian without using scipy.
> So instead of the equal distance between ranks in array x, I would
> like the distance between them to follow a Gaussian distribution.
>
> How far out in the tails of the Gaussian should 0 and N-1 (N=5 in the
> example above) be? Ideally, or arbitrarily, the areas under the
> Gaussian to the left of 0 (and the right of N-1) should be 1/N or
> 1/2N. Something like that. Or a fixed value is good too.

I'm actually not clear on what you need. If what you need is for rank i
of N to be the 100*i/N th percentile in a Gaussian distribution, then
you should indeed use scipy's functions to accomplish that; I'd use
scipy.stats.norm.ppf().

Of course, if your points were drawn from a Gaussian distribution, they
wouldn't be exactly 1/N apart, there would be some distribution. Quite
what the distribution of (say) the maximum or the median of N points
drawn from a Gaussian is, I can't say, though people have looked at it.
But if you want typical values, just generate N points from a Gaussian
and sort them:

V = np.random.randn(N)
V = np.sort(V)
return V[ranks]

Of course they will be different every time, but the distribution will
be right.

Anne

P.S. why the "no scipy" restriction? it's a bit unreasonable. -A
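The scipy route Anne mentions, for reference (the (ranks + 0.5)/N
plotting positions used here are one conventional choice, matching the
"1/2N in each tail" suggestion above):

import numpy as np
from scipy.stats import norm

N = 5
ranks = np.array([3, 1, 4, 2, 0])

gauss = norm.ppf((ranks + 0.5) / N)   # rank i -> the (i + 0.5)/N quantile
                                      # of the standard normal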
Re: [Numpy-discussion] Inplace shift
2008/6/7 Keith Goodman [EMAIL PROTECTED]:

> On Fri, Jun 6, 2008 at 10:46 PM, Anne Archibald [EMAIL PROTECTED] wrote:
>> 2008/6/6 Keith Goodman [EMAIL PROTECTED]:
>>> I'd like to shift the columns of a 2d array one column to the right.
>>> Is there a way to do that without making a copy? This doesn't work:
>>>
>>> >>> import numpy as np
>>> >>> x = np.random.rand(2,3)
>>> >>> x[:,1:] = x[:,:-1]
>>> >>> x
>>> array([[ 0.44789223, 0.44789223, 0.44789223],
>>>        [ 0.80600897, 0.80600897, 0.80600897]])
>>
>> As a workaround you can use backwards slices:
>>
>> In [40]: x = np.random.rand(2,3)
>>
>> In [41]: x[:,:0:-1] = x[:,-2::-1]
>>
>> In [42]: x
>> Out[42]:
>> array([[ 0.20183084, 0.20183084, 0.08156887],
>>        [ 0.30611585, 0.30611585, 0.79001577]])
>
> Neat. It makes sense to go backwards. Thank you.
>
>> Less painful for numpy developers but more painful for users is to
>> warn them about the status quo: operations on overlapping slices can
>> happen in arbitrary order.
>
> Now I'm confused. Could some corner case of memory layout cause numpy
> to work from right to left, breaking the workaround? Or can I depend
> on the workaround working with numpy 1.0.4?

I'm afraid so. And it's not such a corner case as that: if the array is
laid out in C contiguous order, you have to go backwards, while if the
array is laid out in FORTRAN contiguous order you have to go forwards.

Anne
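If a copy is acceptable, the same shift can be written without relying
on assignment order at all (an aside of mine, not from the thread):

import numpy as np

x = np.random.rand(2, 3)

shifted = np.roll(x, 1, axis=1)   # copies; the last column wraps to the front
shifted[:, 0] = x[:, 0]           # restore column 0, matching the intent
                                  # of x[:,1:] = x[:,:-1]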
Re: [Numpy-discussion] array vs matrix
2008/6/7 Robert Kern [EMAIL PROTECTED]:

> On Sat, Jun 7, 2008 at 14:37, Ondrej Certik [EMAIL PROTECTED] wrote:
>> Hi, what is the current plan with array and matrix with regard of
>> calculating sin(A)? I.e. elementwise vs sin(A) = Q*sin(D)*Q^T? Is the
>> current approach (elementwise for array and Q*sin(D)*Q^T for matrix)
>> the way to go?
>
> I don't believe we intend to make numpy.matrix any more featureful. I
> don't think it's a good idea for you to base sympy.Matrix's
> capabilities in lock-step with numpy.matrix, though. There are very
> different constraints at work. Please, do what you think is best for
> sympy.Matrix.

Let me elaborate somewhat: we recently ran across some painful quirks
in numpy's handling of matrices, and they spawned massive discussion.
As it stands now, there is significant interest in reimplementing the
matrix object from scratch, with different behaviour. So emulating its
current behaviour is not a win.

For consistency, it makes a certain amount of sense to have sin(A)
compute a matrix sine, since A**n computes a matrix power. But looking
at the matrix exponential, I see that we have several implementations,
none of which is satisfactory for all matrices. I would expect the
matrix sine to be similar - particularly when faced with complex
matrices - so perhaps needing an explicit matrix sine is a good thing?

Also worth noting is that this problem can be evaded with namespaces;
matrix sin could be scipy.matrixmath.sin, abbreviated perhaps to
mm.sin, as opposed to np.sin.

Anne
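For concreteness, the Q*sin(D)*Q^T construction mentioned above, which
is well-defined for symmetric (more generally, Hermitian) matrices via
their eigendecomposition (a sketch; general matrices need more care,
which is exactly Anne's point about the matrix exponential):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # symmetric, so eigh applies

w, Q = np.linalg.eigh(A)            # A = Q diag(w) Q^T
sinA = np.dot(Q * np.sin(w), Q.T)   # matrix sine: Q sin(D) Q^T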
Re: [Numpy-discussion] A memory problem: why does mmap come up in numpy.inner?
2008/6/4 Dan Yamins [EMAIL PROTECTED]:

> So, I have three questions about this:
>
> 1) Why is mmap being called in the first place? I've written to Travis
> Oliphant, and he's explained that numpy.inner does NOT directly do any
> memory mapping and shouldn't call mmap. Instead, it should just
> operate with things in memory -- in which case my 8 GB should allow
> the computation to go through just fine. What's going on?
>
> 2) How can I stop this from happening? I want to be able to leverage
> large amounts of ram on my machine to scale up my computations and not
> be dependent on the limitations of the address space size. If the mmap
> is somehow being called by the OS, is there some option I can set that
> will make it do things in regular memory instead? (Sorry if this is a
> stupid question.)

I don't know much about OSX, but I do know that many malloc()
implementations take advantage of a modern operating system's virtual
memory when allocating large blocks of memory. For small blocks, malloc
uses memory arenas, but if you ask for a large block malloc() will
request a whole bunch of pages from the operating system. This way when
the memory is freed, free() can easily return the chunk of memory to
the OS. On some systems, one way to get such a big hunk of memory from
the system is with an anonymous mmap(). I think that's what's going on
here. So I don't think you want to stop malloc() from using mmap().

You do, of course, want the memory allocation to succeed, and I'm
afraid I don't have any idea why it can't. Under Linux, I know that you
can run a 64-bit processor in 32-bit mode, which gives you the usual 4
GB address space limit. I've no idea what OSX does.

Good luck,
Anne
Re: [Numpy-discussion] A memory problem: why does mmap come up in numpy.inner?
2008/6/4 Dan Yamins [EMAIL PROTECTED]:

> Anne, thanks so much for your help. I'm still a little confused. If
> your scenario about how the memory allocation works is right, does
> that mean that even if I put a lot of ram on the machine, e.g. 16GB, I
> still can't request it in blocks larger than the limit imposed by the
> processor architecture (e.g. 4 GB for 32-bit, 8 GB for 64-bit)? What I
> really want is to be able to have my ability to request memory limited
> just by the amount of memory on the machine, and not have it depend on
> something about paging/memory mapping limits. Is this a stupid/naive
> thing?

No, that's a perfectly reasonable thing to want, and it *should* be
possible with your hardware. Maybe I'm barking up the wrong tree and
the problem is something else. But if you'll bear with me:

For the last ten years or so, normal computers have used 32 bits to
address memory. This means that any given process can only get at 2**32
different bytes of memory, which is 4 GB. The magic of virtual memory
means you can possibly have a number of different processes addressing
different 4 GB chunks, some of which may temporarily be stored on disk
at any given time. If you want to get at a chunk of memory bigger than
4 GB, you need all of these things:

* Your CPU must be 64-bit, or else it has no hope of being able to
  access more than 4 GB (in fact more than 3 GB, for complicated
  reasons) of physical memory.

* Your operating system must be 64-bit (64-bit PC CPUs have a 32-bit
  compatibility mode they often find themselves running in) so that it
  can know about all the real RAM you have installed and so that it can
  support 64-bit programs.

* Your program must be 64-bit, so that it can access huge chunks of
  RAM.

For compatibility reasons, many 64-bit machines have most of their
software compiled in 32-bit mode. This isn't necessarily a problem -
the programs probably even run faster - but they can't access more than
4 GB each. So it's worth finding out whether your python interpreter is
compiled as a 64-bit program (and whether that's even possible under
your version of OSX). Unfortunately I don't know how to find that out
in your case.

Anne
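One portable way to check whether a Python interpreter is a 32-bit or a
64-bit build is to look at the size of a C pointer (a generic
illustration of mine; platform.architecture() is another option):

import struct

bits = struct.calcsize("P") * 8   # size of a C pointer, in bits
print(bits)                       # 32 or 64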
Re: [Numpy-discussion] A memory problem: why does mmap come up in numpy.inner?
2008/6/4 Dan Yamins [EMAIL PROTECTED]:

>> Try
>>
>> In [3]: numpy.dtype(numpy.uintp).itemsize
>> Out[3]: 4
>>
>> which is the size in bytes of the integer needed to hold a pointer.
>> The output above is for 32 bit python/numpy.
>>
>> Chuck
>
> Check, the answer is 4, as you got for the 32-bit. What would the
> answer be on a 64-bit architecture? Why is this diagnostic?

In a 64-bit setting, a pointer needs to be 64 bits long, that is, eight
bytes, not four.

What Charles pointed out was that while the inner product is very big,
it seems to fit into memory on his 32-bit Linux machine; is it possible
that OSX is preventing your python process from using even the meager
2-3 GB that a 32-bit process ought to get? In particular, try running
Charles' script in a fresh python interpreter and see if it works; it
may be that other arrays you had allocated are taking up some of the
space that this one could. You will probably still want a 64-bit
python, though, in order to have a little elbow room.

Anne
Re: [Numpy-discussion] linked variables?
2008/6/3 Robert Kern [EMAIL PROTECTED]:

> Python does not copy data when you assign something to a new variable.
> Python simply points the new name to the same object. If you modify
> the object using the new name, all of the other names pointing to that
> object will see the changes. If you want a copy, you will need to
> explicitly make one. For numpy arrays, the best way to do this is to
> use array():
>
> m_i = array(m_o)

Is array() really the best way to copy an array? I would have thought

m_i = m_o.copy()

would be better - requiring less of array's clever guessing, and
preserving subtypes. On the downside it doesn't work if you're given a
list, but perhaps np.copy() would be better (though it loses subtype
information)? Certainly it's clearer.

Anne
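The aliasing behaviour Robert describes, in a few lines (a minimal
demonstration of my own):

import numpy as np

a = np.arange(3)
b = a           # b is just another name for the same array
c = a.copy()    # c owns its own data

b[0] = 99
print(a[0])     # 99 -- modifying b modified a
print(c[0])     # 0  -- the copy is unaffected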
[Numpy-discussion] How does Boolean indexing work?
Hi,

I thought it might be useful to summarize the different ways to use
numpy's indexing, slice and fancy. The document so far is here:
http://www.scipy.org/Cookbook/Indexing

While writing it I ran into some puzzling issues. The first of them is,
how is Boolean indexing supposed to work when given more than one
array? What I mean is, suppose we have the following array:

In [3]: A = np.arange(9)

In [4]: B = np.reshape(A,(3,3))

In [5]: B
Out[5]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Then we can use boolean indexing to select some rows or some columns:

In [12]: B[np.array([True,False,True]),:]
Out[12]:
array([[0, 1, 2],
       [6, 7, 8]])

In [13]: B[:,np.array([True,False,True])]
Out[13]:
array([[0, 2],
       [3, 5],
       [6, 8]])

But if you supply Boolean arrays in both row and column places, rather
than select the corresponding rows and columns, you get something
basically mysterious:

In [14]: B[np.array([True,False,True]),np.array([True,True,False])]
Out[14]: array([0, 7])

In [15]: B[np.array([True,False,True]),np.array([True,False,False])]
Out[15]: array([0, 6])

In [16]: B[np.array([True,False,True]),np.array([True,True,True])]
---------------------------------------------------------------------------
<type 'exceptions.ValueError'>            Traceback (most recent call last)

/home/peridot/<ipython console> in <module>()

<type 'exceptions.ValueError'>: shape mismatch: objects cannot be
broadcast to a single shape

What is going on here?

Thanks,
Anne
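(A note appended for readers, not part of the original question: as far
as I understand numpy's fancy-indexing rules, each boolean array is
converted to its nonzero() indices and the resulting integer arrays are
broadcast against each other, pairing up element-wise rather than
forming an outer product. That reproduces the outputs above.)

import numpy as np

B = np.arange(9).reshape(3, 3)

rows = np.array([True, False, True])   # rows.nonzero() -> [0, 2]
cols = np.array([True, True, False])   # cols.nonzero() -> [0, 1]

B[rows, cols]                          # same as B[[0, 2], [0, 1]]
                                       # -> array([0, 7]): B[0,0], B[2,1]

# the failing case: three True values give indices [0, 1, 2], which
# cannot be broadcast against the two indices [0, 2]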
Re: [Numpy-discussion] Slice assignment and overlapping views (was: Strange behavior in setting masked array values in Numpy 1.1.0)
2008/5/31 Pauli Virtanen [EMAIL PROTECTED]:

> The reason for the strange behavior of slice assignment is that when
> the left and right sides in a slice assignment are overlapping views
> of the same array, the result is currently effectively undefined. Same
> is true for ndarrays:
>
> >>> import numpy
> >>> a = numpy.array([1, 2, 3, 4, 5])
> >>> a[::-1]
> array([5, 4, 3, 2, 1])
> >>> a[:] = a[::-1]
> >>> a
> array([5, 4, 3, 4, 5])

I think that the current rule is, slices are walked from low index to
high index. This doesn't help with multidimensional arrays, where the
order of the axes is (and should be) determined by efficiency
considerations.

Unfortunately there's really no good way to warn about overlapping
copies. Remember that this is a frequent operation, so it has to be
fast for small arrays. I think changing base so that it points to the
real base and not the parent would help (and clear up a memory leak:
try while True: A = A[::-1] some time) eliminate some cases where
overlap cannot occur, but what about the following cases?

A[:5] = A[-5:]
A[::2] = A[1::2]
A[1:] = A[-1:]

The last is actually fairly common (I've needed it), and relies on
numpy's ordering of copies. The middle one is very common, and the
first one would be a royal pain to code around if the slices were not
allowed to overlap.

I think I at one point wrote an algorithm that would detect many cases
where overlap could not occur (maybe in the smarter reshape code?) but
I came to the conclusion that detecting all the cases was infeasible.
It's a number-theoretic problem - can you write the same number as the
sum of multiples of these two lists of numbers with nonnegative integer
coefficients less than these other two lists of numbers? - and I
suspect it's NP-complete. (Ah, yes, you can make a knapsack out of it -
take an array with strides a_1, ..., a_n and shape (2,...,2), and a
single-element array starting at position N.) In any case it's
infeasible to solve it every time somebody tries to do a slice
assignment.

In any case, many users need nearly-overlapping slices, and some need
really-overlapping slices. Preventing problems is going to have to
happen at a higher level.

Anne
Re: [Numpy-discussion] New documentation web application
2008/5/29 Jarrod Millman [EMAIL PROTECTED]:

> On Thu, May 29, 2008 at 3:28 PM, Stéfan van der Walt [EMAIL PROTECTED] wrote:
>> The NumPy documentation project has taken another leap forward! Pauli
>> Virtanen has, in a week of superhuman coding, produced a web
>> application that enhances the work-flow and editing experience of
>> NumPy docstrings on the web.
>
> Excellent. Thanks to everyone who is working on this. The
> documentation work that you are doing is going to make a huge
> difference in increasing NumPy adoption and will benefit the community
> in a major way.

Absolutely. It is also, already, improving the reliability and
consistency of numpy's code as people carefully go over all the
functions that have been neglected for years.

This is a great project.

Anne
Re: [Numpy-discussion] proposal on ufunc shift operators.
2008/5/28 Charles R Harris [EMAIL PROTECTED]:

> Hi All,
>
> Currently we have:
>
> In [2]: ones(1,dtype=int8) << ones(1,dtype=uint8)
> Out[2]: array([2], dtype=int16)
>
> In [4]: ones(1,dtype=int64) << ones(1,dtype=uint64)
> Out[4]: array([2], dtype=object)
>
> Note the increased size in the first case and the return of a Python
> long integer object in the second. I propose that these operators
> should preserve the type of the first argument, although this is not
> easy to do with the current ufunc setup. It is impossible to use a
> type of sufficient size for all shift values, and preserving the type
> of the first argument is what I think most folks would expect.

That sounds eminently sensible to me. Of course this will complicate
explaining type rules for ufuncs. An alternative is to give it some
reasonably simple non-exceptional rule and let users specify the output
dtype if they care. Still, I think in the interest of minimal surprise,
getting rid of the conversion to an object array of longs, at least,
would be a good idea.

Do we have a cyclic-shift ufunc? That is, uint8(1)<<8 == 1?

Anne
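As far as I know there is no such ufunc in numpy, but a cyclic (rotate)
shift is easy to build from the ordinary shift operators; a sketch for
uint8 (a helper of my own, not a numpy function - it relies on uint8
arithmetic wrapping modulo 256):

import numpy as np

def rotl8(x, k):
    """Rotate uint8 value(s) x left by k bits."""
    x = np.asarray(x, dtype=np.uint8)
    k = int(k) % 8
    if k == 0:
        return x.copy()
    return (x << k) | (x >> (8 - k))

rotl8(np.uint8(1), 8)   # array(1, dtype=uint8): the bit wraps around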
Re: [Numpy-discussion] List of function-like things with an 'out' parameter
2008/5/28 Alan McIntyre [EMAIL PROTECTED]:

> On Wed, May 28, 2008 at 3:34 PM, Charles R Harris wrote:
>>> I wonder if this is something that ought to be looked at for all
>>> functions with an out parameter? ndarray.compress also had problems
>>> with array type mismatch (#789); I can't imagine that it's safe to
>>> assume only these two functions were doing it incorrectly. (Unless
>>> of course somebody has recently looked at all of them.)
>>
>> I think that is an excellent idea! A good start would be to list all
>> the functions with the out parameter and then write some tests. The
>> current behavior is inconsistent and we not only need to specify the
>> behavior, but fix all the places that don't follow the rules.
>
> Here's a list of things in numpy that have an 'out' argument (and
> their arguments); I think I eliminated all the duplicated items (that
> are imported from a subpackage into one of the main packages, for
> example). There's stuff that's missing, probably; I think the
> C-implemented functions don't have argument lists programmatically
> available so I may parse their docstrings or something, but this is a
> start.

Nice! One noticeable absence is all the ufuncs. (Partly this is because
it's not actually called out, or in fact anything at all; it's just the
last parameter if there are enough.) You might also check things like
objects returned by vectorize() and frompyfunc().

Does it make sense to put this list on the Wiki somewhere, so that
people who come across new things that take output parameters (however
named) can post them?

Anne
Re: [Numpy-discussion] Is this a bug?
2008/5/27 Robert Kern [EMAIL PROTECTED]:

> Can we make it so that dtype('c') is preserved instead of displaying
> '|S1'? It does not behave the same as dtype('|S1') although it
> compares equal to it.

It seems alarming to me that they should compare equal but behave
differently. Is it possible to change more than just the way it prints?

Anne
Re: [Numpy-discussion] Buildbot errors.
2008/5/24 Jarrod Millman [EMAIL PROTECTED]:

> On Thu, May 22, 2008 at 11:25 PM, Charles R Harris [EMAIL PROTECTED] wrote:
>> On Fri, May 23, 2008 at 12:10 AM, Charles R Harris [EMAIL PROTECTED] wrote:
>>> The python 2.6 buildbots are showing 5 failures that are being
>>> hidden by valgrind.
>>
>> They seem to have fixed themselves, they were probably related to the
>> API addition I made, then moved. However, it is a bad thing that the
>> errors were covered up by Valgrind and not reported.
>
> I didn't understand what you meant by this yesterday, but I see what
> you are talking about now:
>
> http://buildbot.scipy.org/builders/Linux_x86_64_Fedora/builds/486/steps/shell_2/logs/stdio
> http://buildbot.scipy.org/builders/Linux_x86_Fedora_Py2.6/builds/461/steps/shell_2/logs/stdio
>
> You said they fixed themselves, but the failures are in the most
> recent buildbot reports. This is the only thing I am concerned about
> before branching, so hopefully someone can look at this and let me
> know whether the failures are indeed fixed.

They do not appear on my machine (a Pentium-M running Ubuntu). I should
point out that they are only actually three distinct errors, because
one of the test suites is being run twice. They are not exactly subtle
tests; they're checking that seterr induces the raising of exceptions
from np.arange(3)/0, np.sqrt(-np.arange(3)), and
np.array([1.])/np.array([0.]).

The tests pass on the x86_64 machine I have access to (a multiprocessor
Opteron running Knoppix of all things). That is, it only has python 2.4
(don't ask) so the tests can't be run, but running the same tests by
hand produces the expected results.

This particular feature - seterr - is the sort of thing an
overaggressive optimizer can easily butcher, though, so it could easily
be the result of the particular configuration on the buildbot machine.
I think somebody with access to the buildbot machine needs to see
what's going on. In particular: does a manually-compiled numpy exhibit
the problem? How clean does the buildbot make its environment? Do the
functions behave correctly from an interactive session? Do other seterr
conditions have the same problem?

Anne

P.S. Please ignore the alarming-looking buildbot failure; it is due to
operator headspace. -A
Re: [Numpy-discussion] Buildbot errors.
2008/5/24 Charles R Harris [EMAIL PROTECTED]:

> I take that back, I got confused looking through the output. The
> errors are the same and only seem to happen when valgrind runs the
> tests.

Sounds like maybe valgrind is not IEEE clean:

> As of version 3.0.0, Valgrind has the following limitations in its
> implementation of x86/AMD64 floating point relative to IEEE754.
>
> + Numeric exceptions in FP code: IEEE754 defines five types of numeric
>   exception that can happen: invalid operation (sqrt of negative
>   number, etc), division by zero, overflow, underflow, inexact (loss
>   of precision). For each exception, two courses of action are defined
>   by IEEE754: either (1) a user-defined exception handler may be
>   called, or (2) a default action is defined, which fixes things up
>   and allows the computation to proceed without throwing an exception.
>   Currently Valgrind only supports the default fixup actions. Again,
>   feedback on the importance of exception support would be
>   appreciated.
>
> When Valgrind detects that the program is trying to exceed any of
> these limitations (setting exception handlers, rounding mode, or
> precision control), it can print a message giving a traceback of where
> this has happened, and continue execution. This behaviour used to be
> the default, but the messages are annoying and so showing them is now
> disabled by default. Use --show-emwarns=yes to see them.

Anne
Re: [Numpy-discussion] 1.1.x branch and 1.1.0 tag imminent
2008/5/24 Jarrod Millman [EMAIL PROTECTED]:

> On Fri, May 23, 2008 at 11:16 PM, David Cournapeau [EMAIL PROTECTED] wrote:
>> Where do those errors appear? I don't see them on the buildbot. Are
>> they 2.6 specific? If yes, I would say ignore them, because 2.6 is
>> not released yet, and is scheduled for September.
>
> You can see them here:
>
> http://buildbot.scipy.org/builders/Linux_x86_64_Fedora/builds/486/steps/shell_2/logs/stdio
> http://buildbot.scipy.org/builders/Linux_x86_Fedora_Py2.6/builds/461/steps/shell_2/logs/stdio
>
> You have to search for them, because as Chuck pointed out valgrind
> seems to be hiding them. We should figure out how to avoid this at
> some point down the road as well. I am not sure whether or not this is
> 2.6 specific.

No. These are a known limitation of valgrind when dealing with
floating-point. Other than emailing the valgrind developers to tell
them that yes, somebody cares about this, I think they can be safely
ignored (thank goodness).

Anne
Re: [Numpy-discussion] first recarray steps
2008/5/21 Vincent Schut [EMAIL PROTECTED]:

> Christopher Barker wrote:
>> Also, if your image data is rgb, usually, that's a (width, height, 3)
>> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
>> array, then that's rrr...ggg...bbb... Some image libs may give you
>> that, I'm not sure.
>
> My data is. In fact, this is a simplification of my situation; I'm
> processing satellite data, which usually has more (and other) bands
> than just rgb. But the data is definitely in shape (bands, y, x).

You may find your life becomes easier if you transpose the data in
memory. This can make a big difference to efficiency. Years ago I was
working with enormous (by the standards of the day) MATLAB files on
disk, storing complex data. The way (that version of) MATLAB
represented complex data was the way you describe: matrix of real
parts, matrix of imaginary parts. This meant that to draw a single
pixel, the disk needed to seek twice... depending on what sort of
operations you're doing, transposing your data so that each pixel is
all in one place may improve cache coherency as well as making the use
of record arrays possible.

Anne
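The band-interleaving transpose Anne suggests, spelled out (the array
shapes here are invented for illustration):

import numpy as np

bands, ny, nx = 5, 1024, 1024
data = np.zeros((bands, ny, nx), dtype=np.float32)   # band-sequential

# move the band axis last and make the result contiguous, so all the
# values for one pixel sit together in memory
pixel_interleaved = np.ascontiguousarray(data.transpose(1, 2, 0))
pixel_interleaved.shape   # (1024, 1024, 5)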
Re: [Numpy-discussion] ANN: NumPy/SciPy Documentation Marathon 2008
2008/5/21 LB [EMAIL PROTECTED]:

> This is really great news, and it seems very promising according to
> the first pages of the Wiki that I've seen.
>
> It's perhaps not the right place to say this, but I was wondering what
> you would think about adding labels or categories to the descriptions
> of each function? I think it would really help newcomers to search
> around the 400 functions of numpy if they could use some labels to
> make their search more precise.

You're right that some kind of categorization would be very valuable. I
think that this is planned, but the doc marathon organizers can
presumably give a more detailed answer.

> Until now, the most efficient way of finding the numpy function that
> fits my needs was to search for some words in the numpy example list
> page. This was not always very fast, and labels like "array creation",
> "shape manipulation", "index operation", "arithmetic", etc., could
> simplify this kind of search.

In the short term, there's the numpy functions by category page on the
wiki: http://www.scipy.org/Numpy_Functions_by_Category

Of course, this page is already incomplete, and nobody is
systematically updating it. Really the right solution is what you
propose, annotating each function and then automatically generating
such a list.

Anne
Re: [Numpy-discussion] question on NumPy NaN
2008/5/20 Vasileios Gkinis [EMAIL PROTECTED]:

> I have a question concerning nan in NumPy. Lets say i have an array of
> sample measurements a = array((2,4,nan)). In NumPy calculating the
> mean of the elements in array a looks like:
>
> >>> a = array((2,4,nan))
> >>> a
> array([ 2., 4., NaN])
> >>> mean(a)
> nan
>
> What if i simply dont want nan to propagate and get something that
> would look like:
>
> >>> a = array((2,4,nan))
> >>> a
> array([ 2., 4., NaN])
> >>> mean(a)
> 3.

For more elaborate handling of missing data, look into masked arrays,
in numpy.ma. They are designed to deal with exactly this sort of thing.

Anne
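A masked-array version of the example (masked_invalid is one convenient
way to build the mask):

import numpy as np

a = np.array([2.0, 4.0, np.nan])

m = np.ma.masked_invalid(a)   # NaNs become masked entries
m.mean()                      # 3.0 -- the NaN no longer propagates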
Re: [Numpy-discussion] dimension aligment
2008/5/20 Thomas Hrabe [EMAIL PROTECTED]:

> given a 3d array
>
> a = numpy.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]],[[13,14,15],[16,17,18]],[[19,20,21],[22,23,24]]])
>
> a.shape returns (4,2,3), so I assume the first digit is the 3rd
> dimension, second is 2nd dim and third is the first. How is the data
> aligned in memory now? According to the strides it should be
> 1,2,3,4,5,6,7,8,9,10,... right? If I had an array of more dimensions,
> the first digit returned by shape should always be the highest dim.

You are basically right, but this is a surprisingly subtle issue for
numpy.

A numpy array is basically a block of memory and some description. One
piece of that description is the type of data it contains (i.e., how to
interpret each chunk of memory), for example int32, float64, etc.
Another is the sizes of all the various dimensions. A third piece,
which makes many of the things numpy does possible, is the strides. The
way numpy works is that basically it translates

A[i, j, k]

into a lookup of the item in the memory block at position

i*strides[0] + j*strides[1] + k*strides[2]

This means, if you have an array A and you want every second element
(A[::2]), all numpy needs to do is hand you back a new array pointing
to the same data block, but with strides[0] doubled. Similarly if you
want to transpose a two-dimensional array, all it needs to do is
exchange strides[0] and strides[1]; no data need be moved.

This means, though, that if you are handed a numpy array, the elements
can be arranged in memory in quite a complicated fashion. Sometimes
this is no problem - you can always use the strides to find it all. But
sometimes you need the data arranged in a particular way. numpy defines
two particular ways: C contiguous and FORTRAN contiguous. C contiguous
arrays are what you describe, and they're what numpy produces by
default; they are arranged so that the rightmost index has the smallest
stride. FORTRAN contiguous arrays are arranged the other way around;
the leftmost index has the smallest stride. (This is how FORTRAN arrays
are arranged in memory.)

There is also a special case: the reshape() function changes the shape
of the array. It has an order argument that describes not how the
elements are arranged in memory but how you want to think of the
elements as arranged in memory for the reshape operation.

Anne
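The stride bookkeeping is easy to watch in action (the byte counts
below assume 8-byte integers; they differ for other dtypes and
platforms):

import numpy as np

a = np.arange(24).reshape(4, 2, 3)

a.strides         # (48, 24, 8): C order, rightmost axis is densest
a[::2].strides    # (96, 24, 8): every second element = doubled stride
a.T.strides       # (8, 24, 48): a transpose just swaps the strides;
                  # no data is copied or moved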
Re: [Numpy-discussion] Slicing a numpy array and getting the complement
2008/5/19 Orest Kozyar [EMAIL PROTECTED]: Given a slice, such as s_[..., :-2:], is it possible to take the complement of this slice? Specifically, s_[..., ::-2]. I have a series of 2D arrays that I need to split into two subarrays via slicing, where the members of the second array are all the members left over from the slice. The problem is that the slice itself will vary, and could be anything such as s_[..., 1:4:] or s_[..., 1:-4:], etc., so I'm wondering if there's a straightforward idiom or routine in Numpy that would facilitate taking the complement of a slice? I've looked around the docs, and have not had much luck. If you are using boolean indexing, of course complements are easy (just use ~). But if you want slice indexing, so that you get views, sometimes the complement cannot be expressed as a slice. For example, take A = np.arange(10) and the slice A[2:4]: the complement of A[2:4] is np.concatenate((A[:2], A[4:])). Things become even more complicated if you start skipping elements. If you don't mind fancy indexing, you can convert your index arrays into boolean form:

complement = A == A
complement[idx] = False

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
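A sketch of the boolean-complement idiom above, using the A == A trick to get an all-True mask (valid as long as A contains no NaNs):

import numpy as np
A = np.arange(10)
idx = np.s_[2:4]          # the slice whose complement we want
complement = A == A       # all-True boolean array
complement[idx] = False
print(A[complement])      # [0 1 4 5 6 7 8 9]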
Re: [Numpy-discussion] Slicing a numpy array and getting the complement
2008/5/19 Orest Kozyar [EMAIL PROTECTED]: If you don't mind fancy indexing, you can convert your index arrays into boolean form: complement = A==A complement[idx] = False This actually would work perfectly for my purposes. I don't really need super-fancy indexing. Heh. Actually fancy indexing is numpy-speak for indexing with anything that's not an integer or a slice. In this case, indexing with a boolean array is fancy indexing. The reason we make this distinction is that with slices, the new array you get is actually just a reference to the original array (so you can modify the original array through it). With fancy indexing, the new array you get is actually a copy. (Assigning to fancy-indexed arrays is handled specially in __setitem__, so it works.) Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
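The view/copy distinction is easy to see in a quick session (a minimal sketch):

import numpy as np
A = np.arange(10)
v = A[2:5]                  # slice: a view onto A's memory
v[0] = 99                   # ...so this modifies A
print(A[2])                 # 99
c = A[np.array([2, 3, 4])]  # fancy indexing: a fresh copy
c[0] = -1                   # ...so A is untouched
print(A[2])                 # still 99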
Re: [Numpy-discussion] Quick Question about Optimization
2008/5/19 James Snyder [EMAIL PROTECTED]: First off, I know that optimization is evil, and I should make sure that everything works as expected prior to bothering with squeezing out extra performance, but the situation is that this particular block of code works, but it is about half as fast with numpy as in matlab, and I'm wondering if there's a better approach than what I'm doing. I have a chunk of code, below, that generally iterates over 2000 iterations, and the vectors that are being worked on at a given step generally have ~14000 elements in them. With arrays this size, I wouldn't worry about python overhead - things like range versus xrange or self lookups. Is there anything in practice here that could be done to speed this up? I'm looking more for general numpy usage tips that I can use while writing further code, and not things that would be obscure or difficult to maintain in the future. Try using a profiler to find which steps are using most of your time. With such a simple function it may not be very informative, but it's worth a try. Also, the results of this are a binary array. I'm wondering if there's anything more compact for expressing that than using 8 bits to represent each single bit. I've poked around, but I haven't come up with any clean and unhackish ideas :-) There's a tradeoff between compactness and speed here. The *fastest* is probably one boolean per 32-bit integer. It sounds awful, I know, but most modern CPUs have to work harder to access bytes individually than they do to access them four at a time. On the other hand, cache performance can make a huge difference, so compactness might actually amount to speed. I don't think numpy has a packed bit array data type (which is a shame, but would require substantial implementation effort). I can provide the rest of the code if needed, but it's basically just filling some vectors with random and empty data and initializing a few things. It would kind of help, since it would make it clearer what's a scalar and what's an array, and what the dimensions of the various arrays are.

for n in range(0, time_milliseconds):
    self.u = self.expfac_m * self.prev_u + (1 - self.expfac_m) * self.aff_input[n, :]
    self.v = self.u + self.sigma * np.random.standard_normal(size=(1, self.naff))

You can use the scale argument of np.random.normal to rescale the random numbers on creation; that'll save you a temporary.

    self.theta = self.expfac_theta * self.prev_theta - (1 - self.expfac_theta)
    idx_spk = np.where(self.v >= self.theta)

You can probably skip the where; the result of the expression self.v >= self.theta is a boolean array, which you can use directly for indexing.

    self.S[n, idx_spk] = 1
    self.theta[idx_spk] = self.theta[idx_spk] + self.b

+= here might speed things up, not just in terms of temporaries but by saving a fancy-indexing operation.

    self.prev_u = self.u
    self.prev_theta = self.theta

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
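Putting those suggestions together, the core of the loop body might look something like this (a sketch with invented sizes and constants standing in for the poster's attributes):

import numpy as np
naff, sigma, b = 14000, 0.5, 0.1
u = np.zeros(naff)
theta = np.zeros(naff)
v = u + np.random.normal(scale=sigma, size=naff)  # scale= replaces the explicit sigma multiply
spk = v >= theta        # boolean mask used directly; no np.where needed
theta[spk] += b         # in-place += avoids a temporary and a second fancy-indexing pass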
Re: [Numpy-discussion] Numpify this?
2008/5/18 Matt Crane [EMAIL PROTECTED]: On Sun, May 18, 2008 at 8:52 PM, Robert Kern [EMAIL PROTECTED] wrote: Are there repeats? No, no repeats in the first column. I'm going to go get a cup of coffee before I forget to leave out any potentially vital information again. It's going to be a long day. It can be done, though I had to be kind of devious. My solution might not even be O(n log n), depending on how mergesort is implemented:

def find_matches(A, B):
    """Find positions A_idx and B_idx so that A[A_idx] == B[B_idx].

    A and B should be sorted arrays with no repeats; A_idx and B_idx
    will be all the places they agree.

    >>> import numpy as np
    >>> A = np.array([1, 2, 4, 5, 7])
    >>> B = np.array([1, 3, 5, 9])
    >>> find_matches(A, B)
    (array([0, 3]), array([0, 2]))
    """
    AB = np.concatenate((A, B))
    idx = np.argsort(AB, kind="mergesort")
    sorted_AB = AB[idx]
    pairs = np.where(np.diff(sorted_AB) == 0)[0]
    A_pairs = idx[pairs]
    B_pairs = idx[pairs + 1] - len(A)
    # with a stable sort, the A element of each matched pair comes first,
    # so a repeat inside B shows up as A_pairs >= len(A), and a repeat
    # inside A shows up as B_pairs < 0
    if np.any(A_pairs >= len(A)):
        raise ValueError("second array contains a repeated element")
    if np.any(B_pairs < 0):
        raise ValueError("first array contains a repeated element")
    return A_pairs, B_pairs

The idea is that diff is not a bad way to find repeats in a sequence; so concatenate the two sequences and sort them (if you're lucky mergesort will be fast because they're individually sorted, but in any case we need stability). Of course, you have to take the pairs in the sorted sequence and figure out where they came from, but fortunately you don't even need to invert the permutation returned by argsort (do we have a tool to invert permutations, or should one just use argsort again?). Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] checking element types in array
2008/5/17 Zoho Vignochi [EMAIL PROTECTED]: Hello, I am writing my own version of a dot product. Simple enough this way:

def dot_r(a, b):
    return sum( x*y for (x,y) in izip(a, b) )

However if both a and b are complex we need:

def dot_c(a, b):
    return sum( x*y for (x,y) in izip(a.conjugate(), b) ).real

As you probably realize, this will be vastly slower (ten to a hundred times) than the built-in function dot() in numpy. I would like to combine these so that I need only one function which detects which formula to use based on argument types. So I thought that something like:

def dot(a, b):
    if isinstance(a.any(), complex) and isinstance(b.any(), complex):
        return sum( x*y for (x,y) in izip(a.conjugate(), b) ).real
    else:
        return sum( x*y for (x,y) in izip(a, b) )

And it doesn't work, because I obviously have the syntax for checking element types incorrect. So my real question is: what is the best way to check if any of the elements in an array are complex? numpy arrays are efficient, among other reasons, because they have homogeneous types, so all the elements in an array are the same type. (Yes, this means if you have an array of numbers only one of which happens to be complex, you have to represent them all as complex numbers whose imaginary part happens to be zero.) So if A is an array, A.dtype is the type of its elements. numpy provides two convenience functions for checking whether an array is complex, depending on what you want: iscomplex checks whether each element has a nonzero imaginary part and returns an array giving the element-by-element answer, so any(iscomplex(A)) will be true if any element of A has a nonzero imaginary part. iscomplexobj checks whether the array has a complex data type. This is much, much faster, but of course it may happen that all the imaginary parts happen to be zero; if you want to treat such an array as real, you must use iscomplex. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
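A minimal sketch of the two checks:

import numpy as np
a = np.array([1+0j, 2+0j])
print(np.iscomplexobj(a))     # True: the dtype is complex (cheap check)
print(np.iscomplex(a).any())  # False: no element has a nonzero imaginary part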
Re: [Numpy-discussion] question about optimizing
2008/5/17 Charles R Harris [EMAIL PROTECTED]: On Sat, May 17, 2008 at 9:52 AM, Alan G Isaac [EMAIL PROTECTED] wrote: On Fri, 16 May 2008, Anne Archibald apparently wrote: storing actual python objects in an array is probably not a good idea I have been wondering what people use object arrays for. I have been guessing that it is for indexing convenience? Are there other core motivations? You can always define an object array of matrices, which solves Tim's problem of matrix stacks, albeit not in the most efficient manner and not the easiest thing to specify due to the current limitations of array(...). I do think it would be nice to have arrays of arrays, but this needs one more array type so that one can do something like array(list_of_arrays, dtype=matrix((2,3),float)), i.e., we could use a fancier dtype. I think if you're going to be writing code to do this, it would be better not to use object arrays. After all, there's no reason the underlying storage for an array of matrices shouldn't be one big block of memory rather than a lot of scattered python objects. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
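The contrast might be sketched like this (the object array keeps 100 scattered Python objects; the block keeps the same numbers in one contiguous buffer):

import numpy as np
stack_obj = np.empty(100, dtype=object)   # object array of matrices
for i in range(100):
    stack_obj[i] = np.matrix(np.zeros((2, 3)))
stack_blk = np.zeros((100, 2, 3))         # one big block of memory instead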
Re: [Numpy-discussion] question about optimizing
2008/5/17 Brian Blais [EMAIL PROTECTED]: At least for me, that was the motivation. I am trying to build a simulation framework for part of the brain, which requires connected layers of nodes. A layer is either a 1D or 2D structure of nodes, with each node a relatively complex beast. Rather than reinvent the indexing (1D, 2D, slicing, etc...), I just inherited from ndarray. I thought, after the fact, that some numpy functions on arrays would help speed up the code, which consists mostly of calling an update function on all nodes, passing each of them an input vector. I wasn't sure if there would be any speed up for this, compared to for n in self.flat: n.update(input_vector). From the responses, the answer seems to be no, and that I should stick with the python loops for clarity. But also, the words of Anne Archibald make me think that I have made a bad choice by inheriting from ndarray, although I am not sure what a convenient alternative would be. Well, it doesn't exist yet, but a handy tool would be a factory function ArrayOf; you would pass it a class, and it would produce a subclass of ndarray designed to contain that class. That is, the underlying storage would be a record array, but the getitem and setitem would automatically handle conversion to and from the class you supplied it, where appropriate.

myarray = ArrayOf(Node, dtype=...)
A = myarray.array([Node(...), Node(...), Node(...)])
n = A[1]
A[2] = Node(...)
A.C.update()  # python-loop-based update of all elements

You could also design it so that it was easy to derive a class from it, since that's probably the best way to handle vectorized methods:

class myarray(ArrayOf(Node, dtype=...)):
    def update(self):
        self.underlying['node_attribute'] += 1

I should say, if you can get away with treating your nodes more like C structures and writing (possibly vectorized) functions to act on them, you can avoid all this mumbo jumbo:

node_dtype = [('node_attribute', np.int), ('weight', np.float)]
A = np.zeros(10, dtype=node_dtype)

def nodes_update(A):
    A['node_attribute'] += 1

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Ruby benchmark -- numpy is slower.... was: Re: Ruby's NMatrix and NVector
2008/5/16 David Cournapeau [EMAIL PROTECTED]: Sebastian Haase wrote: Hi, can someone comment on these timing numbers? http://narray.rubyforge.org/bench.html.en Is the current numpy faster? It is hard to know without getting the same machine or having the benchmark sources. But except for add, all other operations rely on underlying blas/lapack (only matrix operations do if you have no cblas), so I am a bit surprised by the results. FWIW, doing 100 x c = a + b with 1e6 elements on a PIV Prescott @ 3.2 GHz is about 2 sec, and that counts numpy startup:

import numpy as np
a = np.random.randn(1e6)
b = np.random.randn(1e6)
for i in range(100):
    a + b

And np.dot(a, b) for 3 iterations and 500x500 takes 0.5 seconds (again taking into account numpy import), but what you really do here is benchmark your underlying BLAS (if numpy.dot does use BLAS, again, which it does at least when built with ATLAS). There are four benchmarks: add, multiply, dot, and solve. dot and solve use BLAS, and for them numpy, ruby and octave are comparable. Add and multiply are much slower in numpy, but they are implemented in numpy itself. Exactly why add and multiply are slower is an interesting question - loop overhead? striding? cache behaviour? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
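One rough way to time the elementwise add in isolation, excluding interpreter and import startup (a sketch; the numbers are of course machine-dependent):

import timeit
setup = "import numpy as np; a = np.random.randn(1000000); b = np.random.randn(1000000)"
t = timeit.Timer("c = a + b", setup)
print(min(t.repeat(3, 100)))   # best of three timings of 100 adds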
Re: [Numpy-discussion] numpy.arccos(numpy.inf)????
2008/5/16 Stuart Brorson [EMAIL PROTECTED]: Hi -- Sorry to be a pest with corner cases, but I found another one. In this case, if you try to take the arccos of numpy.inf in the context of a complex array, you get a bogus return (IMO). Like this:

In [147]: R = numpy.array([1, numpy.inf])
In [148]: numpy.arccos(R)
Warning: invalid value encountered in arccos
Out[148]: array([  0.,  NaN])
In [149]: C = numpy.array([1+1j, numpy.inf])
In [150]: numpy.arccos(C)
Warning: invalid value encountered in arccos
Out[150]: array([ 0.90455689-1.06127506j,         NaN      -Infj])

The arccos(numpy.inf) in the context of a real array is OK, but taking arccos(numpy.inf) in the context of a complex array should return NaN + NaNj, IMO. Thoughts? Is this so bad? lim_{x -> inf} arccos(x) = -j*inf. Granted, this is only when x approaches infinity along the positive real axis... Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] question about optimizing
2008/5/16 Brian Blais [EMAIL PROTECTED]: I have a custom array, which contains custom objects (I give a stripped down example below), and I want to loop over all of the elements of the array and call a method of the object. I can do it like:

a = MyArray((5, 5), MyObject, 10)
for obj in a.flat:
    obj.update()

but I was wondering if there is a faster way, especially if obj.update is a cython-ed function. I was thinking of something like apply_along_axis, but without having an input array at all. Is there a better way to do this? While I'm asking, is there a better way to overload ndarray than what I am doing below? I tried to follow code I found online, but examples of this are few and far between. Unfortunately, the loop overhead isn't very big compared to the overhead of method dispatch, so there's no way you're going to make things fast. For convenience you can do something like

update = np.vectorize(lambda obj: obj.update())

and then later update(a). But if you really want things to go quickly, I think the best approach is to take advantage of the main feature of arrays: they hold homogeneous items efficiently. So use the array to store the contents of your object, and put any special behaviour in the array object. For example, if I wanted to efficiently compute with arrays of numbers carrying units, I would attach a unit to the array as a whole, and have the array store doubles. With some cleverness, you can even have accesses to the array return a freshly-created object whose contents are based on the values you looked up in the array. But storing actual python objects in an array is probably not a good idea. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
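One way the attach-behaviour-to-the-whole-array idea might look; a sketch only, not a real units implementation:

import numpy as np

class UnitArray(np.ndarray):
    # an array of plain doubles carrying a single unit tag
    def __new__(cls, data, unit=''):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.unit = unit
        return obj
    def __array_finalize__(self, obj):
        self.unit = getattr(obj, 'unit', '')

x = UnitArray([1.0, 2.0], unit='m')
print((x + x).unit)   # 'm': storage stays homogeneous, behaviour lives on the array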
Re: [Numpy-discussion] let's use patch review
2008/5/15 Francesc Alted [EMAIL PROTECTED]: I don't need to say that this procedure was not used for small or trivial changes (those were fixed directly), but only when the issue was important enough to deserve the attention of the mate. I think here's the rub: when I hear patch review system it sounds to me like an obstacle course for getting code into the software. Maybe it's justified, but I think at the moment there are many, many things that are just awaiting a little bit of attention from someone who knows the code. A patch review system overapplied would multiply that number by rather a lot. How about a purely-optional patch review system? I've submitted patches I wanted reviewed before they went in the trunk. As it was, I didn't have SVN access, so I just posted them to trac or emailed them to somebody, who then pondered and committed them. But a patch review system - provided people were promptly reviewing patches - would have fit the bill nicely. How frequently does numpy receive patches that warrant review? The zillion little doc fixes don't; even moderate-sized patches from experienced developers probably don't warrant review. So in the last while, what are changes that needed review, and what happened to them?

* financial code - email to the list, discussion, eventual inclusion
* matrix change - bug filed, substantial discussion, confusion about consensus, committed, further discussion, consensus committed, users complain, patch backed out and issue placed on hold
* MA change - developed independently, discussed on mailing list, committed
* histogram change - filed as bug, discussed on mailing list, committed
* median change - discussed on mailing list, committed
* .npy file format - discussed and implemented at a sprint, committed

Did I miss any major ones? Of course, svn log will give you a list of minor fixes in the last few months. It seems to me like the review process at the moment is just discuss it on the mailing list. Tools to facilitate that would be valuable; it would be handy to be able to point to a particular version of the code somewhere on the Web (rather than just in patches attached to email), for example. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Going toward time-based release ?
2008/5/12 Jarrod Millman [EMAIL PROTECTED]: On Mon, May 12, 2008 at 12:05 PM, Eric Firing [EMAIL PROTECTED] wrote: As one who pushed for the MA transition, I appreciate your suggestion. It may have one unintended consequence, however, that may cause more trouble than it saves: it may lead to more situations where the ma versions are unintentionally mixed. This will probably work if an old_ma array ends up as input for a new_ma function, but the reverse often will not work correctly. Good point. I now agree that it is best to avoid having the old code accessible via np.core.ma. If the old code is only accessible via np.oldnumeric, then I don't see any reason to add a deprecation warning since it is kind of implied by placing it in np.oldnumeric. Does it make sense to make the *new* code available in numpy.core.ma, optionally with a DeprecationWarning? I realize it was supposedly never advertised as being there, but given the state of numpy documentation many users will have found it there; at the least matplotlib did. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Writing new ufuncs
Hi, Suppose I have a C function, double logsum(double a, double b); What is needed to produce a ufunc object whose elementwise operation is done by calling this function? Also, is there a way to take a python function and automatically make a ufunc out of it? (No, vectorize doesn't implement reduce(), accumulate(), reduceat(), or outer().) Thanks, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Writing new ufuncs
2008/5/11 Alan McIntyre [EMAIL PROTECTED]: On Sun, May 11, 2008 at 2:04 PM, Anne Archibald [EMAIL PROTECTED] wrote: Also, is there a way to take a python function and automatically make a ufunc out of it? (No, vectorize doesn't implement reduce(), accumulate(), reduceat(), or outer().) I've not used it, but have a look at numpy.frompyfunc; its docstring suggests it turns arbitrary Python functions into ufuncs. Ah, thanks. Shame it produces object arrays though. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
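Still, frompyfunc does yield a genuine ufunc, methods and all - handy for experimenting even with the object-dtype wart (a sketch wrapping a one-element logsum):

import numpy as np
logsum = np.frompyfunc(
    lambda x, y: max(x, y) + np.log1p(np.exp(-abs(x - y))), 2, 1)
a = np.array([-1000., -1000.5, -1001.])
print(logsum.reduce(a))   # works, though the result comes back as dtype=object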
Re: [Numpy-discussion] Writing new ufuncs
2008/5/11 Robert Kern [EMAIL PROTECTED]: Basically, you need 3 arrays: functions implementing the type-specific inner loops, void* extra data to pass to these functions, and an array of arrays containing the type signatures of the ufunc. In numpy, we already have generic implementations of the loop functions for common combinations of types. In your case, for a binary function taking two doubles and returning a double, we have PyUFunc_dd_d(). As its extra void* data, it takes a function pointer that actually implements the element-wise operation. So let's start making the arrays: Great! Thanks! Is it possible to provide a specialized implementation of reduce()? (Since reduce() can be implemented more efficiently than doing it pairwise.) PyUFunc_None, // The identity element for reduction. // No good one to use for this function, // unfortunately. Is it safe to use minus infinity, or is this going to give people all kinds of warnings if they have seterr() set? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Writing new ufuncs
2008/5/11 Robert Kern [EMAIL PROTECTED]: Perhaps, but ufuncs only allow 0 or 1 for this value, currently. That's a shame; minus infinity is the identity for maximum, too. Also, I was wrong about using PyUFunc_ff_f. Instead, use PyUFunc_ff_f_As_dd_d. Hmm. Well, I tried implementing logsum(), and it turns out that neither PyUFunc_dd_d nor any of the other functions - mostly flagged as being part of the ufunc API and described in the automatically-generated ufunc_api.txt - are exported. All are static and not described in any header I could find. That said, I know scipy defines its own ufuncs, so it must either reimplement these or have some way I hadn't thought of to get at them. I've attached a patch to add the ufunc logsum to numpy. It's a bit nasty for several reasons:

* I wasn't sure where to put it, so I put it in _compiled_base.c in numpy/lib/src. I was very hesitant to touch umathmodule.c.src with its sui generis macro language.
* I'm not sure what to do about log1p - it seems to be available in spite of HAVE_LOG1P not being defined. In any case, if it doesn't exist, it seems crazy to implement it again here.
* Since PyUFunc_dd_d does not seem to be exported, I just cut-n-pasted its code here. Obviously not a solution, but what are extension writers supposed to do?

Do we want a ufunc version of logsum() in numpy? I got the impression that in spite of some people's doubts as to its utility, there is a strong contingent of users who do want it. This plus a logdot() would cover most of the operations one might want to do on numbers in log representation. (Maybe subtraction?) It could be written in pure python, for the most part, and reduce() would save a few exps and logs at the cost of a temporary, but accumulate() remains a challenge. (I don't expect many people feel the need for reduceat().) Anne

P.S. It was surprisingly difficult to persuade numpy to find my tests. I suppose that's what the proposed transition to nose is for? -A

Index: numpy/lib/src/_compiled_base.c
===================================================================
--- numpy/lib/src/_compiled_base.c (revision 5155)
+++ numpy/lib/src/_compiled_base.c (working copy)
@@ -1,6 +1,7 @@
 #include "Python.h"
 #include "structmember.h"
 #include "numpy/noprefix.h"
+#include "numpy/ufuncobject.h"

 static PyObject *ErrorObject;

 #define Py_Try(BOOLEAN) {if (!(BOOLEAN)) goto fail;}
@@ -835,16 +836,68 @@
     return;
 }

+#if !defined(HAVE_LOG1P) && 0
+/* logsum ufunc */
+static double log1p(double x) /* is this really necessary? */
+{
+    double u = 1. + x;
+    if (u == 1.0) {
+        return x;
+    } else {
+        return log(u) * x / (u - 1.);
+    }
+}
+#endif /* HAVE_LOG1P */
+
+static double logsum(double x, double y)
+{
+    if (x > y) {
+        return x + log1p(exp(y - x));
+    } else {
+        return y + log1p(exp(x - y));
+    }
+}
+
+static void
+logsum_dd_d(char **args, intp *dimensions, intp *steps)
+{
+    intp n = dimensions[0];
+    intp is1 = steps[0];
+    intp is2 = steps[1];
+    intp os = steps[2];
+    char *ip1 = args[0];
+    char *ip2 = args[1];
+    char *op = args[2];
+    intp i;
+
+    for (i = 0; i < n; i++, ip1 += is1, ip2 += is2, op += os) {
+        double *in1 = (double *)ip1;
+        double *in2 = (double *)ip2;
+        double *out = (double *)op;
+
+        *out = logsum(*in1, *in2);
+    }
+}
+
+static char logsum_sigs[] = {
+    NPY_FLOAT64, NPY_FLOAT64, NPY_FLOAT64
+};
+static PyUFuncGenericFunction logsum_functions[] = {logsum_dd_d};
+static void *logsum_data[] = {logsum};
+static char logsum_doc[] = "Compute log(exp(x)+exp(y)) without numeric overflows and underflows";

 /* Initialization function for the module (*must* be called initname) */
 PyMODINIT_FUNC init_compiled_base(void) {
-    PyObject *m, *d, *s;
+    PyObject *m, *d, *s, *f;

     /* Create the module and add the functions */
     m = Py_InitModule("_compiled_base", methods);

     /* Import the array objects */
     import_array();
+    import_umath();

     /* Add some symbolic constants to the module */
     d = PyModule_GetDict(m);
@@ -857,6 +910,23 @@
     PyDict_SetItemString(d, "error", ErrorObject);
     Py_DECREF(ErrorObject);

+    /* set up logsum ufunc */
+    f = PyUFunc_FromFuncAndData(
+        logsum_functions,
+        logsum_data,
+        logsum_sigs,
+        1, // The number of type signatures.
+        2, // The number of inputs.
+        1, // The number of outputs.
+        PyUFunc_None, // The identity element for reduction.
+                      // No good one to use for this function, unfortunately.
+        "logsum", // The name of the ufunc.
+        logsum_doc,
+        0 // Dummy for API backwards compatibility.
+    );
+    PyDict_SetItemString(d, "logsum", f);
+    Py_DECREF(f);

     /* define PyGetSetDescr_Type and PyMemberDescr_Type */
     define_types();

Index: numpy/lib/tests/test_ufunclike.py
Re: [Numpy-discussion] Uncomfortable with matrix change
2008/5/10 Jarrod Millman [EMAIL PROTECTED]: On Sat, May 10, 2008 at 8:14 AM, Keith Goodman [EMAIL PROTECTED] wrote: If these are backed out, will some kind of deprecation warning be added for scalar indexing, as Travis suggested? Robert's request seems in accord with this. Shouldn't a deprecation warning explain what the future behavior will be? Is there a firm consensus on what that behavior will be? I agree that a deprecation warning needs to explain the future behavior and don't believe we have agreed on what it should be yet. I don't personally have an opinion on what the new behavior should be at this point. But I don't think it makes sense to add deprecation warnings at this point--unless we know exactly what it is that we will be doing in the future. So while it could be argued that it would be useful to say "use x[0,:] instead of x[0] to return row 0 as a matrix" in the 1.1 release, my sense is that we will still need to replace that DeprecationWarning in 1.2 with a new DeprecationWarning saying something like "in 1.3 x[0] will return an array, and if you want a matrix you will need to call x[0,:]". I don't think we should have two different DeprecationWarnings in back to back releases and believe that we should wait to include the warning until we decide what the new behavior will be in 1.2. Regrettably I have to agree that we can't introduce a DeprecationWarning until 1.2. I don't agree that x[0,:] should return a matrix! It should be the one-dimensional object it looks like. See Keith Goodman's recent message - http://projects.scipy.org/pipermail/numpy-discussion/2008-May/033726.html - for an example of why representing 1D objects as 2D objects is trouble. I am of the opinion that matrix indexing should be identical to array indexing, insofar as that is possible. I don't expect my opinion to prevail, but the point is that we do not even have enough consensus to agree on a recommendation to go in the DeprecationWarning. Alas. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Uncomfortable with matrix change
2008/5/10 Nathan Bell [EMAIL PROTECTED]: On Sat, May 10, 2008 at 3:05 PM, Anne Archibald [EMAIL PROTECTED] wrote: I don't expect my opinion to prevail, but the point is that we do not even have enough consensus to agree on a recommendation to go in the DeprecationWarning. Alas. Would you object to raising a general Warning with a message like the following? matrix indexing of the form x[0] is ambiguous, consider the explicit format x[0,:] Well, since I at least have proposed changing the latter as well, the warning doesn't actually help with deprecation. It does reduce the amount of beginner foot-shooting (as in x[0][0]), I guess, so I'm not opposed to it. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Uncomfortable with matrix change
2008/5/10 Timothy Hochberg [EMAIL PROTECTED]: On Sat, May 10, 2008 at 1:37 PM, Anne Archibald [EMAIL PROTECTED] wrote: 2008/5/10 Nathan Bell [EMAIL PROTECTED]: On Sat, May 10, 2008 at 3:05 PM, Anne Archibald [EMAIL PROTECTED] wrote: I don't expect my opinion to prevail, but the point is that we do not even have enough consensus to agree on a recommendation to go in the DeprecationWarning. Alas. Would you object to raising a general Warning with a message like the following? matrix indexing of the form x[0] is ambiguous, consider the explicit format x[0,:] Well, since I at least have proposed changing the latter as well, the warning doesn't actually help with deprecation. It does reduce the amount of beginner foot-shooting (as in x[0][0]), I guess, so I'm not opposed to it. Please, let's just leave the current matrix class alone. Any change sufficient to make matrix not terrible, will break everyone's code. Instead, the goal should be build a new matrix class (say newmatrix) where we can start from scratch. We can build it and keep it alpha for a while so we can break the interface as needed while we get experience with it and then, maybe, transition matrix to oldmatrix and newmatrix to matrix over a couple of releases. Or just choose a reasonable name for matrix to begin with (ndmatrix?) and then deprecate the current matrix class when the new one is fully baked. Trying to slowly morph matrix into something usable is just going to break code again and again and just be a giant headache in general. +1 I think something along the lines of the Numerical Ruby code you pointed to some time ago is a good idea, that is, an array of matrices object and an array of vectors object. There will be considerable challenges - distinguishing the two kinds of index manipulations on arrays of matrices (those that transpose every matrix in the array, for example, and those that rearrange the array of matrices). Since teaching is one of the main current applications of matrices, these should be presented in as non-painful a way as possible. Anyway, I think I am in favour of returning the 1.1 matrix behaviour to its 1.0 form and releasing the thing. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Uncomfortable with matrix change
2008/5/9 Travis Oliphant [EMAIL PROTECTED]: After Nathan Bell's recent complaints, I'm a bit more uncomfortable with the matrix change to scalar indexing. It does and will break code in possibly hard-to-track down ways. Also, Nathan has been a *huge* contributor to the Sparse matrix in scipy and so I value his opinion about the NumPy matrix. One of my goals is to have those two objects work together a bit more seamlessly. So, I think we need to: 1) Add a warning to scalar access 2) Back-out the change and fix all the places where NumPy assumes incorrectly that the number of dimensions reduce on PySequence_GetItem. Opinions? This is certainly the conservative approach. How much code is broken by this, compared to (say) the amount broken by the disappearance of numpy.core.ma? Is this our biggest single API breakage? I do agree that we should be paying attention to people who are actually using matrices, so I won't enter a vote. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in oldnumeric.ma
2008/5/9 Eric Firing [EMAIL PROTECTED]: Stefan, (and Jarrod and Pierre) (Context for anyone new to the thread: the subject is slightly misleading, because the bug is/was present in both oldnumeric.ma and numpy.ma; the discussion of the fix pertains to the latter only.) Regarding your objections to r5137: good point. I wondered about that. I think the function should look like this (although it might be possible to speed up the implementation for the most common case): [...] md = make_mask((fb != fb.astype(int)) & (fa < 0), shrink=True) Unfortunately this isn't quite the right condition:

In [18]: x = 2.**35; numpy.array([-1.])**x; numpy.array(x).astype(int) == x
Out[18]: array([ 1.])
Out[18]: False

Switching to int64 seems to help:

In [27]: x = 2.**62; numpy.array([-1.])**x; numpy.array(x).astype(numpy.int64) == x
Out[27]: array([ 1.])
Out[27]: True

This seems to work, but may be platform-dependent: 2**62+1 cannot be represented as an IEEE float, so whether pow() successfully deals with it may be different on machines that don't work with 80-bit floating point internally. A suspenders-and-belt approach would check for NaNs and add them to the mask, but this still doesn't cover the case where the user has numpy set to raise exceptions any time NaNs are generated. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Power domain (was Re: bug in oldnumeric.ma)
2008/5/9 Eric Firing [EMAIL PROTECTED]: It seems like some strategic re-thinking may be needed in the long run, if not immediately. There is a wide range of combinations of arguments that will trigger invalid results, whether Inf or NaN. The only way to trap and mask all of these is to use masked_invalid after the calculation, and this only works if the user has not disabled nan output. I have not checked recently, but based on earlier strategy discussions, I suspect that numpy.ma is already strongly depending on the availability of nan and inf output to prevent exceptions being raised upon invalid calculations. Maybe this should simply be considered a requirement for the use of ma. I think in principle the right answer is to simply run whatever underlying function, and mask any NaNs or Infs in the output. This may be a problem when it comes to seterr - is the behaviour of seterr() even defined in a multithreaded context? Can it be made thread-local? Surely hardware must support thread-local floating-point error flags. If seterr() works this way, then surely the right answer in ma is to use a try/finally block to turn off exceptions and warnings, and clean up NaNs after the fact. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
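The compute-then-mask pattern described above, sketched with seterr doing the try/finally housekeeping:

import numpy as np
a = np.array([-1.0, 4.0])
old = np.seterr(invalid='ignore')   # silence invalid-operation handling
try:
    r = np.sqrt(a)                  # NaN for the negative entry
finally:
    np.seterr(**old)                # restore the previous error settings
masked = np.ma.masked_invalid(r)    # mask the NaNs after the fact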
Re: [Numpy-discussion] Log Arrays
2008/5/8 David Cournapeau [EMAIL PROTECTED]: On Thu, May 8, 2008 at 10:20 PM, Charles R Harris [EMAIL PROTECTED] wrote: Floating point numbers are essentially logs to base 2, i.e., integer exponent and mantissa between 1 and 2. What does using the log buy you? Precision, of course. I am not sure I understand the notation base = 2, but doing computation in the so-called log-domain is a must in many statistical computations. In particular, in machine learning with large datasets, it is common to have some points whose pdf is extremely small, and well below the precision of double. Typically, internally, the computations of my EM toolbox are done in the log domain, and use the logsumexp trick to compute likelihood given some data: I'm not sure I'd describe this as precision, exactly; it's an issue of numerical range. But yes, I've come across this while doing maximum-likelihood fitting, and a coworker ran into it doing Bayesian statistics. It definitely comes up. Is logarray really the way to handle it, though? It seems like you could probably get away with providing a logsum ufunc that did the right thing. I mean, what operations does one want to do on logarrays?

add -> logsum
subtract -> ?
multiply -> add
mean -> logsum/N
median -> median
exponentiate to recover normal-space values -> exp
str -> ?

I suppose numerical integration is also valuable, so it would help to have a numerical integrator that was reasonably smart about working with logs. (Though really it's just a question of rescaling and exponentiating, I think.) A logarray type would help by keeping track of the fact that its contents were in log space, and would make expressions a little less cumbersome, I guess. How much effort would it take to write it so that it got all the corner cases right? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
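For reference, the elementwise logsum itself is short; a sketch of the standard trick of factoring out the larger exponent:

import numpy as np

def logsum(x, y):
    # log(exp(x) + exp(y)) without overflow or underflow
    m = np.maximum(x, y)
    return m + np.log1p(np.exp(-np.abs(x - y)))

print(logsum(-1000., -1001.))   # about -999.69; exp() alone would underflow to 0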
Re: [Numpy-discussion] Log Arrays
2008/5/8 Charles R Harris [EMAIL PROTECTED]: What realistic probability is in the range exp(-1000) ? Well, I ran into it while doing a maximum-likelihood fit - my early guesses had exceedingly low probabilities, but I needed to know which way the probabilities were increasing. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Log Arrays
2008/5/8 Charles R Harris [EMAIL PROTECTED]: On Thu, May 8, 2008 at 10:56 AM, Robert Kern [EMAIL PROTECTED] wrote: When you're running an optimizer over a PDF, you will be stuck in the region of exp(-1000) for a substantial amount of time before you get to the peak. If you don't use the log representation, you will never get to the peak because all of the gradient information is lost to floating point error. You can consult any book on computational statistics for many more examples. This is a long-established best practice in statistics. But IEEE is already a log representation. You aren't gaining precision, you are gaining more bits in the exponent at the expense of fewer bits in the mantissa, i.e., less precision. As I say, it may be convenient, but if cpu cycles matter, it isn't efficient. Efficiency is not the point here. IEEE floats simply cannot represent the difference between exp(-1000) and exp(-1001). This difference can matter in many contexts. For example, suppose I have observed a source in X-rays and I want to fit a blackbody spectrum. I have, say, hundreds of spectral bins, with a number of photons in each. For any temperature T and amplitude A, I can compute the distribution of photons in each bin (Poisson with mean depending on T and A), and obtain the probability of obtaining the observed number of photons in each bin. Even if a blackbody with temperature T and amplitude A is a perfect fit, I should expect this number to be very small, since the chance of obtaining *exactly* this sequence of integers is quite small. But when I start the process I need to guess a T and an A and evaluate the probability. If my values are far off, the probability will almost certainly be lower than exp(-1000). But I need to know whether, if I increase T, this probability will increase or decrease, so I cannot afford to treat it as zero, or throw up my hands and say "This is smaller than one over the number of baryons in the universe! My optimization problem doesn't make any sense!" I could also point out that frequently when one obtains these crazy numbers, one is not working with probabilities at all, but probability densities. A probability density of exp(-1000) means nothing special. Finally, the fact that *you* don't think this is a useful technique doesn't affect the fact that there is a substantial community of users who use it daily and who would like some support for it in scipy. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Log Arrays
2008/5/8 Charles R Harris [EMAIL PROTECTED]: David, what you are using is a log(log(x)) representation internally. IEEE is *not* linear, it is logarithmic. As Robert Kern says, yes, this is exactly what the OP and all the rest of us want. But it's a strange thing to say that IEEE is logarithmic - 2.3*10**1 is not exactly logarithmic, since the 2.3 is not the logarithm of anything. IEEE floats work the same way, which is important, since it means they can exactly represent integers of moderate size. For example, 257 is represented as sign 0, exponent 135, mantissa (implied leading 1).00000001b. The exponent is indeed the integer part of the log base 2 of the value, up to some fiddling, but the mantissa is not a logarithm of any kind. Anyway, all this is immaterial. The point is, in spite of the fact that floating-point numbers can represent a very wide range of numbers, there are some important contexts in which this range is not wide enough. One could in principle store an additional power of two in an accompanying integer, but this would be less efficient in terms of space and time, and more cumbersome, when for the usual situations where this is applied simply taking the logarithm works fine. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Log Arrays
2008/5/8 T J [EMAIL PROTECTED]: On 5/8/08, Anne Archibald [EMAIL PROTECTED] wrote: Is logarray really the way to handle it, though? It seems like you could probably get away with providing a logsum ufunc that did the right thing. I mean, what operations does one want to do on logarrays? add -> logsum, subtract -> ?, multiply -> add, mean -> logsum/N, median -> median, exponentiate to recover normal-space values -> exp, str -> ? That's about it, as far as my usage goes. Additionally, I would also benefit from:

logdot
logouter

In addition to the elementwise operations, it would be nice to have:

logsum along an axis
logprod along an axis
cumlogsum
cumlogprod

Whether these come as additional ufuncs or through a subclass is not so much of an issue for me---either would be a huge improvement on the current situation. One benefit of a subclass, IMO, is that it maintains the feel of non-log arrays. That is, when I want to multiply two logarrays, I just do x*y rather than x+y, but I can understand arguments that this might not be desirable. Well, how about a two-step process? Let's come up with a nice implementation of each of the above, then we can discuss whether a subclass is warranted (and at that point each operation on the subclass will be very simple to implement). Most of them could be implemented almost for free by providing a ufunc that did logsum(); then logsum along an axis becomes logsum.reduce(), and cumlogsum becomes logsum.accumulate(). logprod is of course just add. logouter is conveniently and efficiently written as logprod.outer. logdot can be written in terms of logsum() and logprod() at the cost of a sizable temporary, but should really have its own implementation. It might be possible to test this by using vectorize() on a single-element version of logsum(). This would be slow, but if vectorize() makes a real ufunc object it should provide all the usual handy methods (reduce, accumulate, outer, etcetera). Alternatively, since logsum can be written in terms of existing ufuncs, it should be possible to implement a class providing all the ufunc methods by hand without too much pain. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
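Most of that wish list falls out of the ufunc methods once a logsum ufunc exists; a sketch with a frompyfunc stand-in (object dtype, but fine for experimenting):

import numpy as np
logsum = np.frompyfunc(
    lambda x, y: max(x, y) + np.log1p(np.exp(-abs(x - y))), 2, 1)
a = np.array([-2., -3., -4.])
print(logsum.reduce(a))       # logsum along the array
print(logsum.accumulate(a))   # cumlogsum
print(logsum.outer(a, a))     # logouter
print(np.add.accumulate(a))   # cumlogprod is just add.accumulate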
[Numpy-discussion] Why are ufunc docstrings useless?
Hi, I frequently use functions like np.add.reduce and np.add.outer, but their docstrings are totally uninformative. Would it be possible to provide proper docstrings for these ufunc methods? They need not be specific to np.add; just an explanation of what arguments to give (for example) reduce() (presumably it takes an axis= argument? what is the default behaviour?) and what it does would help. Thanks, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Why are ufunc docstrings useless?
2008/5/8 Robert Kern [EMAIL PROTECTED]: On Thu, May 8, 2008 at 5:28 PM, Anne Archibald [EMAIL PROTECTED] wrote: Hi, I frequently use functions like np.add.reduce and np.add.outer, but their docstrings are totally uninformative. Would it be possible to provide proper docstrings for these ufunc methods? They need not be specific to np.add; just an explanation of what arguments to give (for example) reduce() (presumably it takes an axis= argument? what is the default behaviour?) and what it does would help. Sure. The place to add them would be in the PyMethodDef ufunc_methods array on line 3953 of numpy/core/src/ufuncobject.c. Just extend each of the rows with a char* containing the docstring contents. A literal string will suffice. Thanks! Done: add, reduce, outer, and reduceat. What about __call__? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Why are ufunc docstrings useless?
2008/5/8 Robert Kern [EMAIL PROTECTED]: On Thu, May 8, 2008 at 9:52 PM, Anne Archibald [EMAIL PROTECTED] wrote: Thanks! Done: add, reduce, outer, and reduceat. What about __call__? If anyone knows enough to explicitly request a docstring from __call__, they already know what it does. How exactly are they to find out? It does take additional arguments, for example dtype and out - I think. Also, help(np.add) displays the object, its docstring, its methods, and all their docstrings. So it provides a way to get a docstring out of __call__ without having to know what it is. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Why are ufunc docstrings useless?
2008/5/8 Robert Kern [EMAIL PROTECTED]: On Thu, May 8, 2008 at 10:39 PM, Anne Archibald [EMAIL PROTECTED] wrote: 2008/5/8 Robert Kern [EMAIL PROTECTED]: If anyone knows enough to explicitly request a docstring from __call__, they already know what it does. How exactly are they to find out? It does take additional arguments, for example dtype and out - I think. That should be recorded in the ufunc's main docstring, e.g. numpy.add.__doc__, since that is what people will actually be calling. No one will explicitly call numpy.add.__call__(x,y). Also, help(np.add) displays the object, its docstring, its methods, and all their docstrings. So it provides a way to get a docstring out of __call__ without having to know what it is. Meh. All it can usefully say is "Refer to the main docstring", which is more or less what it currently says. So is the custom that double-underscore methods get documented in the class docstring and normal methods get documented in their own docstrings? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Status Report for NumPy 1.1.0
2008/5/7 Jarrod Millman [EMAIL PROTECTED]: I have just created the 1.1.x branch: http://projects.scipy.org/scipy/numpy/changeset/5134 In about 24 hours I will tag the 1.1.0 release from the branch. At this point only critical bug fixes should be applied to the branch. The trunk is now open for 1.2 development. I committed Travis' matrix indexing patch (plus a basic test) to 1.1.x. Hope that's okay. (All tests pass.) Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in oldnumeric.ma
2008/5/7 Eric Firing [EMAIL PROTECTED]: Charles Doutriaux wrote: The following code works with numpy.ma but not numpy.oldnumeric.ma. No, this is a bug in numpy.ma also; power is broken: While it's tempting to just call power() and mask out any NaNs that result, that's going to be a problem if people have their environments set to raise exceptions on the production of NaNs. Is it an adequate criterion to check (a < 0) & (round(b) == b)? We have to be careful:

In [16]: np.array([-1.0])**(2.0**128)
Warning: invalid value encountered in power
Out[16]: array([ nan])

2.0**128 cannot be distinguished from nearby non-integral values, so this is reasonable behaviour (and a weird corner case), but:

In [23]: np.round(2.0**128) == 2.0**128
Out[23]: True

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in oldnumeric.ma
2008/5/7 Pierre GM [EMAIL PROTECTED]: All, Yes, there is a problem with ma.power: masking negative data should be restricted to the case of an exponent between -1. and 1. only, don't you think? No, there's a problem with any fractional exponent (with even denominator): x**(3/2) == (x**3)**(1/2). And of course in a floating-point world, you can't really ask whether the denominator is even or not. So any non-integer power is trouble. The draconian approach would be to simply disallow negative numbers to be raised to exponents of type float, but that's going to annoy a lot of people who have integers which happen to be represented in floats. (Representing integers in floats is exact up to some fairly large value.) So the first question is "how do we recognize floats with integer values?" Unfortunately that's not the real problem. The real problem is "how do we predict when power() is going to produce a NaN?" Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in oldnumeric.ma
2008/5/7 Jonathan Wright [EMAIL PROTECTED]: Is there a rule against squaring away the negatives?

def not_your_normal_pow(x, y):
    return exp(log(power(x, 2)) * y / 2)

Which still needs some work for x == 0. Well, it means (-1.)**(3.) becomes 1., which is probably not what the user expected... Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Status Report for NumPy 1.1.0
2008/5/7 Charles R Harris [EMAIL PROTECTED]: On Wed, May 7, 2008 at 11:44 AM, Anne Archibald [EMAIL PROTECTED] wrote: 2008/5/7 Jarrod Millman [EMAIL PROTECTED]: I have just created the 1.1.x branch: http://projects.scipy.org/scipy/numpy/changeset/5134 In about 24 hours I will tag the 1.1.0 release from the branch. At this point only critical bug fixes should be applied to the branch. The trunk is now open for 1.2 development. I committed Travis' matrix indexing patch (plus a basic test) to 1.1.x. Hope that's okay. (All tests pass.) You should put it in the 1.2 branch also, then. I think fancy indexing and tolist, which Travis mentioned as having matrix workarounds, need to be checked to see if they still work correctly for matrices with the patch. Ah. Good point. I did find a bug - x[:,0] doesn't do what you'd expect. Best not to release without either backing out my change or fixing the problem. I'm still trying to track down what's up. Guess our test suite leaves something to be desired. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Status Report for NumPy 1.1.0
2008/5/7 Charles R Harris [EMAIL PROTECTED]: On Wed, May 7, 2008 at 5:31 PM, Anne Archibald [EMAIL PROTECTED] wrote: Ah. Good point. I did find a bug - x[:,0] doesn't do what you'd expect. Best not to release without either backing out my change or fixing the problem. I'm still trying to track down what's up. Returns a column matrix here, using 1.2.0.dev5143 after Travis's commits. What should it do? Oh, uh, my bad. I messed up my test code, and panicked thinking Jarrod was about to release a broken 1.1.0 and it was going to be All My Fault. That said, the only explicit test of .tolist() that I could find is the one I just wrote... Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Status Report for NumPy 1.1.0
2008/5/7 Charles R Harris [EMAIL PROTECTED]: Heh, I just added some tests to 1.2 before closing ticket #707. They should probably be merged with yours. Seems a shame:

Ran 1000 tests in 5.329s

Such a nice round number! Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?
2008/5/5 David Cournapeau [EMAIL PROTECTED]: Basically, what I have in mind is, in a first step (for numpy 1.2): - define functions to allocate on a given alignement - make PyMemData_NEW 16 byte aligned by default (to be compatible with SSE and co). The problem was, and still is, realloc. It is not possible to implement realloc with malloc/free (in a portable way), and as such, it is not possible to have an aligned realloc. How much does this matter? I mean, what if you simply left all the reallocs as-is? The arrays that resulted from reallocs would not typically be aligned, but we cannot in any case expect all arrays to be aligned. Or, actually, depending on how you manage the alignment, it may be possible to write a portable aligned realloc. As I understand it, a portable aligned malloc allocates extra memory, then adds something to the pointer to make the memory arena aligned. free() must then recover the original pointer and call free() on it - though this can presumably be avoided if garbage collection is smart enough. realloc() also needs to recover the pointer to call the underlying realloc(). One answer is to ensure that at least one byte of padding is always done at the beginning, so that the number of pad bytes used on an aligned pointer p is accessible as ((uint8*)p)[-1]. Other schemes are available; I have a peculiar affection for the pure-python solution of constructing a view. In any of them, it seems to me, if free() can recover the original pointer, so can realloc()... I don't think any numpy function can assume that its input arrays are aligned, since users may like to pass, say, mmap()ed hunks of a file, over which numpy has no control of the alignment. In this case, then, how much of a problem is it if a few stray realloc()s produce non-aligned arrays? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
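The pure-python view trick can be sketched as follows: over-allocate a byte buffer, then slice to the first aligned offset (aligned_empty is a made-up name for illustration):

import numpy as np

def aligned_empty(n, dtype=np.float64, align=16):
    itemsize = np.dtype(dtype).itemsize
    buf = np.empty(n * itemsize + align, dtype=np.uint8)  # over-allocate
    offset = (-buf.ctypes.data) % align                   # bytes to the next boundary
    return buf[offset:offset + n * itemsize].view(dtype)

a = aligned_empty(1000)
print(a.ctypes.data % 16)   # 0: the data pointer is 16-byte aligned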
Re: [Numpy-discussion] Tests for empty arrays
2008/5/6 Andy Cheesman [EMAIL PROTECTED]: I was wondering if anyone could shed some light on how to distinguish an empty array of a given shape from a zeros array of the same dimensions. An empty array, that is, an array returned by the function empty(), just means an uninitialized array. The only difference between it and an array returned by zeros() is the values in the array: from zeros() they are, obviously, all zero, while from empty() they can be anything, including zero. In practice they tend to be crazy numbers of order 1e300 or 1e-300, but one can't count on this. As a general rule, I recommend against using empty() on your first pass through the code. Only once your code is working and you have tests running should you consider switching to empty() for speed reasons. In fact, if you want to use empty() down the road, it may make sense to initialize your array to zeros()/0., so that if you ever use the values, the NaNs will propagate and become obvious. Unfortunately, some CPUs implement NaNs in software, so there can be a huge speed hit. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
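Concretely (a sketch; the zeros()/0. line will also trigger an invalid-value warning unless seterr is told otherwise):

import numpy as np
a = np.empty((2, 3))        # uninitialized: contents are whatever was in that memory
b = np.zeros((2, 3))        # guaranteed zeros
c = np.zeros((2, 3)) / 0.   # all NaN: any stray use of c propagates visibly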
Re: [Numpy-discussion] Status Report for NumPy 1.1.0
2008/5/6 Charles R Harris [EMAIL PROTECTED]: On Tue, May 6, 2008 at 6:40 AM, Alan G Isaac [EMAIL PROTECTED] wrote: On Tue, 6 May 2008, Jarrod Millman apparently wrote: open tickets that I would like everyone to take a brief look at: http://projects.scipy.org/scipy/numpy/ticket/760 My understanding is that my patch, which would give a deprecation warning, was rejected in favor of the patch specified at the end of Travis's message here: URL:http://projects.scipy.org/pipermail/numpy-discussion/2008-April/033315.html At least that's how I thought things were resolved. I expected Travis's patch would be part of the 1.1 release. I think someone needs to step up and make the change, but first it needs to be blessed by Travis and some of the folks on the Matrix indexing thread. The change might also entail removing some workarounds in the spots indicated by Travis. It has my vote, as a peripheral participant in the matrix indexing thread; in fact it's what some of us thought we were agreeing to earlier. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] lexsort
2008/5/6 Eleanor [EMAIL PROTECTED]: a = numpy.array([[1,2,6], [2,2,8], [2,1,7],[1,1,5]]) a array([[1, 2, 6], [2, 2, 8], [2, 1, 7], [1, 1, 5]]) indices = numpy.lexsort(a.T) a.T.take(indices,axis=-1).T array([[1, 1, 5], [1, 2, 6], [2, 1, 7], [2, 2, 8]]) The above does what I want, equivalent to sorting on column A then column B in Excel, but all the transposes are ungainly. I've stared at it a while but can't come up with a more elegant solution. Any ideas? It appears that lexsort is broken in several ways, and its docstring is misleading. First of all, this code is not doing quite what you describe: lexsort takes the *last* key as the primary key. The primary key here is the [5,6,7,8] column, followed by the middle and then by the first. This is almost exactly the opposite of what you describe (and of what I expected). To get this to sort the way you describe, the clearest way is to write a sequence: In [34]: indices = np.lexsort( (a[:,1],a[:,0]) ) In [35]: a[indices,:] Out[35]: array([[1, 1, 5], [1, 2, 6], [2, 1, 7], [2, 2, 8]]) In other words, sort over a[:,1], then sort again over a[:,0], making a[:,0] the primary key. I have used fancy indexing to pull the array into the right order, but it can also be done with take(): In [40]: np.take(a,indices,axis=0) Out[40]: array([[1, 1, 5], [1, 2, 6], [2, 1, 7], [2, 2, 8]]) As for why I say lexsort() is broken, well, it simply returns 0 for higher-rank arrays (rather than either sorting or raising an exception), it raises an exception when handed axis=None rather than flattening as the docstring claims, and whatever the axis argument is supposed to do, it doesn't seem to do it: In [44]: np.lexsort(a,axis=0) Out[44]: array([1, 0, 2]) In [45]: np.lexsort(a,axis=-1) Out[45]: array([1, 0, 2]) In [46]: np.lexsort(a,axis=1) ValueError: axis(=1) out of bounds Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?
2008/5/5 Robert Kern [EMAIL PROTECTED]: On Mon, May 5, 2008 at 7:44 AM, David Cournapeau [EMAIL PROTECTED] wrote: In numpy, we can always replace realloc by malloc/free, because we know the size of the old block: would deprecating PyDataMem_RENEW and replacing it by PyDataMem_NEW/PyDataMem_FREE be possible, so as to make all numpy arrays follow a default alignment? There are only a few uses of it in numpy (6 of them), 0 in scipy, and I guess extensions never really used it? I am in favor of at least trying this out. We will have to have a set of benchmarks to make sure we haven't hurt the current uses of PyDataMem_RENEW which Tim points out. realloc() may not be a performance win anyway. Allocating new memory is quite fast, copying data is quite fast, and in-place realloc() tends to cause memory fragmentation - take an arena full of 1024-byte blocks and suddenly make one of them 256 bytes long and you haven't gained anything at all in most malloc() implementations. So yes, benchmarks.
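A rough micro-benchmark along these lines, only suggestive (ndarray.resize() goes through the realloc path, while the second variant allocates fresh storage and copies; Python overhead and the platform allocator both muddy the water):

import time
import numpy as np

n = 10 ** 7

# realloc path: ndarray.resize() grows the buffer in place when it can
a = np.zeros(n)
t0 = time.time()
a.resize(2 * n, refcheck=False)
print('resize (realloc): %.4f s' % (time.time() - t0))

# malloc+copy path: fresh allocation, then an explicit copy
b = np.zeros(n)
t0 = time.time()
c = np.empty(2 * n)
c[:n] = b
print('empty + copy:     %.4f s' % (time.time() - t0))

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion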
Re: [Numpy-discussion] Broadcasting question
2008/5/2 Chris.Barker [EMAIL PROTECTED]: Hi all, I have an n x m x 3 array, and I have an n x m array. I want to assign the values in the n x m array to all three of the slices in the bigger array: A1 = np.zeros((5,4,3)) A2 = np.ones((5,4)) A1[:,:,0] = A2 A1[:,:,1] = A2 A1[:,:,2] = A2 However, it seems I should be able to broadcast that, so I don't have to repeat myself, but I can't figure out how. All you need is a newaxis on A2: In [2]: A = np.zeros((3,4)) In [3]: B = np.ones(3) In [4]: A[:,:] = B[:,np.newaxis] In [5]: A Out[5]: array([[ 1., 1., 1., 1.], [ 1., 1., 1., 1.], [ 1., 1., 1., 1.]])
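Applied directly to the shapes in your example:

import numpy as np

A1 = np.zeros((5, 4, 3))
A2 = np.ones((5, 4))

# One assignment instead of three: a trailing newaxis lets A2
# broadcast across the length-3 last dimension of A1.
A1[:, :, :] = A2[:, :, np.newaxis]
print(np.all(A1 == 1.0))   # True

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion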
Re: [Numpy-discussion] Combining Sigmoid Curves
2008/5/2 Rich Shepard [EMAIL PROTECTED]: What will work (I call it a pi curve) is a matched pair of sigmoid curves, the ascending curve on the left and the descending curve on the right. Using the Boltzmann function for these I can calculate and plot each individually, but I'm having difficulty in appending the x and y values across the entire range. This is where I would appreciate your assistance. It's better not to work point-by-point, appending things, when working with numpy. Ideally you could find a formula which just produced the right curve, and then you'd apply it to the input vector and get the output vector all at once. For example for a Gaussian (which you're not using, I know), you'd write a function something like def f(x): return np.exp(-x**2) and then call it like: f(np.linspace(-1,1,1000)). This is efficient and clear. It does not seem to me that the logistic function, 1/(1+np.exp(-x)), does quite what you want. How about using the cosine? def f(left, right, x): scaled_x = (x-(right+left)/2)/((right-left)/2) return (1+np.cos(np.pi * scaled_x))/2 This is exactly zero at both endpoints, exactly one at the midpoint, with inflection points midway between, where the value is 1/2. If you want to normalize it so that the area underneath is one, that's easy to do. More generally, the trick of producing a scaled_x as above lets you move any function anywhere you like. If you want the peak not to be at the midpoint of the interval, you will need to do something a little more clever, perhaps choosing a different function, scaling the x values with a quadratic polynomial so that (left, middle, right) becomes (-1,0,1), or using a piecewise function.
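Wrapped up as a function (pi_curve is just an illustrative name):

import numpy as np

def pi_curve(left, right, x):
    # Rescale [left, right] onto [-1, 1], then apply the raised cosine:
    # 0 at both endpoints, 1 at the midpoint, value 1/2 at the
    # inflection points midway between.
    scaled_x = (x - (right + left) / 2.0) / ((right - left) / 2.0)
    return (1 + np.cos(np.pi * scaled_x)) / 2.0

x = np.linspace(0.0, 100.0, 1001)   # includes the endpoint, unlike arange
y = pi_curve(0.0, 100.0, x)
print('%g %g %g' % (y[0], y[500], y[-1]))   # 0 1 0

Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion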
Re: [Numpy-discussion] Combining Sigmoid Curves
2008/5/2 Rich Shepard [EMAIL PROTECTED]: On Fri, 2 May 2008, Anne Archibald wrote: It's better not to work point-by-point, appending things, when working with numpy. Ideally you could find a formula which just produced the right curve, and then you'd apply it to the input vector and get the output vector all at once. Anne, That's been my goal. :-) How about using the cosine? def f(left, right, x): scaled_x = (x-(right+left)/2)/((right-left)/2) return (1+np.cos(np.pi * scaled_x))/2 This is exactly zero at both endpoints, exactly one at the midpoint, with inflection points midway between, where the value is 1/2. If you want to normalize it so that the area underneath is one, that's easy to do. More generally, the trick of producing a scaled_x as above lets you move any function anywhere you like. This looks like a pragmatic solution. When I print scaled_x (using left = 0.0 and right = 100.0), the values range from -1.0 to +0.998. So, I need to figure out the scale_x that sets the end points at 0 and 100. No, no. You *want* scaled_x to range from -1 to 1. (The 0.998 is because you didn't include the endpoint, 100.) The function I gave, (1+np.cos(np.pi * scaled_x))/2, takes [-1, 1] to a nice bump-shaped function. If you feed in numbers from 0 to 100 as x, they get transformed to scaled_x, and you feed them to the function, getting a result that goes from 0 at x=0 to 1 at x=50 to 0 at x=100. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] should outer take an output argument?
2008/4/29 Alan G Isaac [EMAIL PROTECTED]: As I was looking at Bill's conjugate gradient posting, I found myself wondering if there would be a payoff to an output argument for ``numpy.outer``. (It is fairly natural to repeatedly recreate the outer product of the adjusted residuals, which is only needed during a single iteration.) I avoid np.outer, as it seems designed for foot-shooting. (It forcibly flattens both arguments into vectors no matter what shape they are and then computes their outer product.) np.multiply.outer does (what I consider to be) the right thing. Not that it takes an output argument either, as far as I can tell. But trying to inspect it suggests a few questions: * Why are ufunc docstrings totally useless? * Why are output arguments not keyword arguments? As for your original question, yes, I think it would be good if the .outer() and .reduce() methods of ufuncs took output arguments. This would let one use add.reduce() as a poor man's sum() even when dealing with uint8s, for example. (Coming up with a problem for which subtract.reduce(), divide.reduce(), or arctan2.reduce() are the solutions is left as an exercise for the reader.) I can definitely see a use for divide.outer(): Int x Int -> Float, though.
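To make the flattening concrete:

import numpy as np

a = np.ones((2, 3))
b = np.ones(4)

# np.outer flattens both inputs, whatever their shapes:
print(np.outer(a, b).shape)           # (6, 4)

# np.multiply.outer keeps the structure of both operands:
print(np.multiply.outer(a, b).shape)  # (2, 3, 4)

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion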
Re: [Numpy-discussion] very simple iteration question.
2008/4/30 a g [EMAIL PROTECTED]: Hi. This is a very basic question, sorry if it's irritating. If i didn't find the answer written already somewhere on the site, please point me to it. That'd be great. OK: how do i iterate over an axis other than 0? I have a 3D array of data[year, week, location]. I want to iterate over each year at each location and run a series of stats on the columns (on the 52 weeks in a particular year at a particular location). 'for years in data:' will get the first one, but then how do i not iterate over the 1 axis and iterate over the 2 axis instead? Well, there's always for i in xrange(A.shape[1]): do_something(A[:,i,...]) But that's kind of ugly in this day and age. I would be tempted by for a in np.rollaxis(A,1): do_something(a) It's worth mentioning that traversing an array along an axis other than the first usually results in subarrays that are not contiguous in memory. While numpy makes working with these convenient, they may not be very efficient, for cache reasons (modern processors load from memory - an extremely slow operation - in blocks that may be as large as 64 bytes; if you use only eight bytes of each block before moving on, your code will use much more memory bandwidth than it needs to). If this is a concern, you can usually transparently rearrange your array's memory layout to avoid it.
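For instance, a small sketch:

import numpy as np

A = np.zeros((365, 52, 10))   # [year, week, location], C-contiguous

# Slices along axis 1 are not contiguous in memory:
print(A[:, 0, :].flags['C_CONTIGUOUS'])   # False

# If the inner loop works along axis 1 heavily, rearrange the layout
# once up front so those slices become contiguous:
B = np.ascontiguousarray(np.rollaxis(A, 1))   # shape (52, 365, 10)
print(B[0].flags['C_CONTIGUOUS'])             # True

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion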
Re: [Numpy-discussion] very simple iteration question.
2008/4/30 Christopher Barker [EMAIL PROTECTED]: a g wrote: OK: how do i iterate over an axis other than 0? This ties in nicely with some of the discussion about iterating over matrices. It has been suggested that it would be nice to have iterators for matrices, so you could do: for row in M.rows: ... and for column in M.cols: ... If so, then wouldn't it make sense to have built-in iterators for nd-arrays as well? something like: for subarray in A.iter(axis=i): ... where axis would default to 0. This is certainly cleaner than: for j in range(A.shape[i]): subarray = A[:,:,j,:] Wait! I have no idea how to spell that generically -- i.e. the ith index, where i is a variable. There must be a way to build a slice object dynamically, but I don't know it. Slices can be built without too much trouble using slice(), but it's much easier to just write for subarray in np.rollaxis(A,i): ... rollaxis() just pulls the specified axis to the front. (It doesn't do what I thought it would, which is a cyclic permutation of the axes.)
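For the record, the slice()-building trick looks like this (take_index is an illustrative name):

import numpy as np

def take_index(A, i, j):
    # Build A[:, ..., j, ..., :] with j placed at axis i, constructing
    # the index tuple dynamically with slice(None).
    idx = [slice(None)] * A.ndim
    idx[i] = j
    return A[tuple(idx)]

A = np.arange(24).reshape(2, 3, 4)
print(take_index(A, 1, 2).shape)   # (2, 4), same as A[:, 2, :]

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion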
Re: [Numpy-discussion] untenable matrix behavior in SVN
2008/4/30 Charles R Harris [EMAIL PROTECTED]: Some operations on stacks of small matrices are easy to get, for instance, +,-,*,/, and matrix multiply. The last is the interesting one. If A and B are stacks of matrices with the same number of dimensions with the matrices stored in the last two indices, then sum(A[...,:,:,newaxis]*B[...,newaxis,:,:], axis=-2) is the matrix-wise multiplication of the two stacks. If B is replaced by a stack of 1D vectors, x, it is even simpler: sum(A[...,:,:]*x[...,newaxis,:], axis=-1) This doesn't go through BLAS, but for large stacks of small matrices it might be even faster than BLAS because BLAS is kinda slow for small matrices. Yes and no. For the first operation, you have to create a temporary that is larger than either of the two input arrays. These invisible, potentially gigantic temporaries are the sort of thing that puzzle users: as their problem size grows, they suddenly hit a massive slowdown because the machine starts swapping to disk, and then a failure because the temporary can't be allocated. This is one reason we have dot() and tensordot() even though they can be expressed like this. (The other is of course that it lets us use optimized BLAS.) This rather misses the point of Timothy Hochberg's suggestion (as I understood it), though: yes, you can write the basic operations in numpy, in a more or less efficient fashion. But it would be very valuable for arrays to have some kind of metadata that let them keep track of which dimensions represented simple array storage and which represented components of a linear algebra object. Such metadata could make it possible to use, say, dot() as if it were a binary ufunc taking two matrices. That is, you could feed it two arrays of matrices, which it would broadcast to the same shape if necessary, and then it would compute the elementwise matrix product. The question I have is, what is the right mathematical model for describing these arrays-some-of-whose-dimensions-represent-linear-algebra-objects? One idea is for each dimension to be flagged as one of replication, vector, or covector. A column vector might then be a rank-1 vector array, a row vector might be a rank-1 covector array, a linear operator might be a rank-2 object with one covector and one vector dimension, a bilinear form might be a rank-2 object with two covector dimensions. Dimensions designed for holding repetitions would be flagged as such, so that (for example) an image might be an array of shape (N,M,3) of types (replication,replication,vector); then to apply a color-conversion matrix one would simply use dot() (or *, I suppose), without too much concern for which index was which. The problem is that while this formalism sounds good to me, with my background in differential geometry, the distinction between vector and covector may seem peculiar and unhelpful to anyone who only ever works in spaces with a canonical metric. Implementing such a thing need not be too difficult: start with a new subclass of ndarray which keeps a tuple of dimension types. Come up with an adequate set of operations on them, and implement them in terms of numpy's functions, taking advantage of the extra information about each axis. A few operations that spring to mind: * Addition: it doesn't make sense to add vectors and covectors; raise an exception. Otherwise addition is always elementwise anyway. (How hard should addition work to match up corresponding dimensions?)
* Multiplication: elementwise across repetition axes, it combines vector axes with corresponding covector axes to get some kind of generalized matrix product. (How is "corresponding" defined?) * Division: mostly doesn't make sense unless you have an array of scalars (I suppose it could compute matrix inverses?) * Exponentiation: very limited (though I suppose matrix powers could be implemented if the shapes are right) * Change of basis: this one is tricky because not all dimensions need come from the same vector space * Broadcasting: the rules may become a bit byzantine... * Dimension metadata fiddling Is this a useful abstraction? It seems like one might run into trouble when dealing with arrays whose dimensions represent vectors from unrelated spaces.
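As a sanity check on the stacked-multiply identities quoted at the top of this post:

import numpy as np

A = np.random.randn(5, 3, 4)   # stack of five 3x4 matrices
B = np.random.randn(5, 4, 2)   # stack of five 4x2 matrices
x = np.random.randn(5, 4)      # stack of five 4-vectors

AB = np.sum(A[..., :, :, np.newaxis] * B[..., np.newaxis, :, :], axis=-2)
Ax = np.sum(A * x[..., np.newaxis, :], axis=-1)

print(np.allclose(AB[2], np.dot(A[2], B[2])))   # True
print(np.allclose(Ax[2], np.dot(A[2], x[2])))   # True

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion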
Re: [Numpy-discussion] recarray fun
2008/5/1 Travis E. Oliphant [EMAIL PROTECTED]: Stéfan van der Walt wrote: 2008/4/30 Christopher Barker [EMAIL PROTECTED]: Stéfan van der Walt wrote: That's the way, or just rgba_image.view(numpy.int32). ah -- interestingly, I tried: rgba_image.view(dtype=numpy.int32) and got: Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: view() takes no keyword arguments Since it is optional, shouldn't it be a keyword argument? Thanks, fixed in r5115. This was too hasty. I had considered this before. The problem with this is that the object can be either a type object or a data-type object. You can use view to both re-cast a numpy array as another subtype or as another data-type. So, please revert the change until a better solution is posted. If we're going to support keyword arguments, shouldn't those two options be *different* keywords (dtype and ndarray_subclass, say)? Then a single non-keyword argument tells numpy to guess which one you wanted...
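The two senses of view() at issue, for concreteness:

import numpy as np

a = np.zeros((2, 2, 4), dtype=np.uint8)   # e.g. a tiny RGBA image

# A data-type argument reinterprets the same buffer:
b = a.view(np.int32)                      # shape (2, 2, 1)

# A type argument re-casts to another ndarray subclass:
m = np.zeros((2, 2)).view(np.matrix)

print(b.shape)
print(type(m))

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion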
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 29/04/2008, Travis E. Oliphant [EMAIL PROTECTED] wrote: I'm quite persuaded now that a[i] should return a 1-d object for arrays. In addition to the places Chuck identified, there are at least 2 other places where special code was written to work around the expectation that item selection returns an object with reduced dimensionality (a special-cased .tolist for matrices and a special-cased getitem in the fancy indexing code). As the number of special-case work-arounds grows, the more I'm convinced the conceptualization is wrong. So, I now believe we should change a[i] for matrices to return a 1-d array. The only down-side I see is that a[i] != a[i,:] for matrices. However, matrix(a[i]) == a[i,:], and so I'm not sure there is really a problem there. I also don't think that whatever problem may actually be buried in the fact that type(a[i]) != type(a[i,:]) is worse than the problem that several pieces of NumPy code actually expect hierarchical container behavior of multi-dimensional sequences. I am puzzled by this. What is the rationale for x[i,:] not being a 1-d object? Does this not require many special-case bits of code as well? What about a[i,...]? That is what I would use to make a hierarchical bit of code, and I would be startled to find myself in an infinite loop waiting for the dimension to become one. I don't think making the small change to have a[i] return 1-d arrays precludes us from that 1-d array being a bit more formalized in 1.2 as a RowVector, should we choose that direction. It does, however, fix several real bugs right now. +1 What's more, I think we need a matrix-cleanliness test suite that verifies that all basic numpy tools behave identically on matrices and preserve matrixiness. But that's a big job, and for 1.2. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 29/04/2008, Keith Goodman [EMAIL PROTECTED] wrote: On Tue, Apr 29, 2008 at 11:21 AM, Alan G Isaac [EMAIL PROTECTED] wrote: On Tue, 29 Apr 2008, Keith Goodman apparently wrote: I often use x[i,:] and x[:,i] where x is a matrix and i is a scalar. I hope this continues to return a matrix. 1. Could you give an example of the circumstances of this use? In my use i is most commonly an array (i = M.where(y.A)[0] where y is an nx1 matrix), sometimes a list, and in ipython when debugging or first writing the code, a scalar. It would seem odd to me if x[i,:] returned different types of objects based on the type of i: array index (idx = M.where(y.A)[0] where y is an nx1 matrix): x[idx,:] -- matrix; list index (idx = [0]): x[idx,:] -- matrix?; scalar index (idx = 0): x[idx,:] -- not matrix. It is actually pretty unreasonable to hope that A[0] and A[[1,2,3]] or A[[True,False,True]] should return objects of the same rank. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] accuracy issues with numpy arrays?
On 30/04/2008, eli bressert [EMAIL PROTECTED] wrote: I'm writing a quick script to import a fits (astronomy) image that has very low values for each pixel. Mostly on the order of 10^-9. I have written a python script that attempts to take low values and put them in integer format. I basically do this by taking the mean of the 1000 lowest pixel values, excluding zeros, and dividing the rest of the image by that mean. Unfortunately, when I try this in practice, *all* of the values in the image are being treated as zeros. But, if I use a scipy.ndimage function, I get proper values. For example, I take the pixel that I know has the highest value and do ... I think the bug is something else: import pyfits as p import scipy as s import scipy.ndimage as nd import numpy as n def flux2int(name): d = p.getdata(name) x,y = n.shape(d) l = x*y arr1= n.array(d.reshape(x*y,1)) temp = n.unique(arr1[0]) # This is where the bug starts. All values are treated as zeros. Hence only one value remains, zero. Actually, since arr1 has shape (x*y, 1), arr1[0] has shape (1,), and so it has only one entry: In [82]: A = np.eye(3) In [83]: A.reshape(9,1) Out[83]: array([[ 1.], [ 0.], [ 0.], [ 0.], [ 1.], [ 0.], [ 0.], [ 0.], [ 1.]]) In [84]: A.reshape(9,1)[0] Out[84]: array([ 1.]) The python debugger is a good way to check this sort of thing out; if you're using ipython, typing %pdb will start the debugger when an exception is raised, at which point you can poke around in all your local variables and evaluate expressions. arr1= arr1.sort() arr1= n.array(arr1) arr1= n.array(arr1[s.where(arr1 >= temp)]) val = n.mean(arr1[0:1000]) d = d*(1.0/val) d = d.round() p.writeto(name[0,]+'fixed.fits',d,h) Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] dot/tensordot limitations
Timothy Hochberg has proposed a generalization of the matrix mechanism to support manipulating arrays of linear algebra objects. For example, one might have an array of matrices one wants to apply to an array of vectors, to yield an array of vectors: In [88]: A = np.repeat(np.eye(3)[np.newaxis,...],2,axis=0) In [89]: A Out[89]: array([[[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]], [[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]]]) In [90]: V = np.array([[1,0,0],[0,1,0]]) Currently, it is very clumsy to handle this kind of situation even with arrays, keeping track of dimensions by hand. For example, if one wants to multiply A by V elementwise, one cannot simply use dot: In [92]: np.dot(A,V.T) Out[92]: array([[[ 1., 0.], [ 0., 1.], [ 0., 0.]], [[ 1., 0.], [ 0., 1.], [ 0., 0.]]]) tensordot() suffers from the same problem. It arises because when you combine two multiindexed objects there are a number of ways you can do it: A: Treat all indices as different and form all pairwise products (outer product): In [93]: np.multiply.outer(A,V).shape Out[93]: (2, 3, 3, 2, 3) B: Contract the outer product on a pair of indices: In [98]: np.tensordot(A,V,axes=(-1,-1)).shape Out[98]: (2, 3, 2) C: Take the diagonal along a pair of indices: In [111]: np.diagonal(np.multiply.outer(A,V),axis1=0,axis2=3).shape Out[111]: (3, 3, 3, 2) What we want in this case is a combination of B and C: In [106]: np.diagonal(np.tensordot(A,V,axes=(-1,-1)),axis1=0,axis2=2).T.shape Out[106]: (2, 3) but it cannot be done without constructing a larger array and pulling out its diagonal. If this seems like an exotic thing to want to do, suppose instead we have two arrays of vectors, and we want to evaluate the array of dot products. None of np.dot, np.tensordot, or np.inner produces the result you want. You have to multiply elementwise and sum. (This also precludes automatic use of BLAS, if indeed tensordot uses BLAS.) Does it make sense to implement a generalized tensordot that can do this, or is it the sort of thing one should code by hand every time? Is there any way we can make it easier to do simple common operations like take the elementwise matrix-vector product of A and V? The more general issue, of making linear algebra natural by keeping track of which indices are for elementwise operations and which should be used for dots (and how), is a whole other kettle of fish. I think for that someone should think hard about writing a full-featured multilinear algebra package (which might as well keep track of coordinate transformations while it is at it).
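The multiply-and-sum workaround, spelled out for the stacked matrix-vector case:

import numpy as np

A = np.repeat(np.eye(3)[np.newaxis, ...], 2, axis=0)   # two 3x3 matrices
V = np.array([[1., 0., 0.], [0., 1., 0.]])             # two 3-vectors

# Elementwise matrix-vector product over the stack:
out = (A * V[:, np.newaxis, :]).sum(axis=-1)           # shape (2, 3)
print(np.allclose(out[0], np.dot(A[0], V[0])))         # True

# Array of dot products of two arrays of vectors, same trick:
W = np.random.randn(2, 3)
dots = (V * W).sum(axis=-1)                            # shape (2,)

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion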
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 30/04/2008, Keith Goodman [EMAIL PROTECTED] wrote: On Tue, Apr 29, 2008 at 2:18 PM, Anne Archibald [EMAIL PROTECTED] wrote: It is actually pretty unreasonable to hope that A[0] and A[[1,2,3]] or A[[True,False,True]] should return objects of the same rank. Why is it unreasonable to hope that x[0,:] and x[0, [1,2,3]] or x[0, [True,False,True]], where x is a matrix, continue to return matrices? Well, I will point out that your example is somewhat different from mine; nobody is arguing that your three examples should return objects of different ranks. There is some disagreement over whether x[0,:] should be a rank-1 or a rank-2 object, but x[0,[1,2,3]] and x[0, [True, False, True]] should all have the same rank as x[0,:]. Nobody's questioning that. What I was pointing out is that x[0,0] should not have the same rank as x[0,:] or x[0,[0]]. In this case it's obvious; x[0,0] should be a scalar. But the same logic applies to x[0,:] versus x[[0,1],:] or even x[[0],:]. For arrays, if x has rank 2, x[0,:] has rank 1. If L is a list of indices, x[L,:] always has rank 2, no matter what is actually in L (even if it's one element). It is perfectly reasonable that they should yield objects of different ranks. With matrices, we have the surprising result that x[0,:] is still a two-dimensional object; to get at its third element I have to specify x[0,:][0,2]. It seems to me that this is a peculiar hack designed to ensure that the result still has * defined in a matrix way.
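In array terms:

import numpy as np

x = np.arange(12).reshape(3, 4)

print(x[0, :].shape)       # (4,)   scalar index: rank drops
print(x[[0], :].shape)     # (1, 4) one-element list: rank preserved
print(x[[0, 1], :].shape)  # (2, 4)
print(x[np.array([True, False, True]), :].shape)   # (2, 4)

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion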
Re: [Numpy-discussion] numpy release
On 25/04/2008, Stéfan van der Walt [EMAIL PROTECTED] wrote: 2008/4/25 Alan G Isaac [EMAIL PROTECTED]: I must have misunderstood: I thought the agreement was to provisionally return a 1d array for x[0], while we hashed through the other proposals. The agreement was: a) That x[0][0] should be equal to x[0,0] and b) That x[0,:] should be equal to x[0] (as for ndarrays) This breaks as little functionality as possible, at the cost of one (instead of two) API changes. Hold on. There has definitely been some confusion here. This is not what I thought I was suggesting, or what Alan thought he was suggesting. I do not think special-casing matrices for which one dimension happens to be one is a good idea at all, even temporarily. This is the kind of thing that drives users crazy. My suggested stopgap fix was to make x[0] return a 1D *array*; I feel that this will result in less special-casing. In fact I wasn't aware that anyone had proposed the fix you implemented. Can we change the stopgap solution? We should now discuss the proposals on the table, choose a good one, and implement all the API changes necessary for 1.2 or 2.0. It's a pity we have to change the API again, but the current situation is not tenable. Yes, well, it really looks unlikely we will be able to agree on what the correct solution is before 1.1, so I would like to have something non-broken for that release. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Generating Bell Curves (was: Using normal() )
On 24/04/2008, Rich Shepard [EMAIL PROTECTED] wrote: Thanks to several of you I produced test code using the normal density function, and it does not do what we need. Neither does the Gaussian function using fwhm that I've tried. The latter comes closer, but the ends do not reach y=0 when the inflection point is y=0.5. So, let me ask the collective expertise here how to generate the curves that we need. We need to generate bell-shaped curves given a midpoint, width (where y=0) and inflection point (by default, y=0.5) where y is [0.0, 1.0], and x is usually [0, 100], but can vary. Using the NumPy arange() function to produce the x values (e.g, arange(0, 100, 0.1)), I need a function that will produce the associated y values for a bell-shaped curve. These curves represent the membership functions for fuzzy term sets, and generally adjacent curves overlap where y=0.5. It would be a bonus to be able to adjust the skew and kurtosis of the curves, but the controlling data would be the center/midpoint and width, with defaults for inflection point, and other parameters. I've been searching for quite some time without finding a solution that works as we need it to work. First I should say, please don't call these bell curves! It is confusing people, since that usually means specifically a Gaussian. In fact it seems that you want something more usually called a sigmoid, or just a curve with a particular shape. I would look at http://en.wikipedia.org/wiki/Sigmoid_function In particular, they point out that the integral of any smooth, positive, bump-shaped function will be a sigmoid. So dreaming up an appropriate bump-shaped function is one way to go. Alternatively, you can look at polynomial fitting: if you want, say, a function that is 1 with derivative zero at 0, 0.5 with derivative -1 at 0.5, and 0 with derivative zero at 1, those six conditions determine a unique polynomial of degree at most 5; there's a new tool, KroghInterpolator, in scipy svn that can do that for you.
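A sketch with KroghInterpolator (from scipy svn; under its convention, repeating an abscissa supplies the next derivative value at that point):

import numpy as np
from scipy.interpolate import KroghInterpolator

xi = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]    # each point repeated once
yi = [1.0, 0.0, 0.5, -1.0, 0.0, 0.0]   # value, then slope, at each point
f = KroghInterpolator(xi, yi)

x = np.linspace(0.0, 1.0, 11)
print(f(x))   # 1 at x=0, 0.5 at x=0.5 (slope -1), 0 at x=1

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion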
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 25/04/2008, Charles R Harris [EMAIL PROTECTED] wrote: On Fri, Apr 25, 2008 at 12:02 PM, Alan G Isaac [EMAIL PROTECTED] wrote: I think we have discovered that there is a basic conflict between two behaviors: x[0] == x[0,:] vs. x[0][0] == x[0,0] To my recollection, everyone has agreed that the second behavior is desirable as a *basic expectation* about the behavior of 2d objects. This implies that *eventually* we will have ``x[0][0] == x[0,0]``. But then we must eventually have x[0] != x[0,:]. Choices: 1) x[0] equiv x[0:1,:] -- current behavior 2) x[0] equiv x[0,:] -- array behavior, gives x[0][0] == x[0,0] These are incompatible. I think 1) is more convenient in the matrix domain and should be kept, while 2) should be given up. What is the reason for wanting 2)? Maybe we could overload the () operator so that x(0) gives 2)? Or vice versa. We are currently operating in a fog of abstraction, trying to decide which operations are more natural or more in correspondence with our imagined idea of a user. We're not getting anywhere. Let's start writing up (perhaps on the matrix indexing wiki page) plausible use cases for scalar indexing, and how they would look under each proposal. For example, iterating over the rows of a matrix:

# array-return proposal: each row comes out as a 1-d array
for row in M:
    if np.all(row == 0):
        raise ValueError

# exception-raising proposal: index explicitly
for i in xrange(M.shape[0]):
    if np.all(M[i, :] == 0):
        raise ValueError

Other applications? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Movement of ma breaks matplotlib
Hi, In the upcoming release of numpy, numpy.core.ma ceases to exist. One must use numpy.ma (for the new interface) or numpy.oldnumeric.ma (for the old interface). This has the unfortunate effect of breaking matplotlib (which does "from numpy.core.ma import *"): I cannot even import pylab with a stock matplotlib (0.90.1) and an SVN numpy. Is this an intended effect? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy release
On 23/04/2008, Alan Isaac [EMAIL PROTECTED] wrote: On Wed, 23 Apr 2008, Sebastian Haase wrote: What used to be referred to a the 1.1 version, that can break more stuff, to allow for a cleaner design, will now be 1.2 So ... fixing x[0][0] for matrices should wait until 1.2. Is that correct? It seems to me that everyone agrees that the current situation is broken, but there is considerable disagreement on what the correct fix is. My inclination is to apply the minimal fix you suggest, with tests and documentation, as a stopgap, and let the different factions sort it out by discussion on the way to 1.2. Objections? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] aligned matrix / ctypes
On 23/04/2008, Zachary Pincus [EMAIL PROTECTED] wrote: Hi, Thanks a ton for the advice, Robert! Taking an array slice (instead of trying to set up the strides, etc. myself) is a slick way of getting this result indeed. It's worth mentioning that there was some discussion of adding an allocator for aligned arrays. It got sidetracked into a discussion of SSE support for numpy, but there are all sorts of reasons to want aligned arrays in numpy, as this post demonstrates. Seeing as it's just a few lines' worth of pure Python, is anyone interested in writing an aligned_empty() for numpy?
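Something like this untested sketch, say (shape-generic, and not an existing numpy function):

import numpy as np

def aligned_empty(shape, dtype=float, alignment=16):
    # Over-allocate a byte buffer, slice to the requested boundary,
    # and reinterpret; the returned view keeps the buffer alive.
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((1000, 3), np.float64, alignment=16)
print(a.ctypes.data % 16)   # 0

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion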
Re: [Numpy-discussion] Truth value of an array
On 18/04/2008, Robert Kern [EMAIL PROTECTED] wrote: On Fri, Apr 18, 2008 at 3:33 PM, Joe Harrington [EMAIL PROTECTED] wrote: For that matter, is there a reason logical operations don't work on arrays other than booleans? What about: The keywords and, or, and not only work on bool objects; Python tries to convert the operands using bool(). bool(some_array) raises the exception just as we've been discussing. This is a bit misleading, as the actual value is returned: In [167]: 'A' or 'B' Out[167]: 'A' In [168]: 'A' and 'B' Out[168]: 'B' In fact the problem is that and and or are short-circuiting, which means that return X or Y is equivalent to something like: if X: return X else: return Y, where Y is not even evaluated unless X is false. These are handled specially by syntax so that, for example, you can do False and 1/0 and not raise a ZeroDivisionError. So and and or aren't even really operators, and it's not just impossible to change them but unlikely to ever become possible. Just cast your arrays to booleans if you want to do boolean operations on them.
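Concretely, the elementwise operators do what the keywords cannot:

import numpy as np

a = np.array([True, True, False])
b = np.array([True, False, False])

# Elementwise boolean operations: use &, |, ~ (or np.logical_and
# and friends) instead of the 'and'/'or' keywords.
print(a & b)                 # [ True False False]
print(np.logical_or(a, b))   # [ True  True False]

# Non-boolean arrays can be cast first, as suggested above:
x = np.array([0, 1, 2])
print(x.astype(bool) & a)    # [False  True False]

Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion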