Re: [Numpy-discussion] Status Report for NumPy 1.1.0
2008/5/7 Jarrod Millman <[EMAIL PROTECTED]>: > I have just created the 1.1.x branch: > http://projects.scipy.org/scipy/numpy/changeset/5134 > In about 24 hours I will tag the 1.1.0 release from the branch. At > this point only critical bug fixes should be applied to the branch. > The trunk is now open for 1.2 development. I committed Travis' matrix indexing patch (plus a basic test) to 1.1.x. Hope that's okay. (All tests pass.) Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] lexsort
2008/5/6 Eleanor <[EMAIL PROTECTED]>: > >>> a = numpy.array([[1,2,6], [2,2,8], [2,1,7],[1,1,5]]) > >>> a > array([[1, 2, 6], >[2, 2, 8], >[2, 1, 7], >[1, 1, 5]]) > >>> indices = numpy.lexsort(a.T) > >>> a.T.take(indices,axis=-1).T > array([[1, 1, 5], >[1, 2, 6], >[2, 1, 7], >[2, 2, 8]]) > > > The above does what I want, equivalent to sorting on column A then > column B in Excel, but al the transposes are ungainly. I've stared at it a > while > but can't come up with a more elegant solution. Any ideas? It appears that lexsort is broken in several ways, and its docstring is misleading. First of all, this code is not doing quite what you describe. The primary key here is the [5,6,7,8] column, followed by the middle and then by the first. This is almost exactly the opposite of what you describe (and of what I expected). To get this to sort the way you describe, the clearest way is to write a sequence: In [34]: indices = np.lexsort( (a[:,1],a[:,0]) ) In [35]: a[indices,:] Out[35]: array([[1, 1, 5], [1, 2, 6], [2, 1, 7], [2, 2, 8]]) In other words,sort over a[:,1], then sort again over a[:,0], making a[:,0] the primary key. I have used "fancy indexing" to pull the array into the right order, but it can also be done with take(): In [40]: np.take(a,indices,axis=0) Out[40]: array([[1, 1, 5], [1, 2, 6], [2, 1, 7], [2, 2, 8]]) As for why I say lexsort() is broken, well, it simply returns 0 for higher-rank arrays (rather than either sorting or raising an exception), it raises an exception when handed axis=None rather than flattening as the docstring claims, and whatever the axis argument is supposed to do, it doesn't seem to do it: In [44]: np.lexsort(a,axis=0) Out[44]: array([1, 0, 2]) In [45]: np.lexsort(a,axis=-1) Out[45]: array([1, 0, 2]) In [46]: np.lexsort(a,axis=1) --- Traceback (most recent call last) /home/peridot/ in () : axis(=1) out of bounds Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Status Report for NumPy 1.1.0
2008/5/6 Charles R Harris <[EMAIL PROTECTED]>: > > On Tue, May 6, 2008 at 6:40 AM, Alan G Isaac <[EMAIL PROTECTED]> wrote: > > > > On Tue, 6 May 2008, Jarrod Millman apparently wrote: > > > open tickets that I would like everyone to take a brief > > > look at: > > > http://projects.scipy.org/scipy/numpy/ticket/760 > > > > My understanding is that my patch, which would give > > a deprecation warning, was rejected in favor of the patch > > specified at the end of Travis's message here: > > > http://projects.scipy.org/pipermail/numpy-discussion/2008-April/033315.html> > > > > At least that's how I thought things were resolved. > > I expected Travis's patch would be part of the 1.1 release. > > > > I think someone needs to step up and make the change, but first it needs to > be blessed by Travis and some of the folks on the Matrix indexing thread. > The change might also entail removing some workarounds in the spots > indicated by Travis. It has my vote, as a peripheral participant in the matrix indexing thread; in fact it's what some of us thought we were agreeing to earlier. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Tests for empty arrays
2008/5/6 Andy Cheesman <[EMAIL PROTECTED]>: > I was wondering if anyone could shed some light on how to distinguish an > empty array of a given shape and an zeros array of the same dimensions. An "empty" array, that is, an array returned by the function empty(), just means an uninitialized array. The only difference between it and an array returned by zeros() is the values in the array: from zeros() they are, obviously, all zero, while from empty() they can be anything, including zero(). In practice they tend to be crazy numbers of order 1e300 or 1e-300, but one can't count on this. As a general rule, I recommend against using empty() on your first pass through the code. Only once your code is working and you have tests running should you consider switching to empty() for speed reasons. In fact, if you want to use empty() down the road, it may make sense to initialize your array to zeros()/0., so that if you ever use the values, the NaNs will propagate and become obvious. Unfortunately, some CPUs implement NaNs in software, so there can be a huge speed hit. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?
2008/5/5 David Cournapeau <[EMAIL PROTECTED]>: > Basically, what I have in mind is, in a first step (for numpy 1.2): > - define functions to allocate on a given alignement > - make PyMemData_NEW 16 byte aligned by default (to be compatible > with SSE and co). > > The problem was, and still is, realloc. It is not possible to implement > realloc with malloc/free (in a portable way), and as such, it is not > possible to have an aligned realloc. How much does this matter? I mean, what if you simply left all the reallocs as-is? The arrays that resulted from reallocs would not typically be aligned, but we cannot in any case expect all arrays to be aligned. Or, actually, depending on how you manage the alignment, it may be possible to write a portable aligned realloc. As I understand it, a portable aligned malloc allocates extra memory, then adds something to the pointer to make the memory arena aligned. free() must then recover the original pointer and call free() on it - though this can presumably be avoided if garbage collection is smart enough. realloc() also needs to recover the pointer to call the underlying realloc(). One answer is to ensure that at least one byte of padding is always done at the beginning, so that the number of pad bytes used on an aligned pointer p is accessible as ((uint8*)p)[-1]. Other schemes are available; I have a peculiar affection for the pure-python solution of constructing a view. In any of them, it seems to me, if free() can recover the original pointer, so can realloc()... I don't think any numpy function can assume that its input arrays are aligned, since users may like to pass, say, mmap()ed hunks of a file, over which numpy has no control of the alignment. In this case, then, how much of a problem is it if a few stray realloc()s produce non-aligned arrays? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?
2008/5/5 Robert Kern <[EMAIL PROTECTED]>: > On Mon, May 5, 2008 at 7:44 AM, David Cournapeau > <[EMAIL PROTECTED]> wrote: > > > In numpy, we can always replace realloc by malloc/free, because we know > > the size of the old block: would deprecating PyMemData_RENEW and > > replacing them by PyMemeData_NEW/PyMemData_FREE be possible, such as to > > make all numpy arrays follow a default alignement ? There are only a few > > of them in numpy (6 of them), 0 in scipy, and I guess extensions never > > really used them ? > > I am in favor of at least trying this out. We will have to have a set > of benchmarks to make sure we haven't hurt the current uses of > PyMemData_RENEW which Tim points out. realloc() may not be a performance win anyway. Allocating new memory is quite fast, copying data is quite fast, and in-place realloc() tends to cause memory fragmentation - take an arena full of 1024-byte blocks and suddenly make one of them 256 bytes long and you haven't gained anything at all in most malloc() implementations. So yes, benchmarks. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combining Sigmoid Curves
2008/5/2 Rich Shepard <[EMAIL PROTECTED]>: > On Fri, 2 May 2008, Anne Archibald wrote: > > > It's better not to work point-by-point, appending things, when working > > with numpy. Ideally you could find a formula which just produced the right > > curve, and then you'd apply it to the input vector and get the output > > vector all at once. > > Anne, > >That's been my goal. :-) > > > > How about using the cosine? > > > > def f(left, right, x): > >scaled_x = (x-(right+left)/2)/((right-left)/2) > >return (1+np.cos((np.pi/2) * scaled_x))/2 > > > > exactly zero at both endpoints, exactly one at the midpoint, > > inflection points midway between, where the value is 1/2. If you want > > to normalize it so that the area underneath is one, that's easy to do. > > > More generally, the trick of producing a scaled_x as above lets you > > move any function anywhere you like. > >This looks like a pragmatic solution. When I print scaled_x (using left = > 0.0 and right = 100.0), the values range from -1.0 to +0.998. So, I need to > figure out the scale_x that sets the end points at 0 and 100. No, no. You *want* scaled_x to range from -1 to 1. (The 0.998 is because you didn't include the endpoint, 100.) The function I gave, (1+np.cos((np.pi/2) * scaled_x))/2, takes [-1, 1] to a nice bump-shaped function. If you feed in numbers from 0 to 100 as x, they get transformed to scaled_x, and you feed them to the function, getting a result that goes from 0 at x=0 to 1 at x=50 to 0 at x=100. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combining Sigmoid Curves
2008/5/2 Rich Shepard <[EMAIL PROTECTED]>: >What will work (I call it a pi curve) is a matched pair of sigmoid curves, > the ascending curve on the left and the descending curve on the right. Using > the Boltzmann function for these I can calculate and plot each individually, > but I'm having difficulty in appending the x and y values across the entire > range. This is where I would appreciate your assistance. It's better not to work point-by-point, appending things, when working with numpy. Ideally you could find a formula which just produced the right curve, and then you'd apply it to the input vector and get the output vector all at once. For example for a Gaussian (which you're not using, I know), you'd write a function something like def f(x): return np.exp(-x**2) and then call it like: f(np.linspace(-1,1,1000)). This is efficient and clear. It does not seem to me that the logistic function, 1/(1+np.exp(x)) does quite what you want. How about using the cosine? def f(left, right, x): scaled_x = (x-(right+left)/2)/((right-left)/2) return (1+np.cos((np.pi/2) * scaled_x))/2 exactly zero at both endpoints, exactly one at the midpoint, inflection points midway between, where the value is 1/2. If you want to normalize it so that the area underneath is one, that's easy to do. More generally, the trick of producing a scaled_x as above lets you move any function anywhere you like. If you want the peak not to be at the midpoint of the interval, you will need to do something a litle more clever, perhaps choosing a different function, scaling the x values with a quadratic polynomial so that (left, middle, right) becomes (-1,0,1), or using a piecewise function. Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Broadcasting question
2008/5/2 Chris.Barker <[EMAIL PROTECTED]>: > Hi all, > > I have a n X m X 3 array, and I a n X M array. I want to assign the > values in the n X m to all three of the slices in the bigger array: > > A1 = np.zeros((5,4,3)) > A2 = np.ones((5,4)) > A1[:,:,0] = A2 > A1[:,:,1] = A2 > A1[:,:,2] = A2 > > However,it seems I should be able to broadcast that, so I don't have to > repeat myself, but I can't figure out how. All you need is a newaxis on A2: In [2]: A = np.zeros((3,4)) In [3]: B = np.ones(3) In [4]: A[:,:] = B[:,np.newaxis] In [5]: A Out[5]: array([[ 1., 1., 1., 1.], [ 1., 1., 1., 1.], [ 1., 1., 1., 1.]]) Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] recarray fun
2008/5/1 Travis E. Oliphant <[EMAIL PROTECTED]>: > Stéfan van der Walt wrote: > > 2008/4/30 Christopher Barker <[EMAIL PROTECTED]>: > > > >> Stéfan van der Walt wrote: > >> > That's the way, or just rgba_image.view(numpy.int32). > >> > >> ah -- interestingly, I tried: > >> > >> rgba_image.view(dtype=numpy.int32) > >> > >> and got: > >> > >> Traceback (most recent call last): > >>File "", line 1, in > >> TypeError: view() takes no keyword arguments > >> > >> Since it is optional, shouldn't it be keyword argument? > >> > > > > Thanks, fixed in r5115. > > > > > This was too hasty. I had considered this before. > > The problem with this is that the object can be either a type object or > a data-type object. You can use view to both re-cast a numpy array as > another subtype or as another data-type. > > So, please revert the change until a better solution is posted. If we're going to support keyword arguments, shouldn't those two options be *different* keywords (dtype and ndarray_subclass, say)? Then a single non-keyword argument tells numpy to guess which one you wanted... Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] untenable matrix behavior in SVN
2008/4/30 Charles R Harris <[EMAIL PROTECTED]>: > Some operations on stacks of small matrices are easy to get, for instance, > +,-,*,/, and matrix multiply. The last is the interesting one. If A and B > are stacks of matrices with the same number of dimensions with the matrices > stored in the last two indices, then > > sum(A[...,:,:,newaxis]*B[...,newaxis,:,:], axis=-2) > > is the matrix-wise multiplication of the two stacks. If B is replaced by a > stack of 1D vectors, x, it is even simpler: > > sum(A[...,:,:]*x[...,newaxis,:], axis=-1) > > This doesn't go through BLAS, but for large stacks of small matrices it > might be even faster than BLAS because BLAS is kinda slow for small > matrices. Yes and no. For the first operation, you have to create a temporary that is larger than either of the two input arrays. These invisible (potentially) gigantic temporaries are the sort of thing that puzzle users when as their problem size grows they suddenly find they hit a massive slowdown because it starts swapping to disk, and then a failure because the temporary can't be allocated. This is one reason we have dot() and tensordot() even though they can be expressed like this. (The other is of course that it lets us use optimized BLAS.) This rather misses the point of Timothy Hochberg's suggestion (as I understood it), though: yes, you can write the basic operations in numpy, in a more or less efficient fashion. But it would be very valuable for arrays to have some kind of metadata that let them keep track of which dimensions represented simple array storage and which represented components of a linear algebra object. Such metadata could make it possible to use, say, dot() as if it were a binary ufunc taking two matrices. That is, you could feed it two arrays of matrices, which it would broadcast to the same shape if necessary, and then it would compute the elementwise matrix product. The question I have is, what is the right mathematical model for describing these arrays-some-of-whose-dimensions-represent-linear-algebra-objects? One idea is for each dimension to be flagged as one of "replication", "vector", or "covector". A column vector might then be a rank-1 vector array, a row vector might be a rank-1 covector array, a linear operator might be a rank-2 object with one covector and one vector dimension, a bilinear form might be a rank-2 object with two covector dimensions. Dimensions designed for holding repetitions would be flagged as such, so that (for example) an image might be an array of shape (N,M,3) of types ("replication","replication","vector"); then to apply a color-conversion matrix one would simply use dot() (or "*" I suppose). without too much concern for which index was which. The problem is, while this formalism sounds good to me, with a background in differential geometry, if you only ever work in spaces with a canonical metric, the distinction between vector and covector may seem peculiar and be unhelpful. Implementing such a thing need not be too difficult: start with a new subclass of ndarray which keeps a tuple of dimension types. Come up with an adequate set of operations on them, and implement them in terms of numpy's functions, taking advantage of the extra information about each axis. A few operations that spring to mind: * Addition: it doesn't make sense to add vectors and covectors; raise an exception. Otherwise addition is always elementwise anyway. (How hard should addition work to match up corresponding dimensions?) * Multiplication: elementwise across "repetition" axes, it combines vector axes with corresponding covector axes to get some kind of generalized matrix product. (How is "corresponding" defined?) * Division: mostly doesn't make sense unless you have an array of scalars (I suppose it could compute matrix inverses?) * Exponentiation: very limited (though I suppose matrix powers could be implemented if the shapes are right) * Change of basis: this one is tricky because not all dimensions need come from the same vector space * Broadcasting: the rules may become a bit byzantine... * Dimension metadata fiddling Is this a useful abstraction? It seems like one might run into trouble when dealing with arrays whose dimensions represent vectors from unrelated spaces. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] very simple iteration question.
2008/4/30 Christopher Barker <[EMAIL PROTECTED]>: > a g wrote: > > OK: how do i iterate over an axis other than 0? > > This ties in nicely with some of the discussion about interating over > matrices. It ahs been suggested that it would be nice to have iterators > for matrices, so you could do: > > for row in M.rows: >... > > and > > for column in M.cols: >... > > > If so, then wouldn't it make sense to have built in iterators for > nd-arrays as well? something like: > > for subarray in A.iter(axis=i): > ... > > where axis would default to 0. > > This is certainly cleaner than: > > for j in range(A.shape[i]): > subarray = A[:,:,j,:] > > Wait! I have no idea how to spell that generically -- i.e. the ith > index, where i is a variable. There must be a way to build a slice > object dynamically, but I don't know it. Slices can be built without too much trouble using slice(), but it's much easier to just write for subarray in np.rollaxis(A,i): ... rollaxis() just pulls the specified axis to the front. (It doesn't do what I thought it would, which is a cyclic permutation of the axes). Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] very simple iteration question.
2008/4/30 a g <[EMAIL PROTECTED]>: > Hi. This is a very basic question, sorry if it's irritating. If i > didn't find the answer written already somewhere on the site, please > point me to it. That'd be great. > > OK: how do i iterate over an axis other than 0? > > I have a 3D array of data[year, week, location]. I want to iterate > over each year at each location and run a series of stats on the > columns (on the 52 weeks in a particular year at a particular location). > 'for years in data:' will get the first one, but then how do i not > iterate over the 1 axis and iterate over the 2 axis instead? Well, there's always for i in xrange(A.shape[1]): do_something(A[:,i,...]) But that's kind of ugly in this day and age. I would be tempted by for a in np.rollaxis(A,1): do_something(a) It's worth mentioning that traversing an array along an axis not the first usually results in subarrays that are not contiguous in memory. While numpy makes working with these convenient, they may not be very efficient for cache reasons (for example, modern processors load from memory - an extremely slow operation - in blocks that may be as large as 64 bytes; if you only use eight of these before moving on, your code will use much more memory bandwidth than it needs to). If this is a concern, you can usually transparently rearrange your array's memory layout to avoid it. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] should outer take an output argument?
2008/4/29 Alan G Isaac <[EMAIL PROTECTED]>: > As I was looking at Bill's conjugate gradient posting, > I found myself wondering if there would be a payoff > to an output argument for ``numpy.outer``. (It is fairly > natural to repeatedly recreate the outer product of > the adjusted residuals, which is only needed during a > single iteration.) I avoid np.outer, as it seems designed for foot shooting. (It forcibly flattens both arguments into vectors no matter what shape they are and then computes their outer product.) np.multiply.outer does (what I consider to be) the right thing. Not that it takes an output argument either, as far as I can tell. But trying to inspect it suggests a few questions: * Why are ufunc docstrings totally useless? * Why are output arguments not keyword arguments? As for your original question, yes, I think it would be good if .outer() and .reduce() methods of ufuncs took output arguments. This would let one use add.reduce() as a poor man's sum() even when dealing with uint8s, for example. (Coming up with a problem for which subtract.reduce(), divide.reduce(), or arctan2.reduce() are the solutions is left as an exercise for the reader.) I can definitely see a use for divide.outer() : Int -> Int -> Float, though. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 30/04/2008, Keith Goodman <[EMAIL PROTECTED]> wrote: > On Tue, Apr 29, 2008 at 2:18 PM, Anne Archibald > <[EMAIL PROTECTED]> wrote: > > It is actually pretty unreasonable to hope that > > > > A[0] > > > > and > > > > A[[1,2,3]] > > or > > A[[True,False,True]] > > > > should return objects of the same rank. > > Why it unreasonable to hope that > > x[0,:] > > and > > x[0, [1,2,3]] > > or > > x[0, [True,False,True]] > > where x is a matrix, continue to return matrices? Well, I will point out that your example is somewhat different from mine; nobody is arguing that your three examples should return objects of different ranks. There is some disagreement over whether x[0,:] should be a rank-1 or a rank-2 object, but x[0,[1,2,3]] and x[0, [True, False, True]] should all have the same rank as x[0,:]. Nobody's questioning that. What I was pointing out is that x[0,0] should not have the same rank as x[0,:] or x[0,[0]]. In this case it's obvious; x[0,0] should be a scalar. But the same logic applies to x[0,:] versus x[[0,1],:] or even x[[0],:]. For arrays, if x has rank 2, x[0,:] has rank 1. if L is a list of indices, x[L,:] always has rank 2, no matter what is actually in L (even if it's one element). It is perfectly reasonable that they should yield objects of different ranks. With matrices, we have the surprising result that x[0,:] is still a two-dimensional object; to get at its third element I have to specify x[0,:][0,2]. It seems to me that this is a peculiar hack designed to ensure that the result still has "*" defined in a matrix way. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] dot/tensordot limitations
Timothy Hochberg has proposed a generalization of the matrix mechanism to support manipulating arrays of linear algebra objects. For example, one might have an array of matrices one wants to apply to an array of vectors, to yield an array of vectors: In [88]: A = np.repeat(np.eye(3)[np.newaxis,...],2,axis=0) In [89]: A Out[89]: array([[[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]], [[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]]]) In [90]: V = np.array([[1,0,0],[0,1,0]]) Currently, it is very clumsy to handle this kind of situation even with arrays, keeping track of dimensions by hand. For example if one wants to multiply A by V "elementwise", one cannot simply use dot: In [92]: np.dot(A,V.T) Out[92]: array([[[ 1., 0.], [ 0., 1.], [ 0., 0.]], [[ 1., 0.], [ 0., 1.], [ 0., 0.]]]) tensordot() suffers from the same problem. It arises because when you combine two multiindexed objects there are a number of ways you can do it: A: Treat all indices as different and form all pairwise products (outer product): In [93]: np.multiply.outer(A,V).shape Out[93]: (2, 3, 3, 2, 3) B: Contract the outer product on a pair of indices: In [98]: np.tensordot(A,V,axes=(-1,-1)).shape Out[98]: (2, 3, 2) C: Take the diagonal along a pair of indices: In [111]: np.diagonal(np.multiply.outer(A,V),axis1=0,axis2=3).shape Out[111]: (3, 3, 3, 2) What we want in this case is a combination of B and C: In [106]: np.diagonal(np.tensordot(A,V,axes=(-1,-1)),axis1=0,axis2=2).T.shape Out[106]: (2, 3) but it cannot be done without constructing a larger array and pulling out its diagonal. If this seems like an exotic thing to want to do, suppose instead we have two arrays of vectors, and we want to evaluate the array of dot products. None of no.dot, np.tensordot, or np.inner produce the results you want. You have to multiply elementwise and sum. (This also precludes automatic use of BLAS, if indeed tensordot uses BLAS.) Does it make sense to implement a generalized tensordot that can do this, or is it the sort of thing one should code by hand every time? Is there any way we can make it easier to do simple common operations like take the elementwise matrix-vector product of A and V? The more general issue, of making linear algebra natural by keeping track of which indices are for elementwise operations, and which should be used for dots (and how), is a whole other kettle of fish. I think for that someone should think hard about writing a full-featured multilinear algebra package (might as well keep track of coordinate transformations too while one was at it) if this is intended. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] accuracy issues with numpy arrays?
On 30/04/2008, eli bressert <[EMAIL PROTECTED]> wrote: > I'm writing a quick script to import a fits (astronomy) image that has > very low values for each pixel. Mostly on the order of 10^-9. I have > written a python script that attempts to take low values and put them > in integer format. I basically do this by taking the mean of the 1000 > lowest pixel values, excluding zeros, and dividing the rest of the > image by that mean. Unfortunately, when I try this in practice, *all* > of the values in the image are being treated as zeros. But, if I use a > scipy.ndimage function, I get proper values. For example, I take the > pixel that I know has the highest value and do I think the bug is something else: > import pyfits as p > import scipy as s > import scipy.ndimage as nd > import numpy as n > > def flux2int(name): > d = p.getdata(name) > x,y = n.shape(d) > l = x*y > arr1= n.array(d.reshape(x*y,1)) > temp = n.unique(arr1[0]) # This is where the bug starts. All values > are treated as zeros. Hence only one value remains, zero. Actually, since arr1 has shape (x*y, 1), arr1[0] has shape (1,), and so it has only one entry: In [82]: A = np.eye(3) In [83]: A.reshape(9,1) Out[83]: array([[ 1.], [ 0.], [ 0.], [ 0.], [ 1.], [ 0.], [ 0.], [ 0.], [ 1.]]) In [84]: A.reshape(9,1)[0] Out[84]: array([ 1.]) The python debugger is a good way to check this sort of thing out; if you're using ipython, typing %pdb will start the debugger when an exception is raised, at which point you can poke around in all your local variables and evaluate expressions. > arr1= arr1.sort() > arr1= n.array(arr1) > arr1= n.array(arr1[s.where(arr1 >= temp)]) > > val = n.mean(arr1[0:1000]) > > d = d*(1.0/val) > d = d.round() > p.writeto(name[0,]+'fixed.fits',d,h) Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 29/04/2008, Keith Goodman <[EMAIL PROTECTED]> wrote: > On Tue, Apr 29, 2008 at 11:21 AM, Alan G Isaac <[EMAIL PROTECTED]> wrote: > > On Tue, 29 Apr 2008, Keith Goodman apparently wrote: > > > I often use x[i,:] and x[:,i] where x is a matrix and i is > > > a scalar. I hope this continues to return a matrix. > > > > 1. Could you give an example of the circumstances of this > > use? > > > In my use i is most commonly an array (i = M.where(y.A)[0] where y is > a nx1 matrix), sometimes a list, and in ipython when debugging or > first writing the code, a scalar. It would seem odd to me if x[i,:] > returned different types of objects based on the type of i: > > array index > idx = M.where(y.A)[0] where y is a nx1 matrix > x[dx,:] --> matrix > > list index > idx = [0] > x[idx,:] --> matrix? > > scalar index > idx = 0 > x[idx,:] --> not matrix It is actually pretty unreasonable to hope that A[0] and A[[1,2,3]] or A[[True,False,True]] should return objects of the same rank. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 29/04/2008, Gael Varoquaux <[EMAIL PROTECTED]> wrote: > On Tue, Apr 29, 2008 at 11:03:58PM +0200, Anne Archibald wrote: > > I am puzzled by this. What is the rationale for x[i,:] not being a 1-d > > object? > > It breaks A*B[i, :] where A and B are matrices. Really? How? In [26]: A = np.matrix([[1,0],[0,1],[1,1]]) In [28]: A*np.ones(2) Out[28]: matrix([[ 1., 1., 2.]]) In [29]: np.ones(3)*A Out[29]: matrix([[ 2., 2.]]) I guess you don't get dot products with *: In [30]: np.ones(3).T*np.ones(3) Out[30]: array([ 1., 1., 1.]) Since the whole point of matrices it to avoid having to type "dot" I guess this should be convincing. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 29/04/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote: > I'm quite persuaded now that a[i] should return a 1-d object for > arrays.In addition to the places Chuck identified, there are at > least 2 other places where special code was written to work-around the > expectation that item selection returns an object with reduced > dimensionality (a special-cased .tolist for matrices and a special-cased > getitem in the fancy indexing code). > > As the number of special-case work-arounds grows the more I'm convinced > the conceptualization is wrong. So, I now believe we should change the > a[i] for matrices to return a 1-d array. > > The only down-side I see is that a[i] != a[i,:] for matrices. > However, matrix(a[i]) == a[i,:], and so I'm not sure there is really a > problem, there. I also don't think that whatever problem may actually > be buried in the fact that type(a[i]) != type(a[i,:]) is worse than the > problem that several pieces of NumPy code actually expect hierarchical > container behavior of multi-dimensional sequences. I am puzzled by this. What is the rationale for x[i,:] not being a 1-d object? Does this not require many special-case bits of code as well? What about a[i,...]? That is what I would use to make a hierarchical bit of code, and I would be startled to find myself in an infinite loop waiting for the dimension to become one. > I don't think making the small change to have a[i] return 1-d arrays > precludes us from that 1-d array being a bit more formalized in 1.2 as a > RowVector should we choose that direction. It does, however, fix > several real bugs right now. +1 What's more, I think we need to have a matrix-cleanliness test suite that verifies that all basic numpy tools behave identically on matrices and preserve matrixiness. But that's a big job, and for 1.2. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Movement of ma breaks matplotlib
Hi, In the upcoming release of numpy. numpy.core.ma ceases to exist. One must use numpy.ma (for the new interface) or numpy.oldnumeric.ma (for the old interface). This has the unfortunate effect of breaking matplotlib(which does "from numpy.core.ma import *") - I cannot even "import pylab" with a stock matplotlib (0.90.1) and an SVN numpy. Is this an intended effect? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] untenable matrix behavior in SVN
On 25/04/2008, Charles R Harris <[EMAIL PROTECTED]> wrote: > > > On Fri, Apr 25, 2008 at 12:02 PM, Alan G Isaac <[EMAIL PROTECTED]> wrote: > > I think we have discovered that there is a basic conflict > > between two behaviors: > > > >x[0] == x[0,:] > >vs. > > > >x[0][0] == x[0,0] > > > > To my recollection, everyone has agree that the second > > behavior is desirable as a *basic expectation* about the > > behavior of 2d objects. This implies that *eventually* we > > will have ``x[0][0] == x[0,0]``. But then eventually > > we MUST eventually have x[0] != x[0,:]. > > > Choices: > > 1) x[0] equiv x[0:1,:] -- current behavior > 2) x[0] equiv x[0,:] -- array behavior, gives x[0][0] == x[0,0] > > These are incompatible. I think 1) is more convenient in the matrix domain > and should be kept, while 2) should be given up. What is the reason for > wanting 2)? Maybe we could overload the () operator so that x(0) gives 2)? > Or vice versa. We are currently operating in a fog of abstraction, trying to decide which operations are more "natural" or more in correspondence with our imagined idea of a user. We're not getting anywhere. Let's start writing up (perhaps on the matrix indexing wiki page) plausible use cases for scalar indexing, and how they would look under each proposal. For example: * Iterating over the rows of a matrix: # array-return proposal; x is a scalar for row in M: if np.all(row==0): raise ValueError # exception-raising proposal for i in xrange(M.shape[0]): if np.all(M[i,:]==0): raise ValueError Other applications? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Generating Bell Curves (was: Using normal() )
On 24/04/2008, Rich Shepard <[EMAIL PROTECTED]> wrote: >Thanks to several of you I produced test code using the normal density > function, and it does not do what we need. Neither does the Gaussian > function using fwhm that I've tried. The latter comes closer, but the ends > do not reach y=0 when the inflection point is y=0.5. > >So, let me ask the collective expertise here how to generate the curves > that we need. > >We need to generate bell-shaped curves given a midpoint, width (where y=0) > and inflection point (by default, y=0.5) where y is [0.0, 1.0], and x is > usually [0, 100], but can vary. Using the NumPy arange() function to produce > the x values (e.g, arange(0, 100, 0.1)), I need a function that will produce > the associated y values for a bell-shaped curve. These curves represent the > membership functions for fuzzy term sets, and generally adjacent curves > overlap where y=0.5. It would be a bonus to be able to adjust the skew and > kurtosis of the curves, but the controlling data would be the > center/midpoint and width, with defaults for inflection point, and other > parameters. > >I've been searching for quite some time without finding a solution that > works as we need it to work. First I should say, please don't call these "bell curves"! It is confusing people, since that usually means specifically a Gaussian. In fact it seems that you want something more usually called a "sigmoid", or just a curve with a particular shape. I would look at http://en.wikipedia.org/wiki/Sigmoid_function In particular, they point out that the integral of any smooth, positive, "bump-shaped" function will be a sigmoid. So dreaming up an appropriate bump-shaped function is one way to go. Alternatively, tou can look at polynomial fitting - if you want, say, a function that is 1 with derivative zero at 0, 0.5 with derivative -1 at x, and 0 with derivative 0 at 1, you can construct a unique degree-6 polynomial that does exactly that; there's a new tool, KroghInterpolator, in scipy svn that can do that for you. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy release
On 25/04/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote: > 2008/4/25 Alan G Isaac <[EMAIL PROTECTED]>: > > > I must have misunderstood: > > I thought the agreement was to > > provisionally return a 1d array for x[0], > > while we hashed through the other proposals. > > The agreement was: > > a) That x[0][0] should be equal to x[0,0] and > b) That x[0,:] should be equal to x[0] (as for ndarrays) > > This breaks as little functionality as possible, at the cost of one > (instead of two) API changes. Hold on. There has definitely been some confusion here. This is not what I thought I was suggesting, or what Alan thought he was suggesting. I do not think special-casing matrices for which one dimension happens to be one is a good idea at all, even temporarily. This is the kind of thing that drives users crazy. My suggested stopgap fix was to make x[0] return a 1D *array*; I feel that this will result in less special-casing. In fact I wasn't aware that anyone had proposed the fix you implemented. Can we change the stopgap solution? > We should now discuss the proposals on the table, choose a good one, > and implement all the API changes necessary for 1.2 or 2. It's a pity > we have to change the API again, but the current situation is not > tenable. Yes, well, it really looks unlikely we will be able to agree on what the correct solution is before 1.1, so I would like to have something non-broken for that release. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] "aligned" matrix / ctypes
On 24/04/2008, Zachary Pincus <[EMAIL PROTECTED]> wrote: > In the mean time, is there any way to (from within python) construct > an array from another array by specifying the dtype and strides? That > is, some other way to do the view/slice business directly? There's the ndarray constructor, which lets you do some highly dodgy things like create arrays whose rows overlap in memory. I would think it would be possible to do this with that. You might also want to think about allowing arrays to have their rows aligned (for example, a 100 by 100 array of float32s which wants 64-byte alignment for every row). Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Using normal()
On 24/04/2008, Keith Goodman <[EMAIL PROTECTED]> wrote: > On Thu, Apr 24, 2008 at 10:01 AM, Rich Shepard <[EMAIL PROTECTED]> wrote: > >In the application's function I now have: > > > >from numpy import * > > > >x = nx.arange(0, 100, 0.1) > >y = nx.normal(center,width) # passed to the function when called > > > > and I then pass x,y to PyX for plotting. But python responds with: > > > >y = nx.normal(center,width) > > AttributeError: 'module' object has no attribute 'normal' > > It's random.normal(loc, scale). But that will give you random x values > drawn from the normal distribution, not y values at your x. So you'll > have to code your own normal, which will probably look something like > > norm = 1 / (scale * sqrt(2 * pi)) > y = norm * exp(-power((x - loc), 2) / (2 * scale**2)) I really have to recommend you instead install scipy, which contains this as well as many other probability distributions: D = scipy.stats.norm(0, 1) Then D.pdf(x) gives you the probability density function at x, D.cdf(x) gives you the probability that the function value is less than x, D.icdf(y) gives you the x value for which the probability is y that the value will be less than x, and so on. This is also available for more probability distributions than I had ever heard of. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] "aligned" matrix / ctypes
On 23/04/2008, Zachary Pincus <[EMAIL PROTECTED]> wrote: > Hi, > > Thanks a ton for the advice, Robert! Taking an array slice (instead of > trying to set up the strides, etc. myself) is a slick way of getting > this result indeed. It's worth mentioning that there was some discussion of adding an allocator for aligned arrays. It got sidetracked into a discussion of SSE support for numpy, but there are all sorts of reasons to want aligned arrays in numpy, as this post demonstrates. Seeing as it's just a few lines worth of pure python, is anyone interested in writing an aligned_empty() for numpy? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy release
On 23/04/2008, Alan Isaac <[EMAIL PROTECTED]> wrote: > On Wed, 23 Apr 2008, Sebastian Haase wrote: > > What used to be referred to a the 1.1 version, that can > > break more stuff, to allow for a cleaner design, will now > > be 1.2 > > So ... fixing x[0][0] for matrices should wait > until 1.2. Is that correct? It seems to me that everyone agrees that the current situation is broken, but there is considerable disagreement on what the correct fix is. My inclination is to apply the minimal fix you suggest, with tests and documentation, as a stopgap, and let the different factions sort it out by discussion on the way to 1.2. Objections? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Truth value of an array
On 18/04/2008, Robert Kern <[EMAIL PROTECTED]> wrote: > On Fri, Apr 18, 2008 at 3:33 PM, Joe Harrington <[EMAIL PROTECTED]> wrote: > > For that matter, is there a reason logical operations don't work on > > arrays other than booleans? What about: > > The keywords "and", "or", and "not" only work on bool objects; Python > tries to convert the operands using bool(). bool(some_array) raises > the exception just as we've been discussing. This is a bit misleading, as the actual value is returned: In [167]: "" or "A" Out[167]: 'A' In [168]: "A" and "B" Out[168]: 'B' In fact the problem is that "and" and "or" are short-circuiting, which means that "return X or Y" is equivalent to something like: if X: return X elif Y: return Y else: return False These are handled specially by syntax so that, for example, you can do "False and 1/0" and not raise a ZeroDivisionError. So "and" and "or" aren't even really operators, and it's not just impossible to change them but unlikely to ever become possible. Just cast your arrays to booleans if you want to do boolean operations on them. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Truth value of an array
On 18/04/2008, Olivier <[EMAIL PROTECTED]> wrote: > Let's restrict the discussion the case to boolean arrays (dtype bool), > since all the comparisons (A==B, A!=B, A arrays). > > So I have an array filled with booleans. Is there a reason not to map > "bool(A)" to "A.all()" but instead raise an exception? > > As far as I can see, "if A==B" is clear enough; one always means > "(A==B).all()". Isn't that so? > > I would like to see examples where there is clearly an ambiguity so > that I understand the benefit of having an exception raised for > boolean matrices instead of simply using all(). > > If there is no such clear example, then why not map "bool(A)" to > "A.all()" in numpy? In [1]: import numpy as np In [2]: A = np.array([0,1]) In [3]: B = np.array([0,0]) In [4]: (A==B).all() Out[4]: False In [5]: (A!=B).all() Out[5]: False One would expect A!=B to be the logical opposite of A==B, but with your proposed suggestion it is not. In math, when comparing functions, one can compare the functions as a whole, or one can compare them pointwise. numpy's implementation does pointwise comparison, for a variety of good reasons. As for why converting an array to a boolean doesn't automatically do all(), if you don't know you are dealing with an array and using all(), you will surely shoot yourself in the foot sooner rather than later (as the above example shows). If you do, simply wrapping it in all() is easy and clear. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] broken numpy.core.tests.test_multiarray.TestScalarIndexing.SetUp
On 17/04/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote: > On 17/04/2008, Robert Kern <[EMAIL PROTECTED]> wrote: > > > On Thu, Apr 17, 2008 at 1:21 PM, Eric Firing <[EMAIL PROTECTED]> wrote: > > > Arg! Cancel that! I didn't look carefully enough. How embarrassing! > > > Sorry for the noise. > > > > > > Don't apologize. That is very odd code. Stefan, is there a reason to > > form a 1-item tuple then do 1-item tuple unpacking everywhere? > > > Did I write that code? I agree that formatting is odd. Oh! Actually I think that one's my fault. Comes of being too minimal in modifying one of your test cases. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] broken numpy.core.tests.test_multiarray.TestScalarIndexing.SetUp
On 17/04/2008, Robert Kern <[EMAIL PROTECTED]> wrote: > On Thu, Apr 17, 2008 at 1:21 PM, Eric Firing <[EMAIL PROTECTED]> wrote: > > Arg! Cancel that! I didn't look carefully enough. How embarrassing! > > Sorry for the noise. > > > Don't apologize. That is very odd code. Stefan, is there a reason to > form a 1-item tuple then do 1-item tuple unpacking everywhere? The > test works the same after removing the extraneous commas. Anyways, > I've checked that in. This is not necessarily a justification, but many tests construct tuples of test objects which are then unpacked at the beginning of every function. This is not unreasonable when multiple objects are present: class TestSomething(NumpyTestCase): def setUp(self): A = array([1,2,3]) B = array([4,5,6]) self.d = A, B def test_something(self): A, B = self.d assert_not_equal(A,B) It's a little less cumbersome than using self.A and self.B inside each test case. Does it make sense to use a length-1 tuple when there's only one piece of test data, just for consistency? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Matrix vs ndarray
On 17/04/2008, Santanu Chatterjee <[EMAIL PROTECTED]> wrote: > Hi Numpy users, > I used MATLAB to do numerical calculations for a long time. Recently I > am digging into python and numpy. I am wondering about the following > question : > > 1) What is the difference between ndarray and matrix in numpy? My idea is > that having N-dimensional array is sufficient (of course a MATLAB users > point of view). If anyone can provide some idea, I will appreciate it. The difference comes about because python has a paucity of operators. In particular, there is only one multiplication operator. Thus for ndarrays "*" does elementwise multiplication, and to get matrix multiplication you must use the function "dot". Similarly "**" does elementwise exponentiation, and to get matrix exponentiation you must use the function "matrix_power". Since some people find this inconvenient, the "matrix" class exists. It represents arrays that are always two-dimensional, and for them "*" does matrix multiplication. The only difference is how the operators (and certain indexing operations) are interpreted. If you want to interpret a matrix M as an array, just use "M.A"; it you want to interpret an array X as a matrix, do "matrix(X)". In neither case is data copied. Personally, I do not ever use matrices; I find "dot" is quite convenient enough for me. Moreover the numpy standard library has an undetermined number of bugs where standard functions acting on matrices return the correct values but in the form of arrays instead; thus matrix users occasionally find that they have inadvertently (and silently) transformed their matrices back into arrays. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Release of NumPy
On 16/04/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote: > On 16/04/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote: > > > The whole issue occurs because a Matrix is not a proper > > > container. > > > > > > Right. And *that* is the case because of the attempt to > > treat matrices as containers of matrices instead of as > > containers of 1d arrays. > > > > I can see no real advantage to having matrices be containers > > of row vectors instead. A row vector would just be a matrix > > (i.e., essentially 2d) that allowed 1d style indexing. > > Again I ask, have you ever needed such a thing? > > > In the Matrix world, there is no such thing as an (N,) array -- which > is why the current Matrix object behaves so strangely, and always > returns a result with ndim > 1. > > Your proposal suggests that a Matrix be a container of arrays, but it > does not address the slicing of column vectors, i.e. > > x[0] > x[0,:] > x[:,0] > > would then behave in completely different ways, whereas with ndarrays > they don't. > > If you modified the proposal to say "any slicing operation that > returns a (1,N) or (N,1) matrix should now return an (N,) ndarray", it > would be more consistent, but then, wouldn't that break some other > expected behaviour? I thought one of the powerful features of a > matrix is that it preserves the orientation of a vector slice? > > The only way I've seen so far to have consistent slicing, is to have > an N-dim Matrix object be a container of N-1 dim Matrices, up to the > 2-D case, in which it becomes a container of Vectors which, in turn, > contain scalars. I don't think of arrays as containers of anything but scalars, so I find this whole argument from intuition extremely strange. My (draconian) suggestion would be to simply raise an exception when a matrix is indexed with a scalar. They're inherently two-dimensional; if you want a submatrix you should provide both indices (possibly including a ":"). If you actually want a subarray, as with an array, use ".A". That said, I don't actually use matrices, so I don't get a vote. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Release of NumPy
On 15/04/2008, Jon Wright <[EMAIL PROTECTED]> wrote: > > On 15/04/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote: > > ... > > > The proposal on the table is to remove an unneeded (and > > unwanted) deviation of the matrix API from the ndarray API. > > ... > > How about writing up the changes needed PEP style on the wiki? +1 This discussion risks going around in circles. Write up your proposed solutions, with example code, in PEP style, here: http://www.scipy.org/ProposedEnhancements Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] float32 is not a float ?
On 10/04/2008, Charles R Harris <[EMAIL PROTECTED]> wrote: > I think you want the isreal function, but it will also return true for > complex with 0 imaginary part. Hmm... the various iswhatever functions seem > to be lacking in coverage. Maybe we should fix that. icomplexobj is designed to solve that problem. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Infinity definitions
On 10/04/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote: > Bruce Southey wrote: > > Hi, > > Since we are discussing namespace and standardization, I am curious in > > why there are multiple definitions for defining infinity in numpy when > > perhaps there should be two (one for positive infinity and one for > > negative infinity). I really do understand that other people have use of > > these definitions and that it is easier to leave them in than take them > > out. Also, it is minor reduction in namespace because I do know that > > much of the namespace is either defining variables (like different > > floats and complex numbers) or mathematical functions (like logs and > > trig functions). > > > > Currently we have: > > numpy.Inf > > numpy.Infinity > > numpy.inf > > numpy.infty > > numpy.NINF > > numpy.PINF > > > > Most of these are defined in numeric.py: 'Inf = inf = infty = Infinity = > > PINF' > > In the f2py/tests subdirectories, the files return_real.py and > > return_complex.py uses both 'inf','Infinity'. > > The only occurrence of NINF and PINF are in core/src/umathmodule.c but I > > don't see any other usage. > > There does not seem to be any use of 'infty'. > > > > I think this is a product of bringing together a few definitions into > one and not forcing a standard. > > numpy.inf > numpy.nan > > should be used except for backward compatibility. The others have some use if you want to be able to use the results of repr() as literals - as I understand it the output representation of a NaN depends on the C library, and users seeing, say, "NaN" might well expect to be able to type NaN (after from numpy import NaN) and get a NaN. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] boolean indexing of rank-0 arrays and scalars - ticket #721
Hi, Ticket #721 is titled "0-dimensional boolean arrays should work as masks for array scalars". It is not quite clear to me what the right behaviour is. We are generally trying to make zero-dimensional arrays behave the same as array scalars, but I'm not sure how that should work in this case: In [2]: scalar = np.array([1,2])[0] In [3]: rank0 = np.array(1) In [4]: scalar[np.array(False)] --- Traceback (most recent call last) /home/peridot/software/numpy/test/lib/python2.5/site-packages/ in () : invalid index to scalar variable. In [5]: rank0[np.array(False)] Out[5]: array([], dtype=int32) In [6]: rank0[np.array(False)].shape Out[6]: (0,) In [7]: rank0[np.array(True)].shape Out[7]: () Here indexing a zero-dimensional array produces either a zero-dimensional array if the argument is True or a one-dimensional array of length zero if the argument is False. This is necessary because there's not really a way to represent "no value" with a zero-dimensional array - they always have exactly one value. Should a boolean index into a scalar always produce an array as a result? Only if that boolean is False? I should also point out that we have another difference between array scalars and zero-dimensional arrays: In [7]: rank0[np.array(True)].shape Out[7]: () In [8]: rank0[np.array([True,False])[0]].shape --- Traceback (most recent call last) /home/peridot/software/numpy/test/lib/python2.5/site-packages/ in () : 0-d arrays can't be indexed Zero-dimensional arrays can be used as indices, but apparently-equivalent scalars cannot. I am totally unclear on what the distinction between rank-zero arrays and scalars should be and what indexing into them should look like. Is there a document describing this (or are we just throwing an undocumented mass of confusing corner cases at users)? Thanks, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] ma's stdu and varu
Hi, I was just going through tidying up the documentation for all the many functions in numpy that compute standard deviations or variances (the functions, the methods, the methods on matrices, the methods on maskedarrays, all needed their docstrings updated in approximately the same way). I noticed that maskedarrays seem to have the functions "stdu" and "varu", for computing the unbiased standard deviation and variance (where one divides by N-1 rather than N). These are now available, I think, via std and var with ddof=1. More seriously, they still provide the peculiar older definition of var, where varu([1,1.j,-1,-1.j])==0. I don't use maskedarrays, and in fact I haven't been keeping track of what's going on with the two different implementations. So: what should be done about stdu and varu? Do the two implementations both need their definitions of std and var examined? Thanks, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix multiply
> > and for negative powers some sort of floating-point > > inverse. > > That deserves discussion. > Not all "invertible" boolean matrices have an inverse in the algebra. > Just the orthogonal ones do. > > I guess I would special case inverses for Boolean matrices. > Just test if the matrix B is orthogonal (under Boolean > multiplication) and if so return B's transpose. This is actually a limitation of the whole linear algebra subsystem: we only support floating-point linear algebra (apart from dot()). There are algorithms for doing integer linear algebra, which might be interesting to implement. But that's a big job. Boolean matrices as you use them are another step again: because they are not a group under "+", almost all of linear algebra has to be reinterpreted. For example it's not obvious that matrix multiplication is associative; it turns out to be, because you can replace the matrices by integer matrices containing ones for True and zeros for False, then do the math, then interpret any nonzero integer as True, and zero as False. As an aside, if you use "+" to mean xor (which I am not suggesting!) all of linear algebra continues more or less unchanged; you're just working over the field of integers modulo 2. If you want eigenvalues you can pass to an algebraic closure. > > The inverse actually poses a problem: should it return > > (A**(-1))**n or (A**n)**(-1)? (No, they're not the same > > for boolean matrices.) > > I think it must be the latter ... ? > > By associativity, if B has an inverse A, > then B**n must have inverse A**n. > So you are observing that with boolean matrices > we might find B**n is invertible even though B is not. > Right? So the latter will work in more cases. > So again: form B**n, test for orthogonality, > and return the transpose if the test passes. Well, as it stands, since we have no integer linear algebra, inverses are always floating-point. When you do that, you find that, for example, ([[True,False],[True,True]]**(-1))**2 = [[1.,0.],[-2.,1.]] but ([[True,False],[True,True]]**2)**(-1) = [[1.,0.],[-1.,1.]] I am not aware of any algorithm for finding inverses, or even determining which matrices are invertible, in the peculiar Boolean arithmetic we use. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix multiply
On 06/04/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote: > Just checking: > it's important to me that this won't change > the behavior of boolean matrices, but I don't > see a test for this. E.g., :: > > >>> import numpy as N > >>> A = N.mat('1 0;1 1',dtype='bool') > >>> A**2 > matrix([[ True, False], > [ True, True]], dtype=bool) I have no desire to change the behaviour of boolean matrices, and I'll write a test, but what is it supposed to do with them? Just produce reduce(dot,[A]*n)? For zero it will give the identity, and for negative powers some sort of floating-point inverse. Currently for positive powers it should produce the right answer provided multiplication is associative (which I think it is). The inverse actually poses a problem: should it return (A**(-1))**n or (A**n)**(-1)? (No, they're not the same for boolean matrices.) Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix power (was: matrix multiply)
On 05/04/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote: > Some discussion recently took place around raising a square matrices > to integer powers. See ticket #601: > > http://scipy.org/scipy/numpy/ticket/601 > > Anne Archibald wrote a patch which factored 'matrix_multiply' out of > defmatrix (the matrix power implemented for the Matrix class). After > some further discussion on irc, and some careful footwork so that > everything imports correctly, I checked in > > http://projects.scipy.org/scipy/numpy/changeset/4968 > > The matrix_multiply method is exposed as numpy.linalg.matrix_multiply, > and is not in the top-level numpy namespace (it's a bit crowded there > already). I'd be glad if you would review the changeset and comment. Just to be clear, the function is called matrix_power, not matrix_multiply. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Final push for NumPy 1.0.5 (I need your help!)
On 05/04/2008, Charles R Harris <[EMAIL PROTECTED]> wrote: > On Sat, Apr 5, 2008 at 4:10 PM, Anne Archibald <[EMAIL PROTECTED]> > wrote: > > More generally, my local working copy is now rater divergent from the > > upstream. What's the recommended way to deal with this? Make sure I > > have all the patches submitted to trac, then just revert to raw SVN? > > So far I've just been editing the output of svn diff down to only the > > bit that's relevant to each patch, but it's getting pretty long. > > Do you have commit access now? You could pull a fresh copy of svn, apply > your patches one by one, and commit each time. That way you can avoid > conflicts, which can be a pain when your local copy has diverged a lot from > upstream. Lots of small commented changes are also preferable to one big > patch. No, or at least, I don't know a username/password to use. I also don't seem to be able to change the status of bugs on the trac. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Final push for NumPy 1.0.5 (I need your help!)
On 05/04/2008, James Philbin <[EMAIL PROTECTED]> wrote: > I've posted patches for: > > #630: If float('123.45') works, so should numpy.float32('123.45') > #581: random.set_state does not reset state of random.standard_normal Patches for #601, #622, #692, #696, #717 now in trac; I'd like to do something about #720 (variance of complex arrays needs docs and tests), but any patch for it is necessarily going to tread on the toes of my patch for #696 (ddof parameter to var needs tests). I'm not quite sure what the best way to handle this sort of thing is. More generally, my local working copy is now rater divergent from the upstream. What's the recommended way to deal with this? Make sure I have all the patches submitted to trac, then just revert to raw SVN? So far I've just been editing the output of svn diff down to only the bit that's relevant to each patch, but it's getting pretty long. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram
On 05/04/2008, Bruce Southey <[EMAIL PROTECTED]> wrote: > 1) Should the first bin contain all values less than or equal to the > value of the first limit and the last bin contain all values greater > than the value of the last limit? > This produced the counts as: array([3, 3, 9]) (I termed this > 'Accumulate' in the output). > > 2) Should any values outside than the range of the bins be excluded? > That is remove any value that is smaller than the lowest value of the > bin and higher than the highest value of the bin. > This produced the counts as: array([2, 3, 4]) (I termed this 'Exclude' > in the output) > > 3) Should there be extra bins for these values? > While I did not implement this option, it would provide the counts as: > array([1,2,3,4,5]) There's also a fourth option - raise an exception if any points are outside the range. I hope this is a question about defaults - really what I would most want is to have the choice, as a keyword option. For the default, I would be tempted to go with option 4, raising an exception. This seems pretty much guaranteed not to produce surprising results: if there's any question about what the results should be it produces an error, and the user can run it again with specific instructions. Plus in many contexts having any points that don't belong in any bin is the result of a programming error. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple financial functions for NumPy
On 04/04/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote: > On Fri, 4 Apr 2008, Gael Varoquaux apparently wrote: > > I really thing numpy should be as thin as possible, so > > that you can really say that it is only an array > > manipulation package. This will also make it easier to > > sell as a core package for developpers who do not care > > about "calculator" features. > > > I'm a user rather than a developer, but I wonder: > is this true? > > 1. Even as a user, I agree that what I really want from > NumPy is a core array manipulation package (including > matrices). BUT as long as this is the core of NumPy, > will a developer care if other features are available? > > 2. Even if the answer to 1. is yes, could the > build/installation process include an option not to > build/install anything but the core array functionality? > > 3. It seems to me that pushing things out into SciPy remains > a problem: a basic NumPy is easy to build on any platform, > but SciPy still seems to generate many questions. > > 4. One reason this keeps coming up is that he NumPy/SciPy > split is rather too crude. If the split were instead > something like NumPy/SciPyBasic/SciPyFull/SciPyFull+Kits > where SciPyBasic contained only pure Python code (no > extensions), perhaps the desired location would be more > obvious and some of this recurrent discussion would go away. It seems to me that there are two separate issues people are talking about when they talk about packaging: * How should functions be arranged in the namespaces? numpy.foo(), scipy.foo(), numpy.lib.financial.foo(), scikits.foo(), numkitfull.foo()? * Which code should be distributed together? Should scipy require separate downloading and compilation from numpy? The two questions are not completely independent - it would be horribly confusing to have the set of functions available in a given namespace depend on which packages you had installed - but for the most part it's not a problem to have several toplevel namespaces in one package (python's library is the biggest example of this I know of). For the first question, there's definitely a question about how much should be done with namespaces and how much with documentation. The second is a different story. Personally, I would prefer if numpy and scipy were distributed together, preferably with matplotlib. Then anybody who used numpy would have available all the scpy tools and all the plotting tools; I think it would cut down on wheel reinvention and make application development easier. Teachers would not need to restrict themselves to using only functions built into numpy for fear that their students might not have scipy installed - how many students have learned to save their arrays in unportable binary formats because their teacher didn't want them to have to install scipy? I realize that this poses technical problems. For me installing scipy is just a matter of clicking on a checkbox and installing a 30 MB package, but I realize that some platforms make this much more difficult. If we can't just bundle the two, fine. But I think it is mad to consider subdividing further if we don't have to. I think python's success is due in part to its "batteries included" library. The fact that you can just write a short python script with no extra dependencies that can download files from the Web, parse XML, manage subprocesses, and save persistent objects makes development much faster than if you had to forever decide between adding dependencies and reinventing the wheel. I think numpy and scipy should follow the same principle, of coming "batteries included". So in this specific case, yes, I think the financial functions should absolutely be included; whether they should be included in scipy or numpy is less important to me because I think everyone should install both packages. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Final push for NumPy 1.0.5 (I need your help!)
On 04/04/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote: > Hey Anne, > > Do you currently have SVN access? Would you like it? > > I think the SciPy/NumPy sprint would be a good time to clean-up the > committers list and add new people interested in helping. I don't have SVN access. I'd be happy (and careful!) to have it, but I should warn you that I won't have time to do serious, regular development on scipy/numpy; I do hope to be able to write a little code here and there, though, and it would be handy to be able to add it directly instead of sending patches into the ether. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Final push for NumPy 1.0.5 (I need your help!)
On 04/04/2008, Jarrod Millman <[EMAIL PROTECTED]> wrote: > Since I sent my email last night another 5+ tickets have been closed. > If we keep going at this rate, we should be able to release 1.0.5 next > Friday (4/11) with every ticket closed. Specifically, thanks to > Travis Oliphant, David Huard, and Stefan van der Walt for bug > squashing. > > If anyone has some time, could you please check David's fix for the > ticket "loadtxt fails with record arrays": > http://projects.scipy.org/scipy/numpy/ticket/623 > His fix looks correct to me (and even includes test!!); if someone > else can confirm this, David or I will be able to close this ticket. > > There are now only 21 remaining open tickets. Please take a look over > > the open tickets and see if there is anything you can quickly close: > > http://scipy.org/scipy/numpy/query?status=new&status=assigned&status=reopened&milestone=1.0.5 I've submitted patches for two (matrix_power and condition number functions) but I don't think I can actually manipulate the bug status in any way. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [SciPy-user] conforming to Python GIL...
On 03/04/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote: > fred wrote: > > Hi, > > > > I use a lot of ConVeX OPTimsation and fortran (via f2py) routines in my > > Traits app. > > > > As I want to compute the data and want to display them, I use threads. > > > > The issue I get is that data displayed (using Chaco2) are not updated > > (app is frozen) while computing the input data. > > > > From D. Morrill's answer (the Traits guru ;-)), it appears that cvxopt > > (and solve() from scipy, in fact) and fortran modules does not release > > the "Python GIL (Global Intepreter Lock)". > > > > This requires a bit of effort to solve. We need to in multiple places... > > 1) release the GIL > 2) put semaphores into code that is not thread-safe. > > f2py should be modified to handle this.Other could should be > modified to handle it as well. > > I suspect NumPy should grow an API to handle the semaphore thing easily > (I think Python already has one), so these may be able to be wrapped > into the MACROS already available for releasing the GIL. What is the story on f2py's releasing of the GIL? I had understood that wrapper code was able to declare the wrapped code to be thread-safe, so that the GIL was released while it executed. Is the problem here that cvxopt is not threadsafe? I guess the best solution then is to make a cvxopt- (and related routines)-specific lock? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
On 23/03/2008, David Cournapeau <[EMAIL PROTECTED]> wrote: > Gnata Xavier wrote: > > > > Hi, > > > > I have a very limited knowledge of openmp but please consider this > > testcase : > > > > > > > Honestly, if it was that simple, it would already have been done for a > long time. The problem is that your test-case is not even remotely close > to how things have to be done in numpy. Actually, there are a few places where a parallel for would serve to accelerate all ufuncs. There are build issues, yes, though they are mild; we would also want to provide some API to turn parallelization on and off, and we'd have to verify that OpenMP did not slow down small arrays, but that would be it. (And I suspect that OpenMP is smart enough to use single threads without locking when multiple threads won't help. Certainly all the information is available to OpenMP to make such decisions.) This is why I suggested making this change: it should be a low-cost, high-gain change. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How to set array values based on a condition?
On 23/03/2008, Damian Eads <[EMAIL PROTECTED]> wrote: > Hi, > > I am working on a memory-intensive experiment with very large arrays so > I must be careful when allocating memory. Numpy already supports a > number of in-place operations (+=, *=) making the task much more > manageable. However, it is not obvious to me out I set values based on a > very simple condition. > > The expression > >y[y<0]=-1 > > generates a binary index mask y>=0 of the same size as the array y, > which is problematic when y is quite large. > > I was wondering if there was anything like a set_where(A, cmp, B, > setval, [optional elseval]) function where cmp would be a comparison > operator expressed as a string. > > The code below illustrates what I want to do. Admittedly, it needs to be > cleaned up but it's a proof of concept. Does numpy provide any functions > that support the functionality of the code below? That's a good question, but I'm pretty sure it doesn't, apart from numpy.clip(). The way I'd try to solve that problem would be with the dreaded for loop. Don't iterate over single elements, but if you have a gargantuan array, working in chunks of ten thousand (or whatever) won't have too much overhead: block = 10 for n in arange(0,len(y),block): yc = y[n:n+block] yc[yc<0] = -1 It's a bit of a pain, but working with arrays that nearly fill RAM *is* a pain, as I'm sure you are all too aware by now. You might look into numexpr, this is the sort of thing it does (though I've never used it and can't say whether it can do this). Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
On 22/03/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote: > James Philbin wrote: > > Personally, I think that the time would be better spent optimizing > > routines for single-threaded code and relying on BLAS and LAPACK > > libraries to use multiple cores for more complex calculations. In > > particular, doing some basic loop unrolling and SSE versions of the > > ufuncs would be beneficial. I have some experience writing SSE code > > using intrinsics and would be happy to give it a shot if people tell > > me what functions I should focus on. > > Fabulous! This is on my Project List of todo items for NumPy. See > http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend > some time refactoring the ufunc loops so that the templating does not > get in the way of doing this on a case by case basis. > > 1) You should focus on the math operations: add, subtract, multiply, > divide, and so forth. > 2) Then for "combined operations" we should expose the functionality at > a high-level. So, that somebody could write code to take advantage of it. > > It would be easiest to use intrinsics which would then work for AMD, > Intel, on multiple compilers. I think even heavier use of code generation would be a good idea here. There are so many different versions of each loop, and the fastest way to run each one is going to be different for different versions and different platforms, that a routine that assembled the code from chunks and picked the fastest combination for each instance might make a big difference - this is roughly what FFTW and ATLAS do. There are also some optimizations to be made at a higher level that might give these optimizations more traction. For example: A = randn(100*100) A.shape = (100,100) A*A There's no reason the multiply ufunc couldn't flatten A and use a single unstrided loop to do the multiplication. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
On 22/03/2008, Thomas Grill <[EMAIL PROTECTED]> wrote: > I've experimented with branching the ufuncs into different constant > strides and aligned/unaligned cases to be able to use SSE using > compiler intrinsics. > I expected a considerable gain as i was using float32 with stride 1 > most of the time. > However, profiling revealed that hardly anything was gained because of > 1) non-alignment of the vectors this _could_ be handled by > shuffled loading of the values though > 2) the fact that my application used relatively large vectors that > wouldn't fit into the CPU cache, hence the memory transfer slowed down > the CPU. > > I found the latter to be a real showstopper for most of my experiments > with SIMD. It's especially a problem for numpy because smaller vectors > have a lot of Python/numpy overhead, and larger ones don't really > benefit due to cache exhaustion. This particular issue can sometimes be reduced by clever use of the prefetching intrinsics. I'm not totally sure it's going to help inside most ufuncs, though, since the runtime is so dominated by memory reads. In a program I was writing I had time to do a 128-point real FFT in the time it took to load the next 64 floats... Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy's future (1.1 and beyond): which direction(s) ?
On 21/03/2008, Gnata Xavier <[EMAIL PROTECTED]> wrote: > Something like http://idlastro.gsfc.nasa.gov/idl_html_help/TOTAL.html > (Thread Pool Keywords) would be nice. > A "total like" function could be a great pathfinder a put threads into > numpy keeping the things as simple as they should remain. > Not sure we need that is numpy in 1.1 but IMHO we need that in a near > future (because every "array oriented" libs are now threaded). There was some discussion of this recently. The most direct approach to the problem is to annotate some or all of numpy's inner C loops with OpenMP constructs, then provide some python functions to control the degree of parallelism OpenMP uses. This would transparently provide parallelism for many numpy operations, including sum(), numpy's version of IDL's total(). All that is needed is for someone to implement it. Nobody has stepped forward yet. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Improving Docs on Wiki
On 21/03/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote: > On Fri, Mar 21, 2008 at 2:47 PM, Anne Archibald > <[EMAIL PROTECTED]> wrote: > > Is it perhaps possible to make all numpy functions accessible in > > submodules (in addition to in numpy, for backwards compatibility) and > > then promote accessing them that way? Are they already? If so how do I > > find out what the submodules are? > > We should definately discuss and consider this proposal for 1.1. Do > you have a suggested organisation in mind? Not exactly. What do people think of the way I organized the numpy functions by category page? Apart from the sore-thumb "other" category, it does seem like the kind of grouping we might hope for. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Improving Docs on Wiki
On 21/03/2008, Sebastian Haase <[EMAIL PROTECTED]> wrote: > Comment: I have read the module- or directory-name "core" many times > on this list, however: Who really knows where a given functions > belongs ? Isn't that mostly only the numpy svn commiters ? > In other words, using only the python side of numpy, someone (like > myself) would NOT know that sort is inside "core" ! > > Also: since >>> import numpy as N; N.sort refers already to that same sort: > >>> N.core.sort > > >>> N.sort > > > I would prefer not to require "core" sub-sub-page. > Instead, every name that is accessible as N. should be > documented without extra sub-page. I don't have a solution, but I would like to complain about numpy's flat namespace. Perhaps we're stuck with it now, but it's very difficult to find the right function. In scipy, I can find the right numerical integration by importsing scipy.integrate and using tab completion, But in numpy, everything is loaded into the base namespace, so tab completion gets me an overwhelming 502 possibilities. That's why there's a "numpy functions by category" but no "scipy functions by category" - scipy functions are already by category. Is it perhaps possible to make all numpy functions accessible in submodules (in addition to in numpy, for backwards compatibility) and then promote accessing them that way? Are they already? If so how do I find out what the submodules are? Thanks, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Inplace index suprise
On 20/03/2008, Gael Varoquaux <[EMAIL PROTECTED]> wrote: > On Thu, Mar 20, 2008 at 06:17:44PM +, James Philbin wrote: > > Hi, > > > > This cannot work, because the inplace operation does not > > > take place as a for loop. > > Well, this would be fine if I was assigning the values to tempories as > > you suggest. However, the operation should be performed inplace and > > this is what I don't understand - why is there no for loop? I think > > the semantics of these inplace indexed operations is intuitively quite > > clear and numpy doesn't follow this intuition. Would there be any > > interest in changing this behaviour in numpy? > > > I think this is technicaly impossible from the way numpy works. This > breaks the numpy model that every operation is global on the array. It is quite reasonable from a least-surprise point of view. Unfortunately it can't really be done because of the way python implements augmented assignments. If I'm not mistaken, numpy's histogram function can be used to accomplish this particular thing. > > > This is a FAQ. > > Sorry if i'm rehashing old ground, but the closest thing I could find > > to a numpy faq is here: http://www.scipy.org/FAQ. There seems to be no > > mention of this issue there. > > > Sorry if I came out harsh, I certainly didn't want to implie that you > were expecting something wrong, just that this thing was tricking people. I added it to that FAQ, along with the explanation I got when I asked the same question: http://www.scipy.org/FAQ#head-1ed851e9aff803d41d3cded8657b2b15a888ebd5 Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] how to build a series of arrays as I go?
On 17/03/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote: > > Alan suggested: > > >> 1. http://www.scipy.org/Numpy_Example_List_With_Doc > > On Mon, 17 Mar 2008, Chris Withers apparently wrote: > > > Yeah, read that, wood, trees, can't tell the... > > Oh, then you might want > http://www.scipy.org/Tentative_NumPy_Tutorial > or the other stuff at > http://www.scipy.org/Documentation > All in all, I've found the resources quite good. Also, for the specific question of "how do I do X?" you can try http://www.scipy.org/Numpy_Functions_by_Category Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Numpy and OpenMP
On 15/03/2008, Damian Eads <[EMAIL PROTECTED]> wrote: > Robert Kern wrote: > > Eric Jones tried to use multithreading to split the computation of > > ufuncs across CPUs. Ultimately, the overhead of locking and unlocking > > made it prohibitive for medium-sized arrays and only somewhat > > disappointing improvements in performance for quite large arrays. I'm > > not familiar enough with OpenMP to determine if this result would be > > applicable to it. If you would like to try, we can certainly give you > > pointers as to where to start. > > Perhaps I'm missing something. How is locking and synchronization an > issue when each thread is writing to a mutually exclusive part of the > output buffer? The trick is to efficiently allocate these output buffers. If you simply give each thread 1/n th of the job, if one CPU is otherwise occupied it doubles your computation time. If you break the job into many pieces and let threads grab them, you need to worry about locking to keep two threads from grabbing the same piece of data. Plus, depending on where things are in memory you can kill performance by abusing the caches (maintaining cache consistency across CPUs can be a challenge). Plus a certain amount of numpy code depends on order of evaluation: a[:-1] = 2*a[1:] Correctly handling all this can take a lot of overhead, and require a lot of knowledge about hardware. OpenMP tries to take care of some of this in a way that's easy on the programmer. To answer the OP's question, there is a relatively small number of C inner loops that could be marked up with OpenMP #pragmas to cover most matrix operations. Matrix linear algebra is a separate question, since numpy/scipy prefers to use optimized third-party libraries - in these cases one would need to use parallel linear algebra libraries (which do exist, I think, and are plug-compatible). So parallelizing numpy is probably feasible, and probably not too difficult, and would be valuable. The biggest catch, I think, would be compilation issues - is it possible to link an OpenMP-compiled shared library into a normal executable? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] dimensions too large error
On 14/03/2008, Dinesh B Vadhia <[EMAIL PROTECTED]> wrote: > For the following code: > > I = 18000 > J = 33000 > filename = 'ij.txt' > A = scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) > for line in open(filename, 'r'): > etc. > > The following message appears: > > Traceback (most recent call last): > File "C:\...\py", line 362, in > A= scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int)) > ValueError: dimensions too large. > > Is there a limit to array/matrix dimension sizes? Yes. On 32-bit machines the hardware makes it exceedingly difficult for one process to access more than two or three gigabytes of RAM, so numpy's strides and sizes are all 32-bit integers. As a result you can't make arrays bigger than about 2 GB. If you need this I'm afraid you pretty much need a 64-bit machine. > Btw, for numpy array's, ascontiguousarray() is available to set aside > contiguous memory. Is there an equivalent for scipy matrix ie. an > ascontiguousmatrix()? That's not exactly what ascontiguousarray() is for. Normally if you create a fresh numpy array it will be allocated as a single contiguous block of memory, and the strides will be arranged in that special way that numpy calls "contiguous". However, if you take a subarray of that array - every second element, say - the resulting data is not contiguous (in the sense that there are gaps between the elements). Normally this is not a problem. But a few functions - mostly handwritten C or FORTRAN code - needs contiguous arrays. The function ascontiguousarray() will return the original array if it is contiguous (and in C order), or make a copy if it isn't. Numpy matrices are just a slight redefinition of the operators on an array, so you can always convert an array to a matrix without copying. Thus there's little cost (for large arrays) to just using matrix(ascontiguousarray()) if you need matrices. Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Array assignment problem
On 11/03/2008, Dinesh B Vadhia <[EMAIL PROTECTED]> wrote: > Hello! I'm reading a text file with two numbers in str format on each line. > The numbers are converted into integers. Each integer is then assigned to > a 2-dimensional array ij (see code below). The problem is that neither of > the array assignments work ie. both ij[index, 0] = r and ij[index, 1] = c > are always 0 (zero). I've checked r and c and both are integers (>=0). > > import sys > import os > import numpy > > nnz = 120 > ij = numpy.array(numpy.empty((nnz, 2), dtype=int)) > index = 0 > filename = 'test_ij.txt' > for line in open(filename, 'r'): > line = line.rstrip('\n') > r, c = map(str, line.split(',')) > r = int(r) > c = int(c) > ij[index, 0] = r > ij[index, 1] = c > index = index + 1 > > > What am I doing wrong? The first thing you're doing wrong is you're not using numpy.loadtxt: In [35]: numpy.loadtxt('foo',delimiter=",",dtype=numpy.int) Out[35]: array([[1, 3], [4, 5], [6, 6]]) This removes the need for the rest of your code. To find useful functions like this in future, you can try looking at http://www.scipy.org/Numpy_Functions_by_Category Stripping the newline off is unnecessary, since int(" 17 \n")==17. Also, since the results of line.split(,) are already strings, the map(str, ...) doesn't do anything. Did you mean it to? Otherwise, your code works fine for me. I should point out that using empty(), you should expect your array to be full of gibberish (rather than 0), so if you're seeing lots of zeros, they're probably coming from the text file. Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Slice and assign into new NDarray...
On 08/03/2008, Vince Fulco <[EMAIL PROTECTED]> wrote: > I have an ND array with shape (10,15) and want to slice or subset(?) the data > into a new 2D array with the following criteria: > > 1) Separate each 5 observations along axis=0 (row) and transpose them to > the new array with shape (50,3) > > > Col1 Co2 Col3 > > Slice1 Slice2 Slice3 > ... > ... > ... > > Slice1 should have the coordinates[0:5,0], Slice2[0:5,1] and so > on...I've tried initializing the target ND array with > > D = NP.zeros((50,3), dtype='int') and then assigning into it with > something like: > > for s in xrange(original_array.shape[0]): > D= NP.transpose([data[s,i:i+step] for i in range(0,data.shape[1], > step)]) > > with step = 5 but I get errors i.e. IndexError: invalid index > > Also tried various combos of explicitly referencing D coordinates but > to no avail. You're not going to get a slice - in the sense of a view on the same underlying array, and through which you can modify the original array - but this is perfectly possible without for loops. First set up the array: In [12]: a = N.arange(150) In [13]: a = N.reshape(a, (-1,15)) You can check that the values are sensible. Now reshape it so that you can split up slice1, slice2, and slice3: In [14]: b = N.reshape(a, (-1, 3, 5)) slice1 is b[:,0,:]. Now we want to flatten the first and third coordinates together. reshape() doesn't do that, exactly, but if we swap the axes around we can use reshape to put them together: In [15]: c = N.reshape(b.swapaxes(1,2),(-1,3)) This reshape necessarily involves copying the original array. You can check that it gives you the value you want. I recommend reading http://www.scipy.org/Numpy_Functions_by_Category for all those times you know what you want to do but can't find the function to make numpy do it. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argmin & min on ndarrays
On 04/03/2008, Pierre GM <[EMAIL PROTECTED]> wrote: > Anne, > > Thanks a lot for your suggestion. Something like > > >>>if axis is None: > >>>return b.flat[a.argmin()] > >>>else: > >>> return numpy.choose(a.argmin(axis),numpy.rollaxis(b,axis,0)) > > seems to do the trick fairly nicely indeed. The other solutions you suggested > would require too much ad hoc adaptation. > Thanks again ! Ah! "It ain't the things you don't know that'll get you, it's the things you know that ain't so." I thought rollaxis rolled the axes around cyclically. This is much more useful, but what a funny name for what it actually does... I should have provided the link before, but this is very useful for answering this kind of question: http://www.scipy.org/Numpy_Functions_by_Category Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argmin & min on ndarrays
On 04/03/2008, Pierre GM <[EMAIL PROTECTED]> wrote: > All, > Let a & b be two ndarrays of the same shape. I'm trying to find the elements > of b that correspond to the minima of a along an arbitrary axis. > The problem is trivial when axis=None or when a.ndim=2, but I'm getting > confused with higher dimensions: I came to the following solution that looks > rather ugly, and I'd need some ideas to simplify it > > >>>a=numpy.arange(24).reshape(2,3,4) > >>>axis=-1 > >>>b = numpy.rollaxis(a,axis,0)[a.argmin(axis)][tuple([0]*(a.ndim-1))] > >>>numpy.all(b, a.min(axis)) > True > > Thanks a lot in advance for any suggestions. I couldn't find any nice way to make indexing do what you want, but the function choose() can be persuaded to do it. Unfortunately it will only choose along the first axis, so some transpose jiggery-pokery is necessary: def pick_argmin(a,b,axis): assert a.shape == b.shape t = range(len(b.shape)) i = t[axis] del t[axis] t = [i] + t a = a.transpose(t) b = b.transpose(t) return N.choose(N.argmin(a,axis=0),b) I did find a not-nice way to do what you want. The problem is that numpy's fancy indexing is so general, it won't let you simply pick and choose along one axis, you have to pick and choose along all axes. So what you do is use indices() to generate arrays that index all the *other* axes appropriately, and then use the argmin array to index the axis you're interested in: In [39]: c = N.indices((2,4)) In [40]: b[c[0],N.argmin(a,axis=1),c[1]] Out[40]: array([[-0.70659942, -0.997249 , -0.20028296, -0.05171191], [-1.28886394, -1.0610526 , -1.07193295, 0.05356948]]) In [42]: c[0] Out[42]: array([[0, 0, 0, 0], [1, 1, 1, 1]]) In [43]: c[1] Out[43]: array([[0, 1, 2, 3], [0, 1, 2, 3]]) Not only would this require similar jiggery-pokery, it creates the potentially very large intermediate array c. I'd stick with choose(). A third option would be to transpose() and reshape() a and b down to two dimensions, then reshape() the result back to the right shape. More multiaxis jiggery-pokery, and the reshape()s may end up copying the arrays. Finally, you can always just write a python loop (over all axes except the one of interest) using ndenumerate() and one-dimensional argmin(). If the dimension you're argmin()ing over is very large, the cost of the python loop may be negligible. Anne P.S. feel free to use pick_argmin however you like, though error handling would probably be a good idea... -A ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Pickling and initializing
On 03/03/2008, Dinesh B Vadhia <[EMAIL PROTECTED]> wrote: > When you pickle a numpy/scipy matrix does it have to be initialized by > another program? For example: Most python objects do not need to be initialized. You just call a function that makes the one you want: >>> l = range(10) This makes a list of length 10. You can now manipulate it, adding elements or what have you. Arrays are no different. (You use matrices in your example - the only difference is that the multiplication operator behaves differently, and you often need to use asmatrix to convert them back to arrays. I never use them, even when I have to do some linear algebra.) You simply call a function that makes the array you want: >>> a = arange(10) You can change its contents, but it's not really sensible to say that arrays must be initialized before use. The function empty() is kind of a peculiar aberration - it's for those rare cases when you end up reassigning all the values in the array, and zeros() is too slow. (For debugging it'd be nice to have NaNs()...) Perhaps you are thinking of statically-typed languages, where variables must be initialized? In python variables do not have type, so variables holding arrays are no different from variables holding strings, integers, file objects, or whatever. Using a python variable before it has been assigned a value does indeed raise an exception; all that is needed is to assign a value to it. Unpickling reads a file and constructs a "new" array from the data in that file. The array value is returned; one often assigns this value to a variable. The values in the array are filled in by the pickling function. It is not possible to make the unpickler store its data in a preallocated array. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.correlate with phase offset 1D data series
On 03/03/2008, Ray Schumacher <[EMAIL PROTECTED]> wrote: > Xie's 2D algorithm reduced to 1D works nicely for computing the > relative phase, but is it the fastest way? It might be, since some > correlation algorithms use FFTs as well. What does _correlateND use, in > scipy? Which way will be the fastest really depends what you want to do. Algorithmically, the direct way numpy.correlate operates is O(NM), and the way FFT-based algorithms operate is (roughly) O((N+M)log(N+M)) (or for a more sophisticated algorithm O(N log M) where M is less than N). In practice what this means is that when one or both of the things you're correlating is short (tens of samples or so), you should use a direct method; when one or both are long you should use an FFT-based method. (There are other approaches too, but I don't know of any in wide use.) In your case it sounds like you have two signals of equal fairly large length to compare. Some questions remain, though: * What do you want to happen at the endpoints? Without padding, only a small interval (the difference in lengths plus one) is valid. Zero-padding works, but guarantees a fall-off at the ends. Circular correlation is easy to implement but not appropriate most of the time. * Do you care about sub-sample alignment? How much accuracy do you really need? Direct methods really can't give you this information. With Fourier methods, you can easily pad the spectrum with zeros and inverse FFT, giving you a beautifully-interpolated signal. If you want more accuracy, a quadratic fit to the three points around the peak of the interpolated signal will get you very close. If you need more accuracy, you can use numerical maximization, evaluating each point as sum(a_k exp(2 pi i k x)). The other common application is to have a template (that presumably falls to zero at its endpoint) and to want to compute a running correlation against a stream of data. This too can be done both ways, depending on the size of the template; all that is needed is to think carefully about overlaps. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.correlate with phase offset 1D data series
On 03/03/2008, Ray Schumacher <[EMAIL PROTECTED]> wrote: > > I'm trying to figure out what numpy.correlate does, and, what are people > using to calculate the phase shift of 1D signals? I use a hand-rolled Fourier-domain cross-correlation, but then, I'm using a Fourier-domain representation of my signals. > (I coded on routine that uses rfft, conjugate, ratio, irfft, and argmax > based on a paper by Hongjie Xie "An IDL/ENVI implementation of the FFT Based > Algorithm for Automatic Image Registration" - but that seems more intensive > than it could be.) Sounds familiar. If you have a good signal-to-noise ratio, you can get subpixel accuracy by oversampling the irfft, or better but slower, by using numerical optimization to refine the peak you found with argmax. > In numpy, an identity import numpy from pylab import * > l=[1,5,3,8,15,6,7,7,9,10,4] c=numpy.correlate(l,l, mode='same') plot(c) > peaks at the center, x=5, and is symmetric You have revealed several flaws in numpy's correlate. First of all, the docstring gives no indication of how to interpret the result: neither the zero-shift position nor the direction of the result is at all clear (if I shift the first vector to the left, does the correlation peak shift left or right?). Second, the mode "same" gives results which are rather difficult to understand. Third, there is no way to get a "circular" correlation. I would be inclined to use convolve (or scipy.ndimage.convolve, which uses a Fourier-domain method), since it is somewhat better specified. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] contiguous true
On 01/03/2008, Charles R Harris <[EMAIL PROTECTED]> wrote: > > On Fri, Feb 29, 2008 at 10:53 AM, John Hunter <[EMAIL PROTECTED]> wrote: > > > I have a boolean array and would like to find the lowest index "ind" > > > where N contiguous elements are all True. Eg, if x is [...] > Oops, ind = arange(len(x)). I suppose nonzero would work as well. I'm guessing you're alluding to the fact that diff(nonzero(x)) gives you a list of the run lengths of Falses in x (except possibly for the first one). If you have a fondness for the baroque, you can try numpy.where(numpy.convolve(x,[1,]*N,'valid')==N) For large N this can even use Fourier-domain convolution (though you'd then have to be careful about round-off error). Silly, really, it's O(NM) or O(N log M) instead of O(N). Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Optimize speed of for loop using numpy
On 25/02/2008, Trond Kristiansen <[EMAIL PROTECTED]> wrote: > I have attached the function that the FOR loop is part of as a python file. > What I am trying to do is to create a set of functions that will read the > output files (NetCDF) from running the ROMS model (ocean model). The output > file is organized in xi (x-direction), eta (y-direction), and s > (z-direction) where the s-values are vertical layers and not depth. This > particular function (z_slice) will find the closest upper and lower s-layer > for a given depth in meters (e.g. -100). Then values from the two selcted > layers will be interpolated to create a new layer at the selected depth > (-100). The problem is that the s-layers follow the bathymetry and a > particular s-layer will therefore sometimes be above and sometimes below the > selected depth that we want to interpolate to. That's why I need a quick > script that searches all of the layers and find the upper and lower layers > for a given depth value (which is negative). The z_r is a 3D array > (s,eta,xi) that is created using the netcdf module. If I understand what you're doing correctly, you have a 3D array of depths, indexed by two unproblematic coordinates (eta and xi), and by a third coordinate whose values are peculiar. For each pair (eta, xi), you want to find the value of the third coordinate that occurs at a depth closest to some given depth. Are you looking to do this for many values of depth? It seems like what you're trying to do is (approximately) invert a function - you have z as a function of s, and you want s as a function of z. Is the mapping between depths and s values monotonic? First of all, numpy.searchsorted() is the right tool for finding where depth occurs in a list of z values. It gives you the position of the next lower value; it's pretty straightforward to write down a linear interpolation (you should be able to do it without any for loop at all) and more sophisticated interpolation schemes may be straightforward as well. If you want a smoother interpolation scheme, you may want to look at scipy.interpolate. It is not always as vectorial as you might wish, but it implements splines (optionally with smoothing) quite efficiently. I believe scipy.ndimage may well have some suitable interpolation code as well. In my opinion, the value of numpy/scipy comes from the tools that allow you to express major parts of your algorithm as a line or two of code. But it is often difficult to take advantage of this by "trying to accelerate for loops". I find it really pays to step back and look at my algorithm and think how to express it as uniform operations on arrays. (Of course it helps to have a rough idea of what operations are available; the numpy documentation is sometimes sadly lacking in terms of letting you know what tools are there.) You might find useful the new http://www.scipy.org/Numpy_Functions_by_Category Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the wrong result with weights
On 22/02/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote: > Is there a ticket on the NumPy trac for this? We won't see it if there > isn't. Thanks for pointing us to the bug. It appears to be fixed in SVN (that was quick!). But the Debian bug report also points out a peculiar unnecessary use of eval; the code is also slower and uses more memory than it has to. Attached is a patch to cure that. Anne Index: numpy/lib/function_base.py === --- numpy/lib/function_base.py (revision 4819) +++ numpy/lib/function_base.py (working copy) @@ -378,9 +378,9 @@ ni = len(ash) r = [newaxis]*ni r[axis] = slice(None, None, 1) -w1 = eval("w["+repr(tuple(r))+"]*ones(ash, float)") -n = add.reduce(a*w1, axis) -d = add.reduce(w1, axis) +r = tuple(r) +n = add.reduce(a*w[r], axis) +d = add.reduce(w, axis) else: raise ValueError, 'averaging weights have wrong shape' ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Matching 0-d arrays and NumPy scalars
On 21/02/2008, Stefan van der Walt <[EMAIL PROTECTED]> wrote: > Could I ask that we also consider implementing len() for 0-d arrays? > numpy.asarray returns those as-is, and I would like to be able to > handle them just as I do any other 1-dimensional array. I don't know > if a length of 1 would be valid, given a shape of (), but there must > be some consistent way of handling them. Well, if the length of an array is the product of all its sizes, the product of no things is customarily defined to be one... whether that is actually a useful value is another question. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proxy array class and units
On 13/02/2008, Dan Goodman <[EMAIL PROTECTED]> wrote: > Background: I'm writing a package to run simulations which make extensive use > of > linear algebra, for which I'm using numpy. However - it is important to my > package that quantities can have dimesions, so I've written a class Quantity > subclassed from float which includes the physical dimensions and raises > exceptions etc. if you try to add volts to kilograms or whatever. This seems > to > work fine, but I also wanted arrays containing physical dimensions. I wrote a > class qarray subclassed from numpy.ndarray which mostly just copies the > functionality of an array with dtype=object. Here is the issue. The code for > my > package internally doesn't use these qarray objects, it uses numpy arrays > because it would obviously be terribly slow to be repeatedly checking if the > dimensions were consistent all the time. However, I want to present a > (somewhat) > consistent strict unit-checking front to the user of this package. As one part > of this, I would like a class that appears to act exactly like a qarray (which > itself is supposed to act very much like a numpy array) but maintains a > connection with its numpy array counterpart in my package. I'm not sure I understand all the different layers you're talking about here. I think that I would attack your problem in the following way: Use ScientificPython's PhysicalQuantity (alone and in object arrays) or "unum"s to represent quantities with units: http://books.google.com/books?id=3nR75KSvsq4C&pg=PA169&lpg=PA169&dq=numpy+physicalquantity&source=web&ots=_cmXEC0qpx&sig=JBEz9BlegnIQJor-1BwnAdRTKLE Object arrays are fairly horrible, but having one unit per array shouldn't be very inefficient. It could cause some headaches, since it's sometimes useful to have an array with mixed units (differential equation solvers often require this when you convert a higher-order problem to first-order, for example), but it should be quite efficient computationally. So I would pull together an array that held a collection of quantities all in the same units. Type checking then becomes a once-per-array operation, which is relatively cheap. The way I'd implement this is (by preference) by grabbing a version off the net, or (if that fails) as a subclass of ndarray. If I want to do some heavy-duty array mangling without type checking getting in my way, I'll just convert to an ndarray (either normalizing the units or saving them), do the heavy lifting, and reapply the units afterwards. I don't really see why you're talking about proxy classes... Perhaps your problem is actually two problems: implementing unit arrays, and doing something I haven't understood. Is there any reason not to separate the two, and use one of the existing solutions for unit arrays? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] sort method raises unexpected error with axis=None
On 12/02/2008, Anne Archibald <[EMAIL PROTECTED]> wrote: > An efficient way to handle in-place (or out-of-place, come to think of > it) median along multiple axes is actually to take medians along all > axes in succession. That saves you some sorting effort, and some > programming effort, and doesn't require in-place multidimensional > sorting: Aargh. Sorry. No, that doesn't work: In [28]: all_axes_median(N.reshape([1,5,6,7],(2,2))) Out[28]: 4.75 Oops. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] sort method raises unexpected error with axis=None
On 12/02/2008, Matthew Brett <[EMAIL PROTECTED]> wrote: > Is it possible, in fact, to do an inplace sort on an array with > axis=None (ie flat sort)? It is, sometimes; just make an array object to point to the flattened version and sort that: In [16]: b = a[:] In [17]: b.shape = (16,) In [18]: b.sort() This is not always possible, depending on the arrangement of a in memory. An efficient way to handle in-place (or out-of-place, come to think of it) median along multiple axes is actually to take medians along all axes in succession. That saves you some sorting effort, and some programming effort, and doesn't require in-place multidimensional sorting: In [24]: def all_axes_median(a): : if len(a.shape)>1: : return all_axes_median(N.median(a)) : else: : return N.median(a) : : In [26]: all_axes_median(N.reshape(N.arange(32),(2,4,2,-1))) Out[26]: 15.5 Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Median advice
On 12/02/2008, Matthew Brett <[EMAIL PROTECTED]> wrote: > Suggestion 1: > def median(a, axis=0, out=None) [...] > Suggestion 2: > def median(a, axis=0, scratch_input=False) No reason not to combine the two. It's a pretty straightforward modification to do the sorting in place, and it could make a lot more difference to the runtime (and memory usage) than an output array. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Setting contents of buffer for array object
On 11/02/2008, Matthew Brett <[EMAIL PROTECTED]> wrote: > > I can also see that this could possibly be improved by using a for > > loop to iterate over the output elements, so that there was no need to > > duplicate the large input array, or perhaps a "blocked" iteration that > > duplicated arrays of modest size would be better. But how can a single > > float per data set whose median is being taken help? > > Sorry, you are right to call me on this very sloppy late-night > phrasing - I only meant that it would be useful in due course to use a > C implementation for median such as the ones you're describing, and > that this could write the result directly into the in-place memory - > in the same way that mean() does. It's quite true that it's difficult > to imagine the algorithm itself benefiting from the memory buffer. My point was not to catch you in an error - goodness knows I make enough of those, and not only late at night! - but to point out that there may not really be much need for an output argument. Even with a C code, for the median to be of much use, the output array can be at most half the size of the input array. The extra storage space required is not that big a concern, unlike a ufunc, and including an output argument forces you to deal with all sorts of data conversion issues. On the other hand, there is something to be said for allowing the code to destroy the input array. Perhaps *that* should be an optional argument (defaulting to zero)? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Setting contents of buffer for array object
On 10/02/2008, Matthew Brett <[EMAIL PROTECTED]> wrote: > > Ah, I see. You definitely do not want to reassign the .data buffer in > > this case. An out= parameter does not reassign the memory location > > that the array object points to. It should use the allocated memory > > that was already there. It shouldn't "copy" anything at all; > > otherwise, "median(x, out=out)" is no better than "out[:] = > > median(x)". Personally, I don't think that a function should expose an > > out= parameter unless if it can make good on that promise of memory > > efficency. > > I agree - but there are more efficient median algorithms out there > which can make use of the memory efficiently. I wanted to establish > the call signature to allow that. I don't feel strongly about it > though. This is a startling claim! Are there really median algorithms that are faster for having the use of a single float as storage space? If it were permissible to mutilate the original array in-place, I can certainly see a good median algorithm (based on quicksort, perhaps) being faster, but modifying the input array is a different question from using an output array. I can also see that this could possibly be improved by using a for loop to iterate over the output elements, so that there was no need to duplicate the large input array, or perhaps a "blocked" iteration that duplicated arrays of modest size would be better. But how can a single float per data set whose median is being taken help? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in numpy all() function
On 06/02/2008, Robert Kern <[EMAIL PROTECTED]> wrote: > > I guess the all function doesn't know about generators? > > Yup. It works on arrays and things it can turn into arrays by calling the C > API > equivalent of numpy.asarray(). There's a ton of magic and special cases in > asarray() in order to interpret nested Python sequences as arrays. That magic > works fairly well when we have sequences with known lengths; it fails utterly > when given an arbitrary iterator of unknown length. So we punt. Unfortunately, > what happens then is that asarray() sees an object that it can't interpret as > a > sequence to turn into a real array, so it makes a rank-0 array with the > iterator > object as the value. This evaluates to True. > > It's possible that asarray() should raise an exception for generators, but it > would be a special case. We wouldn't be able to test for arbitrary iterables. Would it be possible for asarray() to pull out the first element from the iterable, make an array out of it, then assume that all other values out of the iterable will have the same shape (raising an error, of course, when they aren't)? I guess this has high foot-shooting potential, but is it that much worse than numpy's shpe-guessing generally? It would be handy to be able to use an iterable to fill an array, so that you'd never need to store the values in anything else first: a = N.array((sin(N.pi*x/n) for x in xrange(n))) Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Stride of 2 for correlate()
On 05/02/2008, Chris Finley <[EMAIL PROTECTED]> wrote: > After searching the archives, I was unable to find a good method for > changing the stride of the correlate or convolve routines. I am doing a > Daubechies analysis of some sample data, say data = arange(0:80). The > coefficient array or four floats (say daub_g2[0:4]) is correlated over > the data. However, the product only needs to be calculated for every > other data index. I don't think that correlate or convolve can be made to behave this way. You can of course just throw away half the values, but I imagine you'd like it to be reasonably fast (do compare, though, speed tradeoffs can be surprising, particularly in python). This sort of thing comes up from time to time, so I took some old code I posted to scipy-user and put it in the cookbook at http://www.scipy.org/Cookbook/SegmentAxis . It works by converting your input array into a (in this case) 4 by n matrix with overlapping rows (without copying). You can then do what you like with the resulting array; convolutions just become matrix products. It's conceivable (though frankly unlikely) that the BLAS acceleration of matrix multiplication might make this run faster than correlate(). Good luck, Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Unexpected behavior with numpy array
On 03/02/2008, Damian Eads <[EMAIL PROTECTED]> wrote: > Good day, > > Reversing a 1-dimensional array in numpy is simple, > > A = A[:,:,-1] . > > However A is a new array referring to the old one and is no longer > contiguous. > > While trying to reverse an array in place and keep it contiguous, I > encountered some weird behavior. The reason for keeping it contiguous is > the array must be passed to an old C function I have, which expects the > buffer to be in row major order and contiguous. I am using lots of > memory so I want to minimize copying and allocation of new arrays. The short answer is that reversing an array in-place requires a certain amount of care, in C - you have to explicitly walk through swapping element i and element n-i, using a temporary variable. Getting numpy to exchange the elements in-place is going to be a real pain. I suggest trying a copying method first, and only getting fancier if it's too slow. > >>> A=numpy.arange(0,10) > >>> A > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > >>> A[::-1] > array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0]) > >>> A[:] = A[::-1] > >>> A > array([9, 8, 7, 6, 5, 5, 6, 7, 8, 9]) > >>> > > Is there any way to perform assignments of the following form > > X[sliceI] = expression involving X > with dimensions that are > compatible with the left > hand side > > without causing the odd behavior I mentioned. If not, it might be > helpful to throw an exception when both the LHS and the RHS of an > assignment reference an array slice of the same variable? This is, odd as it seems, intended behaviour. Sometimes it's really useful to use the same array on the LHS and RHS: for example A[:-1]=A[1:] You just need to know how the operation is done under the hood (the arrays are iterated over in index order). (Actually, I'm not totally sure this is specified - under some circumstances numpy may iterate over dimensions in an order based on stride and/or size; whether this can affect the result of an operation like the above I'll have to think about.) In any case, much code depends on this. Tricks for getting the array backwards without copying... hmm. Well, you might be able to fill in the array in an unconventional order: A = N.arange(n-1,-1,-1) A=A[::-1] N.sin(A,A) # or whatever Now the reversal of A is a perfectly normal C array (though numpy may not realize this; don't trust the flags, check the strides and sizes). Or you could just write a little C function inplace_reverse(A); if you're already linking to C this shouldn't add too much complexity to your project. Or you could do it explicitly in numpy: for i in xrange(n/2): t = A[i] A[i]=A[n-i] A[n-i]=t In the likely case that this is too slow, you can do the copying in blocks, small enough that memory consumption is moderate but large enough that the python overhead is not too much. Finally, I realize that digging around in legacy code can be miserable, but it is often not really very difficult to make a C function handle strided data - the whole principle of numpy is that compiled code really just needs to know the start address, data type, and the spacing and length of an array along each dimension. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Can not update a submatrix
On 30/01/2008, Francesc Altet <[EMAIL PROTECTED]> wrote: > A Wednesday 30 January 2008, Nadav Horesh escrigué: > > In the following piece of code: > > >>> import numpy as N > > >>> R = N.arange(9).reshape(3,3) > > >>> ax = [1,2] > > >>> R > > > > array([[0, 1, 2], > >[3, 4, 5], > >[6, 7, 8]]) > > > > >>> R[ax,:][:,ax] = 100 > > >>> R > > > > array([[0, 1, 2], > >[3, 4, 5], > >[6, 7, 8]]) > > > > Why R is not updated? > > Because R[ax] is not a view of R, but another copy of the original > object (fancy indexing does return references to different objects). > In order to get views, you must specify only a slice of the original > array. For example: This is not exactly correct. There are two kinds of fancy indexing in numpy, which behave similarly. There is normal fancy indexing, which produces a new array, not sharing data with the old array: a = N.random.normal(size=10) b = a[a>0] There is also fancy indexing of lvalues (to use a C term): a = N.random.normal(size=10) a[a<0] *= -1 Here no copy is made; instead, the data is modified in-place. This requires some clever hacks under the hood, since a[a<0] *can't* be a view of a. The problem the OP had is that they are going beyond what the clever hacks can deal with. That's why R[ax,:] = 100 works but R[ax,:][:,ax] = 100 doesn't. The problem is that you have two indexing operations in the second situation, and the inplace operators can't deal with that. Solutions include working on a flattened version of the array (for which a single indexing operation suffices), using the (not in-place) where() construct, or using the ix_ function: R[N.ix_(ax,ax)] = 100 N.ix_ is necessary because R[ax,ax] doesn't do what I, for one, would have expected: R[[1,2],[3,4]] yields [R[1,3],R[2,4]], not [[R[1,3],R[1,4]],[R[2,3],R[2,4]]]. But in any case, once you reduce it to a single fancy-indexing operation, you can successfully use it as an lvalue. Striding and views are not really the issue. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Vectorizing a function
On 30/01/2008, Scott Ransom <[EMAIL PROTECTED]> wrote: > That works fine with arrays, scalars, or array/scalar mixes in the > calling. I do understand that more complicated functions might > require vectorize(), however, I wonder if sometimes it is used > when it doesn't need to be? It certainly is! I would guess that it gets used for one of two reasons: * People don't realize that it's just a for loop and assume it does numpy magic to make things faster. and * People care more about convenience than speed. I am usually (when I use it) in the second camp. I have some function with a bunch of conditionals - patching up the phase of some inverse trig function, say, or dealing with a few special cases, or cleanly raising exceptions - and it matters more to get the thing written quickly than to have it run quickly. So rather than stop and figure out how to use where() or fancy indexing to handle it efficiently, I just slap a vectorize() on a simple python function. I can always accelerate it later. For the same reason, I often drop into list comprehensions for a few lines, converting back to an array at the end. IMHO, one of the primary values of numpy/scipy is that it lets you express vector algorithms conveniently. When this makes it faster, this is a bonus; when it makes it slower (as sometimes happens with very small or very large arrays) it's a pain. But vectorize() lets me use these vector expressions very readily with functions that are written in a simple python style. Nevertheless, I think (going by some of the questions we get here) a lot of people start using numpy without appreciating that the *point* of using it is that you can use vector expressions; they think, "well, I'm using numbers in python, so I should use numpy", and then they don't take advantage of numpy's speedups. So perhaps we should do some more educating, in tutorials and docstrings, about the limitations of vectorize(). Anyone want to start a wiki page "How to get rid of vectorize()"? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Does float16 exist?
On 08/01/2008, Charles R Harris <[EMAIL PROTECTED]> wrote: > Well, at a minimum people will want to read, write, print, and promote them. > That would at least let people work with the numbers, and since my > understanding is that the main virtue of the format is compactness for > storage and communication, a basic need will be filled right there. One > potential problem I see is handling +/-inf and nans, tests for these should > probably be built into the type. The el-cheapo solution to this simply provides two functions: take an int16 array (which actually contains float16) and produce a float32 array, and vice versa. Then people do all their work in float32 (or float64 is float32 doesn't have inf/nan, I don't remember) but can read and write float16. Of course it would be nicer to use flaot16 natively, more or less, but without all the math that's going to be a frustrating experience. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Does float16 exist?
On 08/01/2008, Charles R Harris <[EMAIL PROTECTED]> wrote: > I'm starting to get interested in implementing float16 support ;) My > tentative program goes something like this: > > 1) Add the operators to the scalar type. This will give sorting, basic > printing, addition, etc. > 2) Add conversions to other types. This will allow promotion of data to > currently supported types. > 3) Unoptimized BLAS or ATLAS additions for the type. This would give array > operations. > 4) Figure out what the precedence relations are. > > Can you add some details to this roadmap? It also strikes me that this > exercise can be repeated for quad precision floats, so might be a good > preparation for future IEEE types like the proposed decimal floating types. > Note that we are going to need some way to distinguish quads and extended > precision types on 64bit platforms, as both will both register as float128 > in the current notation. Decimal floats can be added with the notation > decimal64, etc. How were you planning to implement the basic mathematics (addition, multiplication, trig, what have you)? Were you planning to code all that from scratch or is there a 16-bit float math library? Or were you planning to make this work only on machines with 16-bit math hardware? Or ship it out to a graphics card? Without hardware support, people may be happier using 16-bit floats only for on-disk storage... Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] CASTABLE flag
On 07/01/2008, Timothy Hochberg <[EMAIL PROTECTED]> wrote: > I'm fairly dubious about assigning float to ints as is. First off it looks > like a bug magnet to me due to accidentally assigning a floating point value > to a target that one believes to be float but is in fact integer. Second, > C-style rounding is pretty evil; it's not always consistent across > platforms, so relying on it for anything other than truncating already > integral values is asking for trouble. I'm not sure I agree that this is a bug magnet: if your array is integer and you think it's float, you're going to get burned sooner rather than later, whether you assign to it or not. The only case I can see would be a problem would be if zeros() and ones() created integers - which they don't. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Nasty bug using pre-initialized arrays
On 07/01/2008, Charles R Harris <[EMAIL PROTECTED]> wrote: > > One place where Numpy differs from MatLab is the way memory is handled. > MatLab is always generating new arrays, so for efficiency it is worth > preallocating arrays and then filling in the parts. This is not the case in > Numpy where lists can be used for things that grow and subarrays are views. > Consequently, preallocating arrays in Numpy should be rare and used when > either the values have to be generated explicitly, which is what you see > when using the indexes in your first example. As to assignment between > arrays, it is a mixed question. The problem again is memory usage. For large > arrays, it makes since to do automatic conversions, as is also the case in > functions taking output arrays, because the typecast can be pushed down into > C where it is time and space efficient, whereas explicitly converting the > array uses up temporary space. However, I can imagine an explicit typecast > function, something like > > a[...] = typecast(b) > > that would replace the current behavior. I think the typecast function could > be implemented by returning a view of b with a castable flag set to true, > that should supply enough information for the assignment operator to do its > job. This might be a good addition for Numpy 1.1. This is introducing a fairly complex mechanism to cover, as far as I can see, two cases: conversion of complex to real, and conversion of float to integer. Conversion of complex to real can already be done explicitly without a temporary: a[...] = b.real That leaves only conversion of float to integer. Does this single case cause enough confusion to warrant an exception and a way around it? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast iteration (I think I've got it)
On 01/01/2008, Neal Becker <[EMAIL PROTECTED]> wrote: > This is a c-api question. > > I'm trying to get iterators that are both fast and reasonably general. I > did confirm that iterating using just the general PyArrayIterObject > protocol is not as fast as using c-style pointers for contiguous arrays. I'd like to point out that "contiguous" can be misleading as it is used in numpy. An array is flagged contiguous if all the elements are contiguous *and* they are arranged as a C-ordered multidimensional array - i.e., the last index changes the fastest as you walk through the array elements, and the next-to-last chenges the next fastest, and so on. > Please confirm if my understanding is correct. There are 2 cases to > consider. There are plain old dense arrays, which will be contiguous, and > there are array 'views' or 'slices'. The views will not be contiguous. > However, in all cases, the strides between elements in one dimension are > constant. This last point is key. As long as the stride in the last > dimension is a constant, a fast iterator is possible. > > I put together a small test, which seems to work for the cases I tried. > The idea is to use the general iterator (via PyArray_IterAllButAxis), but > iteration over the last dimension is done with a c-style pointer. This is not an unreasonable approach, but it can suffer badly if the last dimension is "small", for example if you have an array of shape (1000,1000,2). Such arrays can arise for example from working with complex numbers as pairs of real numbers. Where is the acceleration coming from in this code? You can walk through a multidimensional array in pure C, just incrementing pointers as needed, without too much difficulty. If you want to save bookkeeping overhead, you can partially flatten the array: if your last dimension has stride 7 and length 11, and your second-to-last dimension has stride 77, you can flatten (using reshape, in python, to get a view) the last two dimensions of the array and use a nice quick iterator on the combined dimension. For a "C contiguous" array, this means you just walk through the whole array with a single layer of looping. I suspect that the place you will see big slowdowns is where data is accessed in an order that doesn't make sense from a cache point of view. If four-byte chunks of data are spaced every eight bytes, then the processor spends twice as much time waiting for cache loads as if they are spaced every four bytes. And most numpy tasks are almost certainly memory-bound - all the ufuncs I can think of almost certainly are (arctan of a double almost certainly takes less time than loading and storing a double). If you walk through an array in the wrong order - changing the big strides fastest and the small strides slowest - you'll almost certainly have to shuffle the whole array through cache eight times or more. This is even ignoring cache misses. (To cut down on cache misses you can use the GCC prefetch instructions, or god-knows-what equivalent in other compilers.) I seem to recall that numpy already reorders its walks through arrays within ufuncs - does it do so with cache in mind? For that matter, is information on the CPU's caches available to numpy? Anne P.S. Here is code to flatten an array as much as possible without changing the order of flatiter: def flatteniter(A): i=0 while ihttp://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] any better way to normalize a matrix
On 28/12/2007, Christopher Barker <[EMAIL PROTECTED]> wrote: > Anne Archibald wrote: > > Numpy provides ufuncs as general powerful tools for operating on > > matrices. More can be added relatively easily, they provide not just > > the basic "apply" operation but also "outer" and others. Adding > > another way to accomplish the same operation just adds bulk to numpy. > > Maybe so, but if it were up to me, I'd have just the methods, and not > the functions -- I like namespaces and OO design. I've always thought it > was a spurious argument that these kinds of functions can operate on > things that aren't numpy arrays -- it's true, but they all return > arrays, to it's not like they really are universal. It doesn't make a whole lot of sense to me to have n-ary methods (though I don't think we currently have any ternary ufuncs). How would you accomodate all the other operations ufuncs support? The big problem with methods, it seems to me, is that if I want to add a new ufunc, it's relatively easy. In fact, with vectorize() I do that all the time (with all sorts of arities). But adding compiled ufuncs in a new module is relatively straightforward; it's not at all clear how a separate package could ever add new methods to the ndarray object. This is not hypothetical - scipy.special introduces many mathematical functions that either are or should be ufuncs. As for whether this is "object-oriented", well, ufuncs are objects with a variety of methods; one could productively inherit from them (at least in principle, though I've never had occasion to). Anyway, I don't know that we need to rehash the old discussion. I just wanted to point out the value of ufuncs as objects. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] any better way to normalize a matrix
On 28/12/2007, Christopher Barker <[EMAIL PROTECTED]> wrote: > I like the array methods a lot -- is there any particular reason there > is no ndarray.abs(), or has it just not been added? Here I have to disagree with you. Numpy provides ufuncs as general powerful tools for operating on matrices. More can be added relatively easily, they provide not just the basic "apply" operation but also "outer" and others. Adding another way to accomplish the same operation just adds bulk to numpy. I don't even like myarray.max(), preferring numpy.amax() or numpy.maximum.reduce. I realize we can't remove any array methods, but I don't think we should add an array method for every unary function - surely you don't want to see myarray.arctan()? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] any better way to normalise a matrix
On 27/12/2007, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > in my code i am trying to normalise a matrix as below > > mymatrix=matrix(..# items are of double type..can be negative > values) > numrows,numcols=mymatrix.shape > > for i in range(numrows): > temp=mymatrix[i].max() > for j in range(numcols): > mymatrix[i,j]=abs(mymatrix[i,j]/temp) > > # i am using abs() to make neg vals positive > > this way the code takes too much time to run for a case of > numrows=25, and numcols=8100 etc.. > being a beginner in numpy and python ,i would like to know if this can > be done more efficiently..can anyone advise? Yes. A major advantage - really the reason for existence - of numpy is that you can do operations to whole matrices at once in a single instruction, without loops. This is convenient from a coding point of view, and it also allows numpy to greatly accelerate calculations. Writing code the way you have above takes no advantage of numpy's machinery, and it would probably run faster using python lists than numpy arrays. I strongly recommend you read some numpy documentation; try starting with the tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial For example, to extract an array containing the maxima of each row of mymatrix, you can use the amax() function: temp = numpy.amax(mymatrix, axis=1) Multiplying a whole matrix by a constant can be done simply as well: 7*mymatrix A final comment is that numpy is designed for computations with multidimensional data. Its basic data type is the array. It has a specialized data type called "matrix" which provides slightly more convenient matrix multiplication (in the sense of linear algebra). I recommend you do not use numpy's matrix class until you are more accustomed to the software. Read the documentation. Start with the tutorial. Good luck. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] RAdian <--> degres conversion
On 16/12/2007, Hans Meine <[EMAIL PROTECTED]> wrote: > (*: It's similar with math.hypot, which I have got to know and appreciate > nowadays.) I'd like to point out that math.hypot is a nontrivial function which is easy to get wrong: In [6]: x=1e200; y=1e200; In [7]: math.hypot(x,y) Out[7]: 1.414213562373095e+200 In [8]: math.sqrt(x**2+y**2) --- Traceback (most recent call last) /home/peridot/ in () : (34, 'Numerical result out of range') Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] OT: A Way to Approximate and Compress a 3D Surface
On 20/11/2007, Geoffrey Zhu <[EMAIL PROTECTED]> wrote: > I have N tabulated data points { (x_i, y_i, z_i) } that describes a 3D > surface. The surface is pretty "smooth." However, the number of data > points is too large to be stored and manipulated efficiently. To make > it easier to deal with, I am looking for an easy method to compress > and approximate the data. Maybe the approximation can be described by > far fewer number of coefficients. > > If you can give me some hints about possible numpy or non-numpy > solutions or let me know where is better to ask this kind of question, > I would really appreciate it. This is an important task in computer graphics, in particular, in the field of multiresolution modelling. If you look up "surface simplification" you'll find many references to articles. I don't know of a library offhand that does it, let alone one that is accessible from python, but you could try looking at a toolkit that does isosurface visualization - these are surfaces that can often be simplified enormously. In particular it looks like VTK might be able to do what you want. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy : your experiences?
On 16/11/2007, Rahul Garg <[EMAIL PROTECTED]> wrote: > It would be awesome if you guys could respond to some of the following > questions : > a) Can you guys tell me briefly about the kind of problems you are > tackling with numpy and scipy? > b) Have you ever felt that numpy/scipy was slow and had to switch to > C/C++/Fortran? > c) Do you use any form of parallel processing? Multicores? SMPs? > Clusters? If yes how did u utilize them? > > If you feel its not relevant to the list .. feel free to email me personally. > I would be very interested in talking about these issues. I think it would be interesting and on-topic to hear a few words from people to see what they do with numpy. a) I use python/numpy/scipy to work with astronomical observations of pulsars. This includes a range of tasks including: simple scripting to manage jobs on our computation cluster; minor calculations (like a better scientific calculator, though frink is sometimes better because it keeps track of units); gathering and plotting results; prototyping search algorithms and evaluating their statistical properties; providing tools for manipulating photon data; various other tasks. We also use the software package PRESTO for much of the heavy lifting; much of it is written in python. b) I have projects for which python is too slow, yes. Pulsar surveys are extremely compute-intensive (I estimated one that I'm involved with at two or three mega-core-hours), so any software that is going in the pipeline should have its programming-time/runtime tradeoff carefully examined. I wrote a search code in C, with bits in embedded assembler. All the driver code to shuffle files around and supply correct parameters is in python, though. PRESTO has a similar pattern, with more C code because it does more of the hard work. In most cases the communication between the heavy-lifting code and the python code is through the UNIX environment (the heavy-lifting code gets run as a separate executable) but PRESTO makes many functions available in python modules. On the other hand, I often write quick Monte Carlo simulations that run for a day or more, but since writing them takes about as long as running them, it's not worth writing them in a language that would run faster. c) Our problems tend to be embarrassingly parallel, so we tend not to use clever parallelism toolkits. For the survey I am working on, I wrote one (python) script to process a single beam on a single node (which takes about thirty hours), and another to keep the batch queue filled with an appropriate number of such jobs. I have thought about writing a more generic tool for managing this kind of job queue, but haven't invested the time yet. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays
On 08/11/2007, David Cournapeau <[EMAIL PROTECTED]> wrote: > For copy and array creation, I understand this, but for element-wise > operations (mean, min, and max), this is not enough to explain the > difference, no ? For example, I can understand a 50 % or 100 % time > increase for simple operations (by simple, I mean one element operation > taking only a few CPU cycles), because of copies, but a 5 fold time > increase seems too big, no (mayb a cache problem, though) ? Also, the > fact that mean is slower than min/max for both cases (F vs C) seems a > bit counterintuitive (maybe cache effects are involved somehow ?). I have no doubt at all that cache effects are involved: for an int array, each data element is four bytes, but typical CPUs need to load 64 bytes at a time into cache. If you read (or write) the rest of those bytes in the next iterations through the loop, the (large) cost of a memory read is amortized. If you jump to the next row of the array, some large number of bytes away, those 64 bytes basically need to be purged to make room for another 64 bytes, of which you'll use 4. If you're reading from a FORTRAN-order array and writing to a C-order one, there's no way around doing this on one end or another: you're effectively doing a transpose, which is pretty much always expensive. Is there any reason not to let ufuncs pick whichever order for newly-allocated arrays they want? The natural choice would be the same as the bigger input array, if it's a contiguous block of memory (whether or not the contiguous flags are set). Failing that, the same as the other input array (if any); failing that, C order's as good a default as any. How difficult would this be to implement? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy FFT memory accumulation
On 31/10/2007, Ray S <[EMAIL PROTECTED]> wrote: > I am using > fftRes = abs(fft.rfft(data_array[end-2**15:end])) > to do running analysis on streaming data. The N never changes. > It sucks memory up at ~1MB/sec with 70kHz data rate and 290 ffts/sec. > (Interestingly, Numeric FFT accumulates much slower..) > (Commenting out that one line stops memory growth.) > > What can one do to alleviate this? > Can I del() some numpy object or such? > It's a bit of an issue for a program that needs to run for weeks. > > It's purpose is simply to argmax() the largest bin, which always > falls within a small range - do I have another, better option than > fft? If the range is *really* small, you can try using a DFT - sometimes that is fast enough, and gives you just the bins you're curious about. If the range is bigger than that, but still a small fraction of the FFT size, you can do some tricks where you band-pass filter the data (with a FIR filter, say), downsample (which aliases frequencies downward), and then FFT. Concretely, if your data is sampled at 80ksamp/s and you're taking 160ksamp FFTs, you're getting the spectrum up to 35 kHz with a resolution of 0.5 Hz. If all you care about is the spectral region from 5 kHz to 6 kHz but you still need the 0.5 Hz spectral resolution, you can run a band-pass filter to throw out everything outside (say) 4 to 7 kHz, then downsample to 8 ksamp/s; the 4 to 7kHz region will appear mirrored in the 1 to 4 kHz region of the new FFTs, which will each be a tenth the size. There are also "zoom fft" and "chirp-z" techniques which are supposed to give you only part of the FFT, but the wisdom is that unless you want less than a few percent of the data points you're better just FFTing and throwing lots of data away. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fortran array storage question
On 26/10/2007, Travis E. Oliphant <[EMAIL PROTECTED]> wrote: > There is an optimization where-in the inner-loops are done over the > dimension with the smallest stride. > > What other cache-coherent optimizations do you recommend? That sounds like a very good first step. I'm far from an expert on this sort of thing, but here are a few ideas at random: * internally flattening arrays when this doesn't affect the result (e.g. ones((10,10))+ones((10,10))) * prefetching memory: in a C application I recently wrote, explicitly prefetching data for interpolation cut my runtime by 30%. This includes telling the processor when you're done with data so it can be purged from the cache. * aligning (some) arrays to 8- 16- 32- or 64-byte boundaries so that they divide nicely into cache lines * using MMX/SSE instructions when available * combining ufuncs so that computations can keep the CPU busy while it waits for data to come in from main RAM (I realize that this is properly the domain of numexpr) * using ATLAS- or FFTW-style autotuning to determine the fastest ways to structure computations (again more relevant for significant expressions rather than simple ufuncs) * reducing use of temporaries in the interest of reducing traffic to main memory * openmp parallel operations when this actually speeds up calculation I realize most of these are a lot of work, and some of them are probably in numpy already. Moreover without using an expression parser it's probably not feasible to implement others. But an array language offers the possibility that a runtime can implement all sorts of optimizations without effort on the user's part. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fortran array storage question
On 26/10/2007, Georg Holzmann <[EMAIL PROTECTED]> wrote: > if in that example I also change the strides: > >int s = tmp->strides[1]; >tmp->strides[0] = s; >tmp->strides[1] = s * dim0[0]; > > Then I get in python the fortran-style array in right order. This is the usual way. More or less, at least. numpy is designed from the start to handle arrays with arbitrary striding; this is how slices are implemented, for example. There will be no major performance hit from numpy code itself. The actual organization of data in memory will of course affect the speed at which your code runs. The flags, as you discovered, are just a performance optimization, so that code that needs arrays organized as C- or FORTRAN-standard doesn't need to check the strides every time. I don't think numpy's loops - for example in ones((100,100))+eye(100) - are smart about doing operations in an order that makes cache-coherent use of memory. The important exception is the loops that use ATLAS, which I think is mostly the dot() function. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] appending extra items to arrays
On 11/10/2007, Robert Kern <[EMAIL PROTECTED]> wrote: > Appending to a list then converting the list to an array is the most > straightforward way to do it. If the performance of this isn't a problem, I > recommend leaving it alone. Just a speculation: Python strings have a similar problem - they're immutable, and so are even more resistant to growth than numpy arrays. For those situations where you really really want to grow a srting, python provides StringIO, where you keep efficiently adding to the string, then finalize it to get the real string out. Would something analogous be interesting for arrays? Anne P.S. A (somewhat) slow python implementation would be fairly easy to write. -A ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] bug in vectorize? (was: Re: Casting a float array into a string array)
On 05/10/2007, Christopher Barker <[EMAIL PROTECTED]> wrote: > I don't know how to generalize this to n-d though -- maybe numpy.vectorize? Oops! Looks like there's a big somewhere: In [1]: from numpy import * In [2]: vectorize(lambda x: "%5.3g" % x)(ones((2,2,2))) Out[2]: array([[[' ', '\xc1'], [' ', '\xc1']], [[' ', '\xc1'], [' ', '\xc1']]], dtype='|S1') In [3]: vectorize? Segmentation fault (core dumped) In particular, vectorize creates a string array (rather than an object array), and some unsafe operation is done on that array. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Casting a float array into a string array
On 05/10/2007, Matthieu Brucher <[EMAIL PROTECTED]> wrote: > I'd like to have the '2.', because if the number is negative, only '-' is > returned, not the real value. For string arrays you need to specify the length of the string as part of the data type (and it defaults to length 1): In [11]: random.normal(size=(2,3)).astype("|S20") Out[11]: array([['-0.223407464804', '1.1952765837', '-2.24073805089'], ['1.36604738338', '-0.197059880309', '2.15648625075']], dtype='|S20') In [12]: random.normal(size=(2,3)).astype("|S2") Out[12]: array([['-0', '1.', '-0'], ['0.', '0.', '1.']], dtype='|S2') In [13]: random.normal(size=(2,3)).astype("str") Out[13]: array([['-', '1', '1'], ['-', '-', '0']], dtype='|S1') Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion