Re: [Numpy-discussion] Status Report for NumPy 1.1.0

2008-05-07 Thread Anne Archibald
2008/5/7 Jarrod Millman <[EMAIL PROTECTED]>:
> I have just created the 1.1.x branch:
>  http://projects.scipy.org/scipy/numpy/changeset/5134
>  In about 24 hours I will tag the 1.1.0 release from the branch.  At
>  this point only critical bug fixes should be applied to the branch.
>  The trunk is now open for 1.2 development.

I committed Travis' matrix indexing patch (plus a basic test) to
1.1.x. Hope that's okay. (All tests pass.)

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] lexsort

2008-05-06 Thread Anne Archibald
2008/5/6 Eleanor <[EMAIL PROTECTED]>:
> >>> a = numpy.array([[1,2,6], [2,2,8], [2,1,7],[1,1,5]])
> >>> a
> array([[1, 2, 6],
>        [2, 2, 8],
>        [2, 1, 7],
>        [1, 1, 5]])
> >>> indices = numpy.lexsort(a.T)
> >>> a.T.take(indices,axis=-1).T
> array([[1, 1, 5],
>        [1, 2, 6],
>        [2, 1, 7],
>        [2, 2, 8]])
>
> The above does what I want, equivalent to sorting on column A then
> column B in Excel, but all the transposes are ungainly. I've stared at
> it a while but can't come up with a more elegant solution. Any ideas?

It appears that lexsort is broken in several ways, and its docstring
is misleading.

First of all, this code is not doing quite what you describe. The
primary key here is the [5,6,7,8] column, followed by the middle and
then by the first. This is almost exactly the opposite of what you
describe (and of what I expected). To get it to sort the way you
describe, the clearest approach is to pass the keys as a sequence:

In [34]: indices = np.lexsort( (a[:,1],a[:,0]) )

In [35]: a[indices,:]
Out[35]:
array([[1, 1, 5],
       [1, 2, 6],
       [2, 1, 7],
       [2, 2, 8]])

In other words, sort over a[:,1], then sort again over a[:,0], making
a[:,0] the primary key. I have used "fancy indexing" to pull the array
into the right order, but it can also be done with take():

In [40]: np.take(a,indices,axis=0)
Out[40]:
array([[1, 1, 5],
       [1, 2, 6],
       [2, 1, 7],
       [2, 2, 8]])


As for why I say lexsort() is broken, well, it simply returns 0 for
higher-rank arrays (rather than either sorting or raising an
exception), it raises an exception when handed axis=None rather than
flattening as the docstring claims, and whatever the axis argument is
supposed to do, it doesn't seem to do it:

In [44]: np.lexsort(a,axis=0)
Out[44]: array([1, 0, 2])

In [45]: np.lexsort(a,axis=-1)
Out[45]: array([1, 0, 2])

In [46]: np.lexsort(a,axis=1)
---------------------------------------------------------------------------
<type 'exceptions.ValueError'>            Traceback (most recent call last)

/home/peridot/<ipython console> in <module>()

<type 'exceptions.ValueError'>: axis(=1) out of bounds


Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Status Report for NumPy 1.1.0

2008-05-06 Thread Anne Archibald
2008/5/6 Charles R Harris <[EMAIL PROTECTED]>:
>
> On Tue, May 6, 2008 at 6:40 AM, Alan G Isaac <[EMAIL PROTECTED]> wrote:
> >
> > On Tue, 6 May 2008, Jarrod Millman apparently wrote:
> > > open tickets that I would like everyone to take a brief
> > > look at:
> > > http://projects.scipy.org/scipy/numpy/ticket/760
> >
> > My understanding is that my patch, which would give
> > a deprecation warning, was rejected in favor of the patch
> > specified at the end of Travis's message here:
> >
> http://projects.scipy.org/pipermail/numpy-discussion/2008-April/033315.html
> >
> > At least that's how I thought things were resolved.
> > I expected Travis's patch would be part of the 1.1 release.
> >
>
> I think someone needs to step up and make the change, but first it needs to
> be blessed by Travis and some of the folks on the Matrix indexing thread.
> The change might also entail removing some workarounds in the spots
> indicated by Travis.

It has my vote, as a peripheral participant in the matrix indexing
thread; in fact it's what some of us thought we were agreeing to
earlier.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Tests for empty arrays

2008-05-06 Thread Anne Archibald
2008/5/6 Andy Cheesman <[EMAIL PROTECTED]>:

>  I was wondering if anyone could shed some light on how to distinguish an
>  empty array of a given shape and a zeros array of the same dimensions.

An "empty" array, that is, an array returned by the function empty(),
just means an uninitialized array. The only difference between it and
an array returned by zeros() is the values in the array: from zeros()
they are, obviously, all zero, while from empty() they can be
anything, including zero. In practice they tend to be crazy numbers
of order 1e300 or 1e-300, but one can't count on this.

As a general rule, I recommend against using empty() on your first
pass through the code. Only once your code is working and you have
tests running should you consider switching to empty() for speed
reasons. In fact, if you want to use empty() down the road, it may
make sense to initialize your array to zeros()/0. - that is, all NaNs
- so that if you ever accidentally use an uninitialized value, the
NaNs will propagate and become obvious. Unfortunately, some CPUs
implement NaN arithmetic in software, so there can be a huge speed hit.
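
To make that concrete, here is a minimal sketch of the difference and
of the NaN-initialization trick:

import numpy as np

a = np.zeros(3)        # guaranteed [ 0.,  0.,  0.]
b = np.empty(3)        # uninitialized: contents are whatever was in memory
c = np.zeros(3)/0.     # all NaN (emits a divide-by-zero warning)
c[0] = 1.0
# any computation touching an unset element now propagates NaN:
# c + 1 is [ 2.,  nan,  nan], which is easy to spot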

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-06 Thread Anne Archibald
2008/5/5 David Cournapeau <[EMAIL PROTECTED]>:

>  Basically, what I have in mind is, in a first step (for numpy 1.2):
> - define functions to allocate on a given alignment
> - make PyDataMem_NEW 16 byte aligned by default (to be compatible
>  with SSE and co).
>
>  The problem was, and still is, realloc. It is not possible to implement
>  realloc with malloc/free (in a portable way), and as such, it is not
>  possible to have an aligned realloc.

How much does this matter? I mean, what if you simply left all the
reallocs as-is? The arrays that resulted from reallocs would not
typically be aligned, but we cannot in any case expect all arrays to
be aligned.

Or, actually, depending on how you manage the alignment, it may be
possible to write a portable aligned realloc. As I understand it, a
portable aligned malloc allocates extra memory, then adds something to
the pointer to make the memory arena aligned. free() must then recover
the original pointer and call free() on it - though this can
presumably be avoided if garbage collection is smart enough. realloc()
also needs to recover the pointer to call the underlying realloc().
One answer is to ensure that at least one byte of padding is always
done at the beginning, so that the number of pad bytes used on an
aligned pointer p is accessible as ((uint8*)p)[-1]. Other schemes are
available; I have a peculiar affection for the pure-python solution of
constructing a view. In any of them, it seems to me, if free() can
recover the original pointer, so can realloc()...
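
For illustration, here is a sketch of that pure-python view approach
(aligned_empty() is a hypothetical name; the 16-byte default is just
an example):

import numpy as np

def aligned_empty(shape, dtype, alignment=16):
    # allocate a little extra, then slice off a view that starts
    # on an alignment-byte boundary
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((5, 7), np.float64)
assert a.ctypes.data % 16 == 0

The original buffer stays reachable through the view's .base, which is
exactly the sense in which free() can "recover the original pointer"
in this scheme.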

I don't think any numpy function can assume that its input arrays are
aligned, since users may like to pass, say, mmap()ed hunks of a file,
over which numpy has no control of the alignment. In this case, then,
how much of a problem is it if a few stray realloc()s produce
non-aligned arrays?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread Anne Archibald
2008/5/5 Robert Kern <[EMAIL PROTECTED]>:
> On Mon, May 5, 2008 at 7:44 AM, David Cournapeau
>  <[EMAIL PROTECTED]> wrote:
>
> >  In numpy, we can always replace realloc by malloc/free, because we know
>  >  the size of the old block: would deprecating PyDataMem_RENEW and
>  >  replacing it by PyDataMem_NEW/PyDataMem_FREE be possible, so as to
>  >  make all numpy arrays follow a default alignment ? There are only a few
>  >  uses of it in numpy (6 of them), 0 in scipy, and I guess extensions never
>  >  really used them ?
>
>  I am in favor of at least trying this out. We will have to have a set
>  of benchmarks to make sure we haven't hurt the current uses of
>  PyDataMem_RENEW which Tim points out.

realloc() may not be a performance win anyway. Allocating new memory
is quite fast, copying data is quite fast, and in-place realloc()
tends to cause memory fragmentation - take an arena full of 1024-byte
blocks and suddenly make one of them 256 bytes long and you haven't
gained anything at all in most malloc() implementations. So yes,
benchmarks.
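
A sketch of the kind of micro-benchmark I mean, assuming
ndarray.resize() still goes through PyDataMem_RENEW:

import timeit

setup = "import numpy as np; a = np.empty(10**6, dtype=np.uint8)"

# grow in place: resize() reallocates the existing block
grow = "a.resize(2*10**6, refcheck=False); a.resize(10**6, refcheck=False)"
# allocate a fresh block and copy the old contents over
copy = "b = np.empty(2*10**6, dtype=np.uint8); b[:10**6] = a"

t_grow = timeit.Timer(grow, setup).timeit(number=1000)
t_copy = timeit.Timer(copy, setup).timeit(number=1000)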

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combining Sigmoid Curves

2008-05-02 Thread Anne Archibald
2008/5/2 Rich Shepard <[EMAIL PROTECTED]>:
> On Fri, 2 May 2008, Anne Archibald wrote:
>
>  > It's better not to work point-by-point, appending things, when working
>  > with numpy. Ideally you could find a formula which just produced the right
>  > curve, and then you'd apply it to the input vector and get the output
>  > vector all at once.
>
>  Anne,
>
>  That's been my goal. :-)
>
>
>  > How about using the cosine?
>  >
>  > def f(left, right, x):
>  >     scaled_x = (x-(right+left)/2)/((right-left)/2)
>  >     return (1+np.cos(np.pi * scaled_x))/2
>  >
>  > exactly zero at both endpoints, exactly one at the midpoint,
>  > inflection points midway between, where the value is 1/2. If you want
>  > to normalize it so that the area underneath is one, that's easy to do.
>
>  > More generally, the trick of producing a scaled_x as above lets you
>  > move any function anywhere you like.
>
>  This looks like a pragmatic solution. When I print scaled_x (using left =
>  0.0 and right = 100.0), the values range from -1.0 to +0.998. So, I need to
>  figure out the scaled_x that sets the end points at 0 and 100.

No, no. You *want* scaled_x to range from -1 to 1. (The 0.998 is
because you didn't include the endpoint, 100.) The function I gave,
(1+np.cos(np.pi * scaled_x))/2, takes [-1, 1] to a nice
bump-shaped function. If you feed in numbers from 0 to 100 as x, they
get transformed to scaled_x, and you feed them to the function,
getting a result that goes from 0 at x=0 to 1 at x=50 to 0 at x=100.
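
Concretely (linspace, unlike arange, makes it easy to include the
endpoint):

import numpy as np

def f(left, right, x):
    scaled_x = (x - (right + left)/2.) / ((right - left)/2.)
    return (1 + np.cos(np.pi * scaled_x))/2

x = np.linspace(0., 100., 1001)
y = f(0., 100., x)
# y[0] == 0.0 at x=0, y[500] == 1.0 at x=50, y[-1] == 0.0 at x=100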

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combining Sigmoid Curves

2008-05-02 Thread Anne Archibald
2008/5/2 Rich Shepard <[EMAIL PROTECTED]>:

>  What will work (I call it a pi curve) is a matched pair of sigmoid curves,
>  the ascending curve on the left and the descending curve on the right. Using
>  the Boltzmann function for these I can calculate and plot each individually,
>  but I'm having difficulty in appending the x and y values across the entire
>  range. This is where I would appreciate your assistance.

It's better not to work point-by-point, appending things, when working
with numpy. Ideally you could find a formula which just produced the
right curve, and then you'd apply it to the input vector and get the
output vector all at once. For example for a Gaussian (which you're
not using, I know), you'd write a function something like

def f(x):
    return np.exp(-x**2)

and then call it like:

f(np.linspace(-1,1,1000)).

This is efficient and clear. It does not seem to me that the logistic
function, 1/(1+np.exp(-x)), does quite what you want. How about using
the cosine?

def f(left, right, x):
    scaled_x = (x-(right+left)/2)/((right-left)/2)
    return (1+np.cos(np.pi * scaled_x))/2

exactly zero at both endpoints, exactly one at the midpoint,
inflection points midway between, where the value is 1/2. If you want
to normalize it so that the area underneath is one, that's easy to do.

More generally, the trick of producing a scaled_x as above lets you
move any function anywhere you like.

If you want the peak not to be at the midpoint of the interval, you
will need to do something a little more clever, perhaps choosing a
different function, scaling the x values with a quadratic polynomial
so that (left, middle, right) becomes (-1,0,1) (sketched below), or
using a piecewise function.
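
A minimal sketch of the quadratic-rescaling idea (skewed_bump is a
hypothetical name; np.polyfit through three points gives the exact
quadratic):

import numpy as np

def skewed_bump(left, mid, right, x):
    # quadratic q with q(left) = -1, q(mid) = 0, q(right) = 1
    coeffs = np.polyfit([left, mid, right], [-1.0, 0.0, 1.0], 2)
    scaled_x = np.polyval(coeffs, x)
    return (1 + np.cos(np.pi * scaled_x))/2

x = np.linspace(0.0, 100.0, 1001)
y = skewed_bump(0.0, 30.0, 100.0, x)   # peak at x=30 instead of x=50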

Good luck,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Broadcasting question

2008-05-01 Thread Anne Archibald
2008/5/2 Chris.Barker <[EMAIL PROTECTED]>:
> Hi all,
>
>  I have an n X m X 3 array, and an n X m array. I want to assign the
>  values in the n X m array to all three of the slices in the bigger array:
>
>  A1 = np.zeros((5,4,3))
>  A2 = np.ones((5,4))
>  A1[:,:,0] = A2
>  A1[:,:,1] = A2
>  A1[:,:,2] = A2
>
>  However, it seems I should be able to broadcast that, so I don't have to
>  repeat myself, but I can't figure out how.

All you need is a newaxis on A2:
In [2]: A = np.zeros((3,4))

In [3]: B = np.ones(3)

In [4]: A[:,:] = B[:,np.newaxis]

In [5]: A
Out[5]:
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])
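
With your original names, that is:

import numpy as np

A1 = np.zeros((5, 4, 3))
A2 = np.ones((5, 4))
A1[:, :, :] = A2[:, :, np.newaxis]   # one assignment fills all three slices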


Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] recarray fun

2008-04-30 Thread Anne Archibald
2008/5/1 Travis E. Oliphant <[EMAIL PROTECTED]>:
> Stéfan van der Walt wrote:
>  > 2008/4/30 Christopher Barker <[EMAIL PROTECTED]>:
>  >
>  >> Stéfan van der Walt wrote:
>  >>  > That's the way, or just rgba_image.view(numpy.int32).
>  >>
>  >>  ah -- interestingly, I tried:
>  >>
>  >>  rgba_image.view(dtype=numpy.int32)
>  >>
>  >>  and got:
>  >>
>  >>  Traceback (most recent call last):
>  >>File "", line 1, in 
>  >>  TypeError: view() takes no keyword arguments
>  >>
>  >>  Since it is optional, shouldn't it be keyword argument?
>  >>
>  >
>  > Thanks, fixed in r5115.
>  >
>  >
>  This was too hasty.   I had considered this before.
>
>  The problem with this is that the object can be either a type object or
>  a data-type object.  You can use view to both re-cast a numpy array as
>  another subtype or as another data-type.
>
>  So, please revert the change until a better solution is posted.

If we're going to support keyword arguments, shouldn't those two
options be *different* keywords (dtype and ndarray_subclass, say)?
Then a single non-keyword argument tells numpy to guess which one you
wanted...

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-30 Thread Anne Archibald
2008/4/30 Charles R Harris <[EMAIL PROTECTED]>:

> Some operations on stacks of small matrices are easy to get, for instance,
> +,-,*,/, and matrix multiply. The last is the interesting one. If A and B
> are stacks of matrices with the same number of dimensions with the matrices
> stored in the last two indices, then
>
> sum(A[...,:,:,newaxis]*B[...,newaxis,:,:], axis=-2)
>
> is the matrix-wise multiplication of the two stacks. If B is replaced by a
> stack of 1D vectors, x, it is even simpler:
>
> sum(A[...,:,:]*x[...,newaxis,:], axis=-1)
>
> This doesn't go through BLAS, but for large stacks of small matrices it
> might be even faster than BLAS because BLAS is kinda slow for small
> matrices.

Yes and no. For the first operation, you have to create a temporary
that is larger than either of the two input arrays. These invisible
(potentially gigantic) temporaries are the sort of thing that puzzles
users: as their problem size grows, they suddenly hit a massive
slowdown because the computation starts swapping to disk, and then a
failure because the temporary can't be allocated. This is one reason
we have dot() and tensordot() even though they can be expressed like
this. (The other is of course that it lets us use optimized BLAS.)


This rather misses the point of Timothy Hochberg's suggestion (as I
understood it), though: yes, you can write the basic operations in
numpy, in a more or less efficient fashion. But it would be very
valuable for arrays to have some kind of metadata that let them keep
track of which dimensions represented simple array storage and which
represented components of a linear algebra object. Such metadata could
make it possible to use, say, dot() as if it were a binary ufunc
taking two matrices. That is, you could feed it two arrays of
matrices, which it would broadcast to the same shape if necessary, and
then it would compute the elementwise matrix product.
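
To make the contrast concrete, a small sketch of both approaches for a
stack of matrices:

import numpy as np

A = np.random.randn(10, 3, 3)   # a stack of ten 3x3 matrices
B = np.random.randn(10, 3, 3)

# broadcasting version: builds a (10, 3, 3, 3) temporary
C1 = np.sum(A[..., :, :, np.newaxis] * B[..., np.newaxis, :, :], axis=-2)

# explicit loop: no big temporary, and each product can use BLAS
C2 = np.array([np.dot(a, b) for a, b in zip(A, B)])

assert np.allclose(C1, C2)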

The question I have is, what is the right mathematical model for
describing these
arrays-some-of-whose-dimensions-represent-linear-algebra-objects?


One idea is for each dimension to be flagged as one of "replication",
"vector", or "covector". A column vector might then be a rank-1 vector
array, a row vector might be a rank-1 covector array, a linear
operator might be a rank-2 object with one covector and one vector
dimension, a bilinear form might be a rank-2 object with two covector
dimensions. Dimensions designed for holding repetitions would be
flagged as such, so that (for example) an image might be an array of
shape (N,M,3) of types ("replication","replication","vector"); then to
apply a color-conversion matrix one would simply use dot() (or "*" I
suppose), without too much concern for which index was which. The
problem is, while this formalism sounds good to me, with a background
in differential geometry, if you only ever work in spaces with a
canonical metric, the distinction between vector and covector may seem
peculiar and be unhelpful.

Implementing such a thing need not be too difficult: start with a new
subclass of ndarray which keeps a tuple of dimension types (a minimal
sketch follows the list below). Come up with an adequate set of
operations on them, and implement them in terms of numpy's functions,
taking advantage of the extra information about each axis. A few
operations that spring to mind:

* Addition: it doesn't make sense to add vectors and covectors; raise
an exception. Otherwise addition is always elementwise anyway. (How
hard should addition work to match up corresponding dimensions?)
* Multiplication: elementwise across "repetition" axes, it combines
vector axes with corresponding covector axes to get some kind of
generalized matrix product. (How is "corresponding" defined?)
* Division: mostly doesn't make sense unless you have an array of
scalars (I suppose it could compute matrix inverses?)
* Exponentiation: very limited (though I suppose matrix powers could
be implemented if the shapes are right)
* Change of basis: this one is tricky because not all dimensions need
come from the same vector space
* Broadcasting: the rules may become a bit byzantine...
* Dimension metadata fiddling
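
As a starting point, a minimal sketch of such a subclass (LinAlgArray
and the axistypes attribute are names I am making up here):

import numpy as np

class LinAlgArray(np.ndarray):
    # each axis is flagged 'replication', 'vector' or 'covector'
    def __new__(cls, data, axistypes):
        obj = np.asarray(data).view(cls)
        if len(axistypes) != obj.ndim:
            raise ValueError("need one axis type per dimension")
        obj.axistypes = tuple(axistypes)
        return obj
    def __array_finalize__(self, obj):
        # views and ufunc results inherit the flags when present
        self.axistypes = getattr(obj, 'axistypes', None)

# an N x M image of RGB vectors:
img = LinAlgArray(np.zeros((4, 5, 3)),
                  ('replication', 'replication', 'vector'))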

Is this a useful abstraction? It seems like one might run into trouble
when dealing with arrays whose dimensions represent vectors from
unrelated spaces.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] very simple iteration question.

2008-04-30 Thread Anne Archibald
2008/4/30 Christopher Barker <[EMAIL PROTECTED]>:
> a g wrote:
>  > OK: how do i iterate over an axis other than 0?
>
>  This ties in nicely with some of the discussion about iterating over
>  matrices. It has been suggested that it would be nice to have iterators
>  for matrices, so you could do:
>
>  for row in M.rows:
>...
>
>  and
>
>  for column in M.cols:
>...
>
>
>  If so, then wouldn't it make sense to have built in iterators for
>  nd-arrays as well? something like:
>
>  for subarray in A.iter(axis=i):
> ...
>
>  where axis would default to 0.
>
>  This is certainly cleaner than:
>
>  for j in range(A.shape[i]):
>      subarray = A[:,:,j,:]
>
>  Wait! I have no idea how to spell that generically -- i.e. the ith
>  index, where i is a variable. There must be a way to build a slice
>  object dynamically, but I don't know it.

Slices can be built without too much trouble using slice(), but it's
much easier to just write

for subarray in np.rollaxis(A,i):
    ...

rollaxis() just pulls the specified axis to the front. (It doesn't do
what I thought it would, which is a cyclic permutation of the axes).
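
For completeness, building the index dynamically looks like this
(index_axis is a hypothetical helper):

import numpy as np

def index_axis(A, axis, j):
    # select index j along the given axis, i.e. A[:, ..., j, ..., :]
    idx = (slice(None),) * axis + (j,)
    return A[idx]

A = np.arange(24).reshape(2, 3, 4)
assert (index_axis(A, 2, 1) == A[:, :, 1]).all()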

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] very simple iteration question.

2008-04-30 Thread Anne Archibald
2008/4/30 a g <[EMAIL PROTECTED]>:
> Hi.  This is a very basic question, sorry if it's irritating.  If i
>  didn't find the answer written already somewhere on the site, please
>  point me to it.  That'd be great.
>
>  OK: how do i iterate over an axis other than 0?
>
>  I have a 3D array of data[year, week, location].  I want to iterate
>  over each year at each location and run a series of stats on the
>  columns (on the 52 weeks in a particular year at a particular location).
>   'for years in data:' will get the first one, but then how do i not
>  iterate over the 1 axis and iterate over the 2 axis instead?

Well, there's always

for i in xrange(A.shape[1]):
    do_something(A[:,i,...])

But that's kind of ugly in this day and age. I would be tempted by

for a in np.rollaxis(A,1):
    do_something(a)

It's worth mentioning that traversing an array along an axis not the
first usually results in subarrays that are not contiguous in memory.
While numpy makes working with these convenient, they may not be very
efficient for cache reasons (for example, modern processors load from
memory - an extremely slow operation - in blocks that may be as large
as 64 bytes; if you only use eight of these before moving on, your
code will use much more memory bandwidth than it needs to). If this is
a concern, you can usually transparently rearrange your array's memory
layout to avoid it.
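
For example, a sketch using the year/week/location shape from your
description:

import numpy as np

data = np.empty((10, 52, 5))   # [year, week, location]
# same values, but each 52-week series becomes one compact block
series = np.ascontiguousarray(data.transpose(0, 2, 1))
# series[year, location] is now a contiguous 52-element array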

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] should outer take an output argument?

2008-04-30 Thread Anne Archibald
2008/4/29 Alan G Isaac <[EMAIL PROTECTED]>:
> As I was looking at Bill's conjugate gradient posting,
>  I found myself wondering if there would be a payoff
>  to an output argument for ``numpy.outer``.  (It is fairly
>  natural to repeatedly recreate the outer product of
>  the adjusted residuals, which is only needed during a
>  single iteration.)

I avoid np.outer, as it seems designed for foot shooting. (It forcibly
flattens both arguments into vectors no matter what shape they are and
then computes their outer product.) np.multiply.outer does (what I
consider to be) the right thing. Not that it takes an output argument
either, as far as I can tell.
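
The difference in a nutshell:

import numpy as np

a = np.ones((2, 3))
b = np.ones((4, 5))
assert np.outer(a, b).shape == (6, 20)                # both flattened
assert np.multiply.outer(a, b).shape == (2, 3, 4, 5)  # shapes preserved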

But trying to inspect it suggests a few questions:

* Why are ufunc docstrings totally useless?
* Why are output arguments not keyword arguments?

As for your original question, yes, I think it would be good if
.outer() and .reduce() methods of ufuncs took output arguments. This
would let one use add.reduce() as a poor man's sum() even when dealing
with uint8s, for example. (Coming up with a problem for which
subtract.reduce(), divide.reduce(), or arctan2.reduce() are the
solutions is left as an exercise for the reader.) I can definitely see
a use for divide.outer() : Int -> Int -> Float, though.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-29 Thread Anne Archibald
On 30/04/2008, Keith Goodman <[EMAIL PROTECTED]> wrote:
> On Tue, Apr 29, 2008 at 2:18 PM, Anne Archibald
>  <[EMAIL PROTECTED]> wrote:
>  >  It is actually pretty unreasonable to hope that
>  >
>  >  A[0]
>  >
>  >  and
>  >
>  >  A[[1,2,3]]
>  >  or
>  >  A[[True,False,True]]
>  >
>  >  should return objects of the same rank.
>
> Why is it unreasonable to hope that
>
>  x[0,:]
>
>  and
>
>  x[0, [1,2,3]]
>
>  or
>
>  x[0, [True,False,True]]
>
>  where x is a matrix, continue to return matrices?

Well, I will point out that your example is somewhat different from
mine; nobody is arguing that your three examples should return objects
of different ranks. There is some disagreement over whether

x[0,:]

should be a rank-1 or a rank-2 object, but x[0,[1,2,3]] and x[0,
[True, False, True]] should all have the same rank as x[0,:]. Nobody's
questioning that.

What I was pointing out is that x[0,0] should not have the same rank
as x[0,:] or x[0,[0]]. In this case it's obvious; x[0,0] should be a
scalar. But the same logic applies to
x[0,:]
versus
x[[0,1],:]
or even
x[[0],:].

For arrays, if x has rank 2, x[0,:] has rank 1. If L is a list of
indices, x[L,:] always has rank 2, no matter what is actually in L
(even if it's one element). It is perfectly reasonable that they
should yield objects of different ranks.
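
In other words:

import numpy as np

x = np.arange(12).reshape(3, 4)
assert x[0, :].shape == (4,)       # scalar index: the axis disappears
assert x[[0], :].shape == (1, 4)   # one-element list: the axis stays
assert x[[0, 1], :].shape == (2, 4)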

With matrices, we have the surprising result that x[0,:] is still a
two-dimensional object; to get at its third element I have to specify
x[0,:][0,2]. It seems to me that this is a peculiar hack designed to
ensure that the result still has "*" defined in a matrix way.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] dot/tensordot limitations

2008-04-29 Thread Anne Archibald
Timothy Hochberg has proposed a generalization of the matrix mechanism
to support manipulating arrays of linear algebra objects. For example,
one might have an array of matrices one wants to apply to an array of
vectors, to yield an array of vectors:

In [88]: A = np.repeat(np.eye(3)[np.newaxis,...],2,axis=0)

In [89]: A
Out[89]:
array([[[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]],

       [[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]]])

In [90]: V = np.array([[1,0,0],[0,1,0]])

Currently, it is very clumsy to handle this kind of situation even
with arrays, keeping track of dimensions by hand. For example if one
wants to multiply A by V "elementwise", one cannot simply use dot:

In [92]: np.dot(A,V.T)
Out[92]:
array([[[ 1.,  0.],
        [ 0.,  1.],
        [ 0.,  0.]],

       [[ 1.,  0.],
        [ 0.,  1.],
        [ 0.,  0.]]])


tensordot() suffers from the same problem. It arises because when you
combine two multiindexed objects there are a number of ways you can do
it:

A: Treat all indices as different and form all pairwise products
(outer product):
In [93]: np.multiply.outer(A,V).shape
Out[93]: (2, 3, 3, 2, 3)

B: Contract the outer product on a pair of indices:
In [98]: np.tensordot(A,V,axes=(-1,-1)).shape
Out[98]: (2, 3, 2)

C: Take the diagonal along a pair of indices:
In [111]: np.diagonal(np.multiply.outer(A,V),axis1=0,axis2=3).shape
Out[111]: (3, 3, 3, 2)

What we want in this case is a combination of B and C:
In [106]: np.diagonal(np.tensordot(A,V,axes=(-1,-1)),axis1=0,axis2=2).T.shape
Out[106]: (2, 3)

 but it cannot be done without constructing a larger array and pulling
out its diagonal.

If this seems like an exotic thing to want to do, suppose instead we
have two arrays of vectors, and we want to evaluate the array of dot
products. None of np.dot, np.tensordot, or np.inner produces the
result you want. You have to multiply elementwise and sum. (This also
precludes automatic use of BLAS, if indeed tensordot uses BLAS.)
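
That is, for two arrays of vectors the array of dot products has to be
spelled out by hand:

import numpy as np

U = np.random.randn(10, 3)    # two arrays of ten 3-vectors
V = np.random.randn(10, 3)
d = (U * V).sum(axis=-1)      # d[i] == np.dot(U[i], V[i])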

Does it make sense to implement a generalized tensordot that can do
this, or is it the sort of thing one should code by hand every time?

Is there any way we can make it easier to do simple common operations
like take the elementwise matrix-vector product of A and V?


The more general issue, of making linear algebra natural by keeping
track of which indices are for elementwise operations, and which
should be used for dots (and how), is a whole other kettle of fish. I
think for that someone should think hard about writing a full-featured
multilinear algebra package (which might as well keep track of
coordinate transformations while it's at it).


Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] accuracy issues with numpy arrays?

2008-04-29 Thread Anne Archibald
On 30/04/2008, eli bressert <[EMAIL PROTECTED]> wrote:
>  I'm writing a quick script to import a fits (astronomy) image that has
>  very low values for each pixel. Mostly on the order of 10^-9. I have
>  written a python script that attempts to take low values and put them
>  in integer format. I basically do this by taking the mean of the 1000
>  lowest pixel values, excluding zeros, and dividing the rest of the
>  image by that mean. Unfortunately, when I try this in practice, *all*
>  of the values in the image are being treated as zeros. But, if I use a
>  scipy.ndimage function, I get proper values. For example, I take the
>  pixel that I know has the highest value and do

I think the bug is something else:

>  import pyfits as p
>  import scipy as s
>  import scipy.ndimage as nd
>  import numpy as n
>
>  def flux2int(name):
> d  = p.getdata(name)
> x,y = n.shape(d)
> l = x*y
> arr1= n.array(d.reshape(x*y,1))
> temp = n.unique(arr1[0]) # This is where the bug starts. All values
>  are treated as zeros. Hence only one value remains, zero.

Actually, since arr1 has shape (x*y, 1), arr1[0] has shape (1,), and
so it has only one entry:

In [82]: A = np.eye(3)

In [83]: A.reshape(9,1)
Out[83]:
array([[ 1.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 1.]])

In [84]: A.reshape(9,1)[0]
Out[84]: array([ 1.])

The python debugger is a good way to check this sort of thing out; if
you're using ipython, typing %pdb will start the debugger when an
exception is raised, at which point you can poke around in all your
local variables and evaluate expressions.

> arr1= arr1.sort()
> arr1= n.array(arr1)
> arr1= n.array(arr1[s.where(arr1 >= temp)])
>
> val = n.mean(arr1[0:1000])
>
> d   = d*(1.0/val)
> d   = d.round()
> p.writeto(name[0,]+'fixed.fits',d,h)

Good luck,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-29 Thread Anne Archibald
On 29/04/2008, Keith Goodman <[EMAIL PROTECTED]> wrote:
> On Tue, Apr 29, 2008 at 11:21 AM, Alan G Isaac <[EMAIL PROTECTED]> wrote:
>  > On Tue, 29 Apr 2008, Keith Goodman apparently wrote:
>  >  > I often use x[i,:] and x[:,i] where x is a matrix and i is
>  >  > a scalar. I hope this continues to return a matrix.
>  >
>  >  1. Could you give an example of the circumstances of this
>  >  use?
>
>
> In my use i is most commonly an array (i = M.where(y.A)[0] where y is
>  a nx1 matrix), sometimes a list, and in ipython when debugging or
>  first writing the code, a scalar. It would seem odd to me if x[i,:]
>  returned different types of objects based on the type of i:
>
>  array index
>  idx = M.where(y.A)[0] where y is a nx1 matrix
>  x[idx,:] --> matrix
>
>  list index
>  idx = [0]
>  x[idx,:] --> matrix?
>
>  scalar index
>  idx = 0
>  x[idx,:] --> not matrix

It is actually pretty unreasonable to hope that

A[0]

and

A[[1,2,3]]
or
A[[True,False,True]]

should return objects of the same rank.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-29 Thread Anne Archibald
On 29/04/2008, Gael Varoquaux <[EMAIL PROTECTED]> wrote:
> On Tue, Apr 29, 2008 at 11:03:58PM +0200, Anne Archibald wrote:
>  > I am puzzled by this. What is the rationale for x[i,:] not being a 1-d
>  > object?
>
> It breaks A*B[i, :] where A and B are matrices.

Really? How?

In [26]: A = np.matrix([[1,0],[0,1],[1,1]])

In [28]: A*np.ones(2)
Out[28]: matrix([[ 1.,  1.,  2.]])

In [29]: np.ones(3)*A
Out[29]: matrix([[ 2.,  2.]])


I guess you don't get dot products with *:

In [30]: np.ones(3).T*np.ones(3)
Out[30]: array([ 1.,  1.,  1.])

Since the whole point of matrices is to avoid having to type "dot", I
guess this should be convincing.


Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-29 Thread Anne Archibald
On 29/04/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:

>  I'm quite persuaded now that a[i] should return a 1-d object for
>  arrays.In addition to the places Chuck identified,  there are at
>  least 2 other places where special code was written to work-around the
>  expectation that item selection returns an object with reduced
>  dimensionality (a special-cased .tolist for matrices and a special-cased
>  getitem in the fancy indexing code).
>
>  As the number of special-case work-arounds grows the more I'm convinced
>  the conceptualization is wrong.   So, I now believe we should change the
>  a[i] for matrices to return a 1-d array.
>
>  The only down-side I see is that a[i] != a[i,:] for matrices.
>  However,  matrix(a[i]) == a[i,:], and so I'm not sure there is really a
>  problem, there.  I also don't think that whatever problem may actually
>  be buried in the fact that type(a[i]) != type(a[i,:]) is worse than the
>  problem that several pieces of NumPy code actually expect hierarchical
>  container behavior of multi-dimensional sequences.

I am puzzled by this. What is the rationale for x[i,:] not being a 1-d
object? Does this not require many special-case bits of code as well?
What about a[i,...]? That is what I would use to make a hierarchical
bit of code, and I would be startled to find myself in an infinite
loop waiting for the dimension to become one.

>  I don't think making the small change to have a[i] return 1-d arrays
>  precludes us from that 1-d array being a bit more formalized in 1.2 as a
>  RowVector should we choose that direction.   It does, however, fix
>  several real bugs right now.

+1

What's more, I think we need to have a matrix-cleanliness test suite
that verifies that all basic numpy tools behave identically on
matrices and preserve matrixiness. But that's a big job, and for 1.2.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Movement of ma breaks matplotlib

2008-04-25 Thread Anne Archibald
Hi,

In the upcoming release of numpy, numpy.core.ma ceases to exist. One
must use numpy.ma (for the new interface) or numpy.oldnumeric.ma (for
the old interface). This has the unfortunate effect of breaking
matplotlib (which does "from numpy.core.ma import *") - I cannot even
"import pylab" with a stock matplotlib (0.90.1) and an SVN numpy. Is
this an intended effect?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-25 Thread Anne Archibald
On 25/04/2008, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
>
> On Fri, Apr 25, 2008 at 12:02 PM, Alan G Isaac <[EMAIL PROTECTED]> wrote:
> > I think we have discovered that there is a basic conflict
> > between two behaviors:
> >
> >x[0] == x[0,:]
> >vs.
> >
> >x[0][0] == x[0,0]
> >
> > To my recollection, everyone has agreed that the second
> > behavior is desirable as a *basic expectation* about the
> > behavior of 2d objects.  This implies that *eventually* we
> > will have ``x[0][0] == x[0,0]``.  But then we MUST
> > eventually have x[0] != x[0,:].
> >
> Choices:
>
> 1) x[0] equiv x[0:1,:] -- current behavior
> 2) x[0] equiv x[0,:] -- array behavior, gives x[0][0] == x[0,0]
>
> These are incompatible. I think 1) is more convenient in the matrix domain
> and should be kept, while 2) should be given up. What is the reason for
> wanting 2)? Maybe we could overload the () operator so that x(0) gives 2)?
> Or vice versa.

We are currently operating in a fog of abstraction, trying to decide
which operations are more "natural" or more in correspondence with our
imagined idea of a user. We're not getting anywhere. Let's start
writing up (perhaps on the matrix indexing wiki page) plausible use
cases for scalar indexing, and how they would look under each
proposal.

For example:

* Iterating over the rows of a matrix:

# array-return proposal: row is a 1-d array
for row in M:
    if np.all(row==0):
        raise ValueError

# exception-raising proposal
for i in xrange(M.shape[0]):
    if np.all(M[i,:]==0):
        raise ValueError

Other applications?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Generating Bell Curves (was: Using normal() )

2008-04-25 Thread Anne Archibald
On 24/04/2008, Rich Shepard <[EMAIL PROTECTED]> wrote:
>  Thanks to several of you I produced test code using the normal density
>  function, and it does not do what we need. Neither does the Gaussian
>  function using fwhm that I've tried. The latter comes closer, but the ends
>  do not reach y=0 when the inflection point is y=0.5.
>
>  So, let me ask the collective expertise here how to generate the curves
>  that we need.
>
>  We need to generate bell-shaped curves given a midpoint, width (where y=0)
>  and inflection point (by default, y=0.5) where y is [0.0, 1.0], and x is
>  usually [0, 100], but can vary. Using the NumPy arange() function to produce
>  the x values (e.g, arange(0, 100, 0.1)), I need a function that will produce
>  the associated y values for a bell-shaped curve. These curves represent the
>  membership functions for fuzzy term sets, and generally adjacent curves
>  overlap where y=0.5. It would be a bonus to be able to adjust the skew and
>  kurtosis of the curves, but the controlling data would be the
>  center/midpoint and width, with defaults for inflection point, and other
>  parameters.
>
>  I've been searching for quite some time without finding a solution that
>  works as we need it to work.

First I should say, please don't call these "bell curves"! It is
confusing people, since that usually means specifically a Gaussian. In
fact it seems that you want something more usually called a "sigmoid",
or just a curve with a particular shape. I would look at
http://en.wikipedia.org/wiki/Sigmoid_function
In particular, they point out that the integral of any smooth,
positive, "bump-shaped" function will be a sigmoid. So dreaming up an
appropriate bump-shaped function is one way to go.

Alternatively, you can look at polynomial fitting - if you want, say,
a function that is 1 with derivative zero at 0, 0.5 with derivative -1
at the inflection point x, and 0 with derivative 0 at 1, you can
construct a unique degree-5 polynomial that does exactly that; there's
a new tool, KroghInterpolator, in scipy svn that can do that for you.
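
If I remember the interface right, repeated abscissae are how you hand
KroghInterpolator the derivative data, so the curve above would look
something like this (a sketch, not tested against the svn version):

import numpy as np
from scipy.interpolate import KroghInterpolator

# value and first derivative at each of x = 0, 0.5, 1:
# f(0)=1, f'(0)=0;  f(0.5)=0.5, f'(0.5)=-1;  f(1)=0, f'(1)=0
xi = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]   # repeated points carry derivatives
yi = [1.0, 0.0, 0.5, -1.0, 0.0, 0.0]
f = KroghInterpolator(xi, yi)         # the unique degree-5 polynomial

y = f(np.linspace(0.0, 1.0, 101))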

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy release

2008-04-25 Thread Anne Archibald
On 25/04/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote:
> 2008/4/25 Alan G Isaac <[EMAIL PROTECTED]>:
>
> >  I must have misunderstood:
>  >  I thought the agreement was to
>  >  provisionally return a 1d array for x[0],
>  >  while we hashed through the other proposals.
>
> The agreement was:
>
>  a) That x[0][0] should be equal to x[0,0] and
>  b) That x[0,:] should be equal to x[0] (as for ndarrays)
>
>  This breaks as little functionality as possible, at the cost of one
>  (instead of two) API changes.

Hold on. There has definitely been some confusion here. This is not
what I thought I was suggesting, or what Alan thought he was
suggesting. I do not think special-casing matrices for which one
dimension happens to be one is a good idea at all, even temporarily.
This is the kind of thing that drives users crazy.

My suggested stopgap fix was to make x[0] return a 1D *array*; I feel
that this will result in less special-casing. In fact I wasn't aware
that anyone had proposed the fix you implemented. Can we change the
stopgap solution?

>  We should now discuss the proposals on the table, choose a good one,
>  and implement all the API changes necessary for 1.2 or 2.  It's a pity
>  we have to change the API again, but the current situation is not
>  tenable.

Yes, well, it really looks unlikely we will be able to agree on what
the correct solution is before 1.1, so I would like to have something
non-broken for that release.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] "aligned" matrix / ctypes

2008-04-24 Thread Anne Archibald
On 24/04/2008, Zachary Pincus <[EMAIL PROTECTED]> wrote:

>  In the mean time, is there any way to (from within python) construct
>  an array from another array by specifying the dtype and strides? That
>  is, some other way to do the view/slice business directly?

There's the ndarray constructor, which lets you do some highly dodgy
things like create arrays whose rows overlap in memory. I would think
it would be possible to do this with that. You might also want to
think about allowing arrays to have their rows aligned (for example, a
100 by 100 array of float32s which wants 64-byte alignment for every
row).

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using normal()

2008-04-24 Thread Anne Archibald
On 24/04/2008, Keith Goodman <[EMAIL PROTECTED]> wrote:
> On Thu, Apr 24, 2008 at 10:01 AM, Rich Shepard <[EMAIL PROTECTED]> wrote:
>  >In the application's function I now have:
>  >
>  >from numpy import *
>  >
>  >x = nx.arange(0, 100, 0.1)
>  >y = nx.normal(center,width)  # passed to the function when called
>  >
>  >  and I then pass x,y to PyX for plotting. But python responds with:
>  >
>  >y = nx.normal(center,width)
>  >  AttributeError: 'module' object has no attribute 'normal'
>
> It's random.normal(loc, scale). But that will give you random x values
>  drawn from the normal distribution, not y values at your x. So you'll
>  have to code your own normal, which will probably look something like
>
>  norm = 1 / (scale * sqrt(2 * pi))
>  y = norm * exp(-power((x - loc), 2) / (2 * scale**2))

I really have to recommend you instead install scipy, which contains
this as well as many other probability distributions:

D = scipy.stats.norm(0, 1)

Then D.pdf(x) gives you the probability density function at x,
D.cdf(x) gives you the probability that the value is less
than x, D.ppf(y) gives you the x value for which the probability is y
that the value will be less than x, and so on. This is also available
for more probability distributions than I had ever heard of.
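
For your case that would be something like:

import numpy as np
from scipy import stats

x = np.arange(0, 100, 0.1)
D = stats.norm(50.0, 10.0)   # loc (center) and scale (width)
y = D.pdf(x)                 # density at each x, ready for plotting
# D.cdf(50.0) == 0.5, and D.ppf(0.5) == 50.0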

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] "aligned" matrix / ctypes

2008-04-23 Thread Anne Archibald
On 23/04/2008, Zachary Pincus <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  Thanks a ton for the advice, Robert! Taking an array slice (instead of
>  trying to set up the strides, etc. myself) is a slick way of getting
>  this result indeed.

It's worth mentioning that there was some discussion of adding an
allocator for aligned arrays. It got sidetracked into a discussion of
SSE support for numpy, but there are all sorts of reasons to want
aligned arrays in numpy, as this post demonstrates. Seeing as it's
just a few lines worth of pure python, is anyone interested in writing
an aligned_empty() for numpy?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy release

2008-04-23 Thread Anne Archibald
On 23/04/2008, Alan Isaac <[EMAIL PROTECTED]> wrote:
> On Wed, 23 Apr 2008, Sebastian Haase wrote:
>  > What used to be referred to a the 1.1 version, that can
>  > break more stuff, to allow for a cleaner design, will now
>  > be 1.2
>
> So ... fixing x[0][0] for matrices should wait
>  until 1.2.  Is that correct?

It seems to me that everyone agrees that the current situation is
broken, but there is considerable disagreement on what the correct fix
is. My inclination is to apply the minimal fix you suggest, with tests
and documentation, as a stopgap, and let the different factions sort
it out by discussion on the way to 1.2.

Objections?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Truth value of an array

2008-04-18 Thread Anne Archibald
On 18/04/2008, Robert Kern <[EMAIL PROTECTED]> wrote:
> On Fri, Apr 18, 2008 at 3:33 PM, Joe Harrington <[EMAIL PROTECTED]> wrote:
>  > For that matter, is there a reason logical operations don't work on
>  >  arrays other than booleans?  What about:
>
> The keywords "and", "or", and "not" only work on bool objects; Python
>  tries to convert the operands using bool(). bool(some_array) raises
>  the exception just as we've been discussing.

This is a bit misleading, as the actual value is returned:

In [167]: "" or "A"
Out[167]: 'A'

In [168]: "A" and "B"
Out[168]: 'B'

In fact the problem is that "and" and "or" are short-circuiting, which
means that "return X or Y" is equivalent to something like:
if X:
return X
elif Y:
return Y
else:
return False

These are handled specially by syntax so that, for example, you can do
"False and 1/0" and not raise a ZeroDivisionError. So "and" and "or"
aren't even really operators, and it's not just impossible to change
them but unlikely to ever become possible. Just cast your arrays to
booleans if you want to do boolean operations on them.
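
Concretely, the elementwise alternatives look like:

import numpy as np

A = np.array([True, True, False])
B = np.array([True, False, False])

both = A & B                 # elementwise "and": & and | are overridable
either = A | B               # elementwise "or"
same = bool((A == B).all())  # collapse to a single Python bool explicitly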

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Truth value of an array

2008-04-18 Thread Anne Archibald
On 18/04/2008, Olivier <[EMAIL PROTECTED]> wrote:
> Let's restrict the discussion to the case of boolean arrays (dtype bool),
>  since all the comparisons (A==B, A!=B, A<B, A>B, ...) return boolean
>  arrays.
>
>  So I have an array filled with booleans. Is there a reason not to map
>  "bool(A)" to "A.all()" but instead raise an exception?
>
>  As far as I can see, "if A==B" is clear enough; one always means
>  "(A==B).all()". Isn't that so?
>
>  I would like to see examples where there is clearly an ambiguity so
>  that I understand the benefit of having an exception raised for
>  boolean matrices instead of simply using all().
>
>  If there is no such clear example, then why not map "bool(A)" to
>  "A.all()" in numpy?

In [1]: import numpy as np

In [2]: A = np.array([0,1])

In [3]: B = np.array([0,0])

In [4]: (A==B).all()
Out[4]: False

In [5]: (A!=B).all()
Out[5]: False

One would expect A!=B to be the logical opposite of A==B, but with
your proposed suggestion it is not.

In math, when comparing functions, one can compare the functions as a
whole, or one can compare them pointwise. numpy's implementation does
pointwise comparison, for a variety of good reasons. As for why
converting an array to a boolean doesn't automatically do all(): if
you don't realize you are dealing with an array and rely on an
implicit all(), you will surely shoot yourself in the foot sooner
rather than later (as the above example shows). If you do know, simply
wrapping the comparison in all() is easy and clear.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] broken numpy.core.tests.test_multiarray.TestScalarIndexing.SetUp

2008-04-17 Thread Anne Archibald
On 17/04/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote:
> On 17/04/2008, Robert Kern <[EMAIL PROTECTED]> wrote:
>
> > On Thu, Apr 17, 2008 at 1:21 PM, Eric Firing <[EMAIL PROTECTED]> wrote:
>  >  > Arg!  Cancel that!  I didn't look carefully enough.  How embarrassing!
>  >  >  Sorry for the noise.
>  >
>  >
>  > Don't apologize. That is very odd code. Stefan, is there a reason to
>  >  form a 1-item tuple then do 1-item tuple unpacking everywhere?
>
>
> Did I write that code?  I agree that formatting is odd.

Oh! Actually I think that one's my fault. Comes of being too minimal
in modifying one of your test cases.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] broken numpy.core.tests.test_multiarray.TestScalarIndexing.SetUp

2008-04-17 Thread Anne Archibald
On 17/04/2008, Robert Kern <[EMAIL PROTECTED]> wrote:
> On Thu, Apr 17, 2008 at 1:21 PM, Eric Firing <[EMAIL PROTECTED]> wrote:
>  > Arg!  Cancel that!  I didn't look carefully enough.  How embarrassing!
>  >  Sorry for the noise.
>
>
> Don't apologize. That is very odd code. Stefan, is there a reason to
>  form a 1-item tuple then do 1-item tuple unpacking everywhere? The
>  test works the same after removing the extraneous commas. Anyways,
>  I've checked that in.

This is not necessarily a justification, but many tests construct
tuples of test objects which are then unpacked at the beginning of
every function. This is not unreasonable when multiple objects are
present:

class TestSomething(NumpyTestCase):
    def setUp(self):
        A = array([1,2,3])
        B = array([4,5,6])
        self.d = A, B

    def test_something(self):
        A, B = self.d
        assert_not_equal(A,B)

It's a little less cumbersome than using self.A and self.B inside each
test case.

Does it make sense to use a length-1 tuple when there's only one piece
of test data, just for consistency?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Matrix vs ndarray

2008-04-17 Thread Anne Archibald
On 17/04/2008, Santanu Chatterjee <[EMAIL PROTECTED]> wrote:
> Hi Numpy users,
> I used MATLAB to do numerical calculations for a long time. Recently I
> am digging into python and numpy. I am wondering about the following
> question :
>
> 1) What is the difference between ndarray and matrix in numpy? My idea is
> that having N-dimensional array is sufficient (of course a MATLAB users
> point of view). If anyone can provide some idea, I will appreciate it.

The difference comes about because python has a paucity of operators.
In particular, there is only one multiplication operator. Thus for
ndarrays "*" does elementwise multiplication, and to get matrix
multiplication you must use the function "dot". Similarly "**" does
elementwise exponentiation, and to get matrix exponentiation you must
use the function "matrix_power".

Since some people find this inconvenient, the "matrix" class exists.
It represents arrays that are always two-dimensional, and for them "*"
does matrix multiplication. The only difference is how the operators
(and certain indexing operations) are interpreted. If you want to
interpret a matrix M as an array, just use "M.A"; it you want to
interpret an array X as a matrix, do "matrix(X)". In neither case is
data copied.
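
A quick illustration:

import numpy as np

X = np.array([[1, 2], [3, 4]])
Y = np.array([[0, 1], [1, 0]])

elementwise = X * Y           # ndarray "*" multiplies elementwise
matprod = np.dot(X, Y)        # matrix product for ndarrays

M = np.matrix(X)
matprod2 = M * np.matrix(Y)   # matrix "*" is the matrix product
back = M.A                    # .A reinterprets M as an ndarray, no copy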

Personally, I do not ever use matrices; I find "dot" is quite
convenient enough for me. Moreover the numpy standard library has an
undetermined number of bugs where standard functions acting on
matrices return the correct values but in the form of arrays instead;
thus matrix users occasionally find that they have inadvertently (and
silently) transformed their matrices back into arrays.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Release of NumPy

2008-04-16 Thread Anne Archibald
On 16/04/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote:
> On 16/04/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote:
>  >  > The whole issue occurs because a Matrix is not a proper
>  >  > container.
>  >
>  >
>  > Right.  And *that* is the case because of the attempt to
>  >  treat matrices as containers of matrices instead of as
>  >  containers of 1d arrays.
>  >
>  >  I can see no real advantage to having matrices be containers
>  >  of row vectors instead.  A row vector would just be a matrix
>  >  (i.e., essentially 2d) that allowed 1d style indexing.
>  >  Again I ask, have you ever needed such a thing?
>
>
> In the Matrix world, there is no such thing as an (N,) array -- which
>  is why the current Matrix object behaves so strangely, and always
>  returns a result with ndim > 1.
>
>  Your proposal suggests that a Matrix be a container of arrays, but it
>  does not address the slicing of column vectors, i.e.
>
>  x[0]
>  x[0,:]
>  x[:,0]
>
>  would then behave in completely different ways, whereas with ndarrays
>  they don't.
>
>  If you modified the proposal to say "any slicing operation that
>  returns a (1,N) or (N,1) matrix should now return an (N,) ndarray", it
>  would be more consistent, but then, wouldn't that break some other
>  expected behaviour?  I thought one of the powerful features of a
>  matrix is that it preserves the orientation of a vector slice?
>
>  The only way I've seen so far to have consistent slicing, is to have
>  an N-dim Matrix object be a container of N-1 dim Matrices, up to the
>  2-D case, in which it becomes a container of Vectors which, in turn,
>  contain scalars.

I don't think of arrays as containers of anything but scalars, so I
find this whole argument from intuition extremely strange.

My (draconian) suggestion would be to simply raise an exception when a
matrix is indexed with a scalar. They're inherently two-dimensional;
if you want a submatrix you should provide both indices (possibly
including a ":"). If you actually want a subarray, as with an array,
use ".A".

That said, I don't actually use matrices, so I don't get a vote.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Release of NumPy

2008-04-15 Thread Anne Archibald
On 15/04/2008, Jon Wright <[EMAIL PROTECTED]> wrote:
> > On 15/04/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote:
>
> ...
>
> >  The proposal on the table is to remove an unneeded (and
>  >  unwanted) deviation of the matrix API from the ndarray API.
>
> ...
>
>  How about writing up the changes needed PEP style on the wiki?

+1

This discussion risks going around in circles. Write up your proposed
solutions, with example code, in PEP style, here:
http://www.scipy.org/ProposedEnhancements

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] float32 is not a float ?

2008-04-11 Thread Anne Archibald
On 10/04/2008, Charles R Harris <[EMAIL PROTECTED]> wrote:

> I think you want the isreal function, but it will also return true for
> complex with 0 imaginary part. Hmm... the various iswhatever functions seem
> to be lacking in coverage. Maybe we should fix that.

iscomplexobj is designed to solve that problem.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Infinity definitions

2008-04-10 Thread Anne Archibald
On 10/04/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:
> Bruce Southey wrote:
>  > Hi,
>  > Since we are discussing namespace and standardization, I am curious in
>  > why there are multiple definitions for defining infinity in numpy when
>  > perhaps there should be two (one for positive infinity and one for
>  > negative infinity). I really do understand that other people have use of
>  > these definitions and that it is easier to leave them in than take them
>  > out. Also, it is minor reduction in namespace because I do know that
>  > much of the namespace is either defining variables (like different
>  > floats and complex numbers) or mathematical functions (like logs and
>  > trig functions).
>  >
>  > Currently we have:
>  > numpy.Inf
>  > numpy.Infinity
>  > numpy.inf
>  > numpy.infty
>  > numpy.NINF
>  > numpy.PINF
>  >
>  > Most of these are defined in numeric.py: 'Inf = inf = infty = Infinity =
>  > PINF'
>  > In the f2py/tests subdirectories, the files return_real.py and
>  > return_complex.py uses both 'inf','Infinity'.
>  > The only occurrence of NINF and PINF are in core/src/umathmodule.c but I
>  > don't see any other usage.
>  > There does not seem to be any use of 'infty'.
>  >
>
> I think this is a product of bringing together a few definitions into
>  one and not forcing a standard.
>
>  numpy.inf
>  numpy.nan
>
>  should be used except for backward compatibility.

The others have some use if you want to be able to use the results of
repr() as literals - as I understand it the output representation of a
NaN depends on the C library, and users seeing, say, "NaN" might well
expect to be able to type NaN (after from numpy import NaN) and get a
NaN.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] boolean indexing of rank-0 arrays and scalars - ticket #721

2008-04-10 Thread Anne Archibald
Hi,

Ticket #721 is titled "0-dimensional boolean arrays should work as
masks for array scalars". It is not quite clear to me what the right
behaviour is.

We are generally trying to make zero-dimensional arrays behave the
same as array scalars, but I'm not sure how that should work in this
case:

In [2]: scalar = np.array([1,2])[0]

In [3]: rank0 = np.array(1)

In [4]: scalar[np.array(False)]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)

/home/peridot/software/numpy/test/lib/python2.5/site-packages/<ipython console> in <module>()

IndexError: invalid index to scalar variable.

In [5]: rank0[np.array(False)]
Out[5]: array([], dtype=int32)

In [6]: rank0[np.array(False)].shape
Out[6]: (0,)

In [7]: rank0[np.array(True)].shape
Out[7]: ()

Here indexing a zero-dimensional array produces either a
zero-dimensional array if the argument is True or a one-dimensional
array of length zero if the argument is False. This is necessary
because there's not really a way to represent "no value" with a
zero-dimensional array - they always have exactly one value.

Should a boolean index into a scalar always produce an array as a
result? Only if that boolean is False?

I should also point out that we have another difference between array
scalars and zero-dimensional arrays:

In [7]: rank0[np.array(True)].shape
Out[7]: ()

In [8]: rank0[np.array([True,False])[0]].shape
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)

/home/peridot/software/numpy/test/lib/python2.5/site-packages/<ipython console> in <module>()

IndexError: 0-d arrays can't be indexed

Zero-dimensional arrays can be used as indices, but
apparently-equivalent scalars cannot.

I am totally unclear on what the distinction between rank-zero arrays
and scalars should be and what indexing into them should look like. Is
there a document describing this (or are we just throwing an
undocumented mass of confusing corner cases at users)?

Thanks,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ma's stdu and varu

2008-04-06 Thread Anne Archibald
Hi,

I was just going through tidying up the documentation for all the many
functions in numpy that compute standard deviations or variances (the
functions, the methods, the methods on matrices, the methods on
maskedarrays, all needed their docstrings updated in approximately the
same way). I noticed that maskedarrays seem to have the functions
"stdu" and "varu", for computing the unbiased standard deviation and
variance (where one divides by N-1 rather than N). These are now
available, I think, via std and var with ddof=1. More seriously, they
still provide the peculiar older definition of var, where
varu([1,1.j,-1,-1.j])==0. I don't use maskedarrays, and in fact I
haven't been keeping track of what's going on with the two different
implementations. So: what should be done about stdu and varu? Do the
two implementations both need their definitions of std and var
examined?

Thanks,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matrix multiply

2008-04-06 Thread Anne Archibald
>  > and for negative powers some sort of floating-point
>  > inverse.
>
> That deserves discussion.
>  Not all "invertible" boolean matrices have an inverse in the algebra.
>  Just the orthogonal ones do.
>
>  I guess I would special case inverses for Boolean matrices.
>  Just test if the matrix B is orthogonal (under Boolean
>  multiplication) and if so return B's transpose.

This is actually a limitation of the whole linear algebra subsystem:
we only support floating-point linear algebra (apart from dot()).
There are algorithms for doing integer linear algebra, which might be
interesting to implement. But that's a big job. Boolean matrices as
you use them are another step again: because they are not a group
under "+", almost all of linear algebra has to be reinterpreted. For
example it's not obvious that matrix multiplication is associative; it
turns out to be, because you can replace the matrices by integer
matrices containing ones for True and zeros for False, then do the
math, then interpret any nonzero integer as True, and zero as False.

As an aside, if you use "+" to mean xor (which I am not suggesting!)
all of linear algebra continues more or less unchanged; you're just
working over the field of integers modulo 2. If you want eigenvalues
you can pass to an algebraic closure.
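
A minimal sketch of that mod-2 arithmetic, using ordinary integer
arrays reduced mod 2 (illustrative only, not an existing numpy API):

import numpy as np

# Over GF(2), "+" is xor and "*" is and; represent True/False as 1/0
# and reduce mod 2 after each product.
A = np.array([[1, 0], [1, 1]])
B = np.array([[1, 1], [0, 1]])
C = np.dot(A, B) % 2   # matrix product over the integers mod 2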

>  > The inverse actually poses a problem: should it return
>  > (A**(-1))**n or (A**n)**(-1)? (No, they're not the same
>  > for boolean matrices.)
>
> I think it must be the latter ... ?
>
>  By associativity, if B has an inverse A,
>  then B**n must have inverse A**n.
>  So you are observing that with boolean matrices
>  we might find B**n is invertible even though B is not.
>  Right?  So the latter will work in more cases.
>  So again: form B**n, test for orthogonality,
>  and return the transpose if the test passes.

Well, as it stands, since we have no integer linear algebra, inverses
are always floating-point. When you do that, you find that, for
example,
([[True,False],[True,True]]**(-1))**2 = [[1.,0.],[-2.,1.]]
but
([[True,False],[True,True]]**2)**(-1) = [[1.,0.],[-1.,1.]]

I am not aware of any algorithm for finding inverses, or even
determining which matrices are invertible, in the peculiar Boolean
arithmetic we use.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matrix multiply

2008-04-06 Thread Anne Archibald
On 06/04/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote:

> Just checking:
>  it's important to me that this won't change
>  the behavior of boolean matrices, but I don't
>  see a test for this.  E.g., ::
>
> >>> import numpy as N
> >>> A = N.mat('1 0;1 1',dtype='bool')
> >>> A**2
> matrix([[ True, False],
> [ True,  True]], dtype=bool)

I have no desire to change the behaviour of boolean matrices, and I'll
write a test, but what is it supposed to do with them? Just produce
reduce(dot,[A]*n)? For zero it will give the identity, and for
negative powers some sort of floating-point inverse. Currently for
positive powers it should produce the right answer provided
multiplication is associative (which I think it is).
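
For concreteness, a sketch of what I mean (relying on numpy's boolean
arithmetic, where * is "and" and + is "or"):

import numpy as N
from functools import reduce  # a builtin on Python 2

A = N.mat('1 0; 1 1', dtype=bool)
B = reduce(N.dot, [A]*3)   # repeated boolean matrix product
assert (B == A**3).all()   # agrees with the matrix power for positive n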

The inverse actually poses a problem: should it return (A**(-1))**n or
(A**n)**(-1)? (No, they're not the same for boolean matrices.)


Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matrix power (was: matrix multiply)

2008-04-05 Thread Anne Archibald
On 05/04/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote:

>  Some discussion recently took place around raising a square matrices
>  to integer powers.  See ticket #601:
>
>  http://scipy.org/scipy/numpy/ticket/601
>
>  Anne Archibald wrote a patch which factored 'matrix_multiply' out of
>  defmatrix (the matrix power implemented for the Matrix class).  After
>  some further discussion on irc, and some careful footwork so that
>  everything imports correctly, I checked in
>
>  http://projects.scipy.org/scipy/numpy/changeset/4968
>
>  The matrix_multiply method is exposed as numpy.linalg.matrix_multiply,
>  and is not in the top-level numpy namespace (it's a bit crowded there
>  already).  I'd be glad if you would review the changeset and comment.

Just to be clear, the function is called matrix_power, not matrix_multiply.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Final push for NumPy 1.0.5 (I need your help!)

2008-04-05 Thread Anne Archibald
On 05/04/2008, Charles R Harris <[EMAIL PROTECTED]> wrote:

> On Sat, Apr 5, 2008 at 4:10 PM, Anne Archibald <[EMAIL PROTECTED]>
> wrote:
> > More generally, my local working copy is now rather divergent from the
> > upstream. What's the recommended way to deal with this? Make sure I
> > have all the patches submitted to trac, then just revert to raw SVN?
> > So far I've just been editing the output of svn diff down to only the
> > bit that's relevant to each patch, but it's getting pretty long.
>
> Do you have commit access now? You could pull a fresh copy of svn, apply
> your patches one by one, and commit each time. That way you can avoid
> conflicts, which can be a pain when your local copy has diverged a lot from
> upstream. Lots of small commented changes are also preferable to one big
> patch.

No, or at least, I don't know a username/password to use. I also don't
seem to be able to change the status of bugs on the trac.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Final push for NumPy 1.0.5 (I need your help!)

2008-04-05 Thread Anne Archibald
On 05/04/2008, James Philbin <[EMAIL PROTECTED]> wrote:
> I've posted patches for:
>
>  #630: If float('123.45') works, so should numpy.float32('123.45')
>  #581: random.set_state does not reset state of random.standard_normal

Patches for #601, #622, #692, #696, #717 now in trac; I'd like to do
something about #720 (variance of complex arrays needs docs and
tests), but any patch for it is necessarily going to tread on the toes
of my patch for #696 (ddof parameter to var needs tests). I'm not
quite sure what the best way to handle this sort of thing is.

More generally, my local working copy is now rather divergent from the
upstream. What's the recommended way to deal with this? Make sure I
have all the patches submitted to trac, then just revert to raw SVN?
So far I've just been editing the output of svn diff down to only the
bit that's relevant to each patch, but it's getting pretty long.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Ticket #605 Incorrect behavior of numpy.histogram

2008-04-05 Thread Anne Archibald
On 05/04/2008, Bruce Southey <[EMAIL PROTECTED]> wrote:

>  1) Should the first bin contain all values less than or equal to the
>  value of the first limit and the last bin contain all values greater
>  than the value of the last limit?
>  This produced the counts as: array([3, 3, 9]) (I termed this
>  'Accumulate' in the output).
>
>  2) Should any values outside than the range of the bins be excluded?
>  That is remove any value that is smaller than the lowest value of the
>  bin and higher than the highest value of the bin.
>  This produced the counts as: array([2, 3, 4]) (I termed this 'Exclude'
>  in the output)
>
>  3) Should there be extra bins for these values?
>  While I did not implement this option, it would provide the counts as:
>  array([1,2,3,4,5])

There's also a fourth option - raise an exception if any points are
outside the range.

I hope this is a question about defaults - really what I would most
want is to have the choice, as a keyword option. For the default, I
would be tempted to go with option 4, raising an exception. This seems
pretty much guaranteed not to produce surprising results: if there's
any question about what the results should be it produces an error,
and the user can run it again with specific instructions. Plus in many
contexts having any points that don't belong in any bin is the result
of a programming error.
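
A rough sketch of what option 4 might look like as a wrapper (the
function name and error message here are hypothetical, not a proposed
API):

import numpy as np

def strict_histogram(a, bins):
    # raise instead of silently accumulating or discarding
    # out-of-range points
    a = np.asarray(a)
    if a.min() < bins[0] or a.max() > bins[-1]:
        raise ValueError("data points fall outside the bin range")
    return np.histogram(a, bins=bins)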

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Simple financial functions for NumPy

2008-04-04 Thread Anne Archibald
On 04/04/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote:
> On Fri, 4 Apr 2008, Gael Varoquaux apparently wrote:
>  > I really think numpy should be as thin as possible, so
>  > that you can really say that it is only an array
>  > manipulation package. This will also make it easier to
>  > sell as a core package for developers who do not care
>  > about "calculator" features.
>
>
> I'm a user rather than a developer, but I wonder:
>  is this true?
>
>  1. Even as a user, I agree that what I really want from
>  NumPy is a core array manipulation package (including
>  matrices).  BUT as long as this is the core of NumPy,
>  will a developer care if other features are available?
>
>  2. Even if the answer to 1. is yes, could the
>  build/installation process include an option not to
>  build/install anything but the core array functionality?
>
>  3. It seems to me that pushing things out into SciPy remains
>  a problem: a basic NumPy is easy to build on any platform,
>  but SciPy still seems to generate many questions.
>
>  4. One reason this keeps coming up is that the NumPy/SciPy
>  split is rather too crude.  If the split were instead
>  something like NumPy/SciPyBasic/SciPyFull/SciPyFull+Kits
>  where SciPyBasic contained only pure Python code (no
>  extensions), perhaps the desired location would be more
>  obvious and some of this recurrent discussion would go away.

It seems to me that there are two separate issues people are talking
about when they talk about packaging:

* How should functions be arranged in the namespaces? numpy.foo(),
scipy.foo(), numpy.lib.financial.foo(), scikits.foo(),
numkitfull.foo()?

* Which code should be distributed together? Should scipy require
separate downloading and compilation from numpy?

The two questions are not completely independent - it would be
horribly confusing to have the set of functions available in a given
namespace depend on which packages you had installed - but for the
most part it's not a problem to have several toplevel namespaces in
one package (python's library is the biggest example of this I know
of).

For the first question, there's definitely a question about how much
should be done with namespaces and how much with documentation. The
second is a different story.

Personally, I would prefer if numpy and scipy were distributed
together, preferably with matplotlib. Then anybody who used numpy
would have available all the scipy tools and all the plotting tools; I
think it would cut down on wheel reinvention and make application
development easier. Teachers would not need to restrict themselves to
using only functions built into numpy for fear that their students
might not have scipy installed - how many students have learned to
save their arrays in unportable binary formats because their teacher
didn't want them to have to install scipy?

I realize that this poses technical problems. For me installing scipy
is just a matter of clicking on a checkbox and installing a 30 MB
package, but I realize that some platforms make this much more
difficult. If we can't just bundle the two, fine. But I think it is
mad to consider subdividing further if we don't have to.

I think python's success is due in part to its "batteries included"
library. The fact that you can just write a short python script with
no extra dependencies that can download files from the Web, parse XML,
manage subprocesses, and save persistent objects makes development
much faster than if you had to forever decide between adding
dependencies and reinventing the wheel. I think numpy and scipy should
follow the same principle, of coming "batteries included".

So in this specific case, yes, I think the financial functions should
absolutely be included; whether they should be included in scipy or
numpy is less important to me because I think everyone should install
both packages.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Final push for NumPy 1.0.5 (I need your help!)

2008-04-04 Thread Anne Archibald
On 04/04/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:

> Hey Anne,
>
>  Do you currently have SVN access?   Would you like it?
>
>  I think the SciPy/NumPy sprint would be a good time to clean-up the
>  committers list and add new people interested in helping.

I don't have SVN access. I'd be happy (and careful!) to have it, but I
should warn you that I won't have time to do serious, regular
development on scipy/numpy; I do hope to be able to write a little
code here and there, though, and it would be handy to be able to add
it directly instead of sending patches into the ether.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Final push for NumPy 1.0.5 (I need your help!)

2008-04-04 Thread Anne Archibald
On 04/04/2008, Jarrod Millman <[EMAIL PROTECTED]> wrote:

> Since I sent my email last night another 5+ tickets have been closed.
>  If we keep going at this rate, we should be able to release 1.0.5 next
>  Friday (4/11) with every ticket closed.  Specifically, thanks to
>  Travis Oliphant, David Huard, and Stefan van der Walt for bug
>  squashing.
>
>  If anyone has some time, could you please check David's fix for the
>  ticket "loadtxt fails with record arrays":
>  http://projects.scipy.org/scipy/numpy/ticket/623
>  His fix looks correct to me (and even includes test!!); if someone
>  else can confirm this, David or I will be able to close this ticket.
>
>  There are now only 21 remaining open tickets.  Please take a look over
>
> the open tickets and see if there is anything you can quickly close:
>  
> http://scipy.org/scipy/numpy/query?status=new&status=assigned&status=reopened&milestone=1.0.5

I've submitted patches for two (matrix_power and condition number
functions) but I don't think I can actually manipulate the bug status
in any way.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-user] conforming to Python GIL...

2008-04-03 Thread Anne Archibald
On 03/04/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:
> fred wrote:
>  > Hi,
>  >
>  > I use a lot of ConVeX OPTimsation and fortran (via f2py) routines in my
>  > Traits app.
>  >
>  > As I want to compute the data and want to display them, I use threads.
>  >
>  > The issue I get is that data displayed (using Chaco2) are not updated
>  > (app is frozen) while computing the input data.
>  >
>  >  From D. Morrill's answer (the Traits guru ;-)), it appears that cvxopt
>  > (and solve() from scipy, in fact) and fortran modules does not release
>  > the "Python GIL (Global Intepreter Lock)".
>  >
>
> This requires a bit of effort to solve.   We need to, in multiple places...
>
>  1) release the GIL
>  2) put semaphores into code that is not thread-safe.
>
>  f2py should be modified to handle this. Other code should be
>  modified to handle it as well.
>
>  I suspect NumPy should grow an API to handle the semaphore thing easily
>  (I think Python already has one), so these may be able to be wrapped
>  into the MACROS already available for releasing the GIL.

What is the story on f2py's releasing of the GIL? I had understood
that wrapper code was able to declare the wrapped code to be
thread-safe, so that the GIL was released while it executed. Is the
problem here that cvxopt is not threadsafe? I guess the best solution
then is to make a cvxopt- (and related routines)-specific lock?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Anne Archibald
On 23/03/2008, David Cournapeau <[EMAIL PROTECTED]> wrote:
> Gnata Xavier wrote:
>  >
>  > Hi,
>  >
>  > I have a very limited knowledge of openmp but please consider this
>  > testcase :
>  >
>  >
>
>
> Honestly, if it was that simple, it would already have been done for a
>  long time. The problem is that your test-case is not even remotely close
>  to how things have to be done in numpy.

Actually, there are a few places where a parallel for would serve to
accelerate all ufuncs. There are build issues, yes, though they are
mild; we would also want to provide some API to turn parallelization
on and off, and we'd have to verify that OpenMP did not slow down
small arrays, but that would be it. (And I suspect that OpenMP is
smart enough to use single threads without locking when multiple
threads won't help. Certainly all the information is available to
OpenMP to make such decisions.)

This is why I suggested making this change: it should be a low-cost,
high-gain change.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to set array values based on a condition?

2008-03-22 Thread Anne Archibald
On 23/03/2008, Damian Eads <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  I am working on a memory-intensive experiment with very large arrays so
>  I must be careful when allocating memory. Numpy already supports a
>  number of in-place operations (+=, *=) making the task much more
>  manageable. However, it is not obvious to me how I set values based on a
>  very simple condition.
>
>  The expression
>
>y[y<0]=-1
>
>  generates a binary index mask y<0 of the same size as the array y,
>  which is problematic when y is quite large.
>
>  I was wondering if there was anything like a set_where(A, cmp, B,
>  setval, [optional elseval]) function where cmp would be a comparison
>  operator expressed as a string.
>
>  The code below illustrates what I want to do. Admittedly, it needs to be
>  cleaned up but it's a proof of concept. Does numpy provide any functions
>  that support the functionality of the code below?

That's a good question, but I'm pretty sure it doesn't, apart from
numpy.clip(). The way I'd try to solve that problem would be with the
dreaded for loop. Don't iterate over single elements, but if you have
a gargantuan array, working in chunks of ten thousand (or whatever)
won't have too much overhead:

block = 10
for n in arange(0, len(y), block):
    yc = y[n:n+block]
    yc[yc<0] = -1

It's a bit of a pain, but working with arrays that nearly fill RAM
*is* a pain, as I'm sure you are all too aware by now.

You might look into numexpr; this is the sort of thing it does (though
I've never used it and can't say whether it can do this).

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Anne Archibald
On 22/03/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:
> James Philbin wrote:
>  > Personally, I think that the time would be better spent optimizing
>  > routines for single-threaded code and relying on BLAS and LAPACK
>  > libraries to use multiple cores for more complex calculations. In
>  > particular, doing some basic loop unrolling and SSE versions of the
>  > ufuncs would be beneficial. I have some experience writing SSE code
>  > using intrinsics and would be happy to give it a shot if people tell
>  > me what functions I should focus on.
>
> Fabulous!   This is on my Project List of todo items for NumPy.  See
>  http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend
>  some time refactoring the ufunc loops so that the templating does not
>  get in the way of doing this on a case by case basis.
>
>  1) You should focus on the math operations:  add, subtract, multiply,
>  divide, and so forth.
>  2) Then for "combined operations" we should expose the functionality at
>  a high-level.  So, that somebody could write code to take advantage of it.
>
>  It would be easiest to use intrinsics which would then work for AMD,
>  Intel, on multiple compilers.

I think even heavier use of code generation would be a good idea here.
There are so many different versions of each loop, and the fastest way
to run each one is going to be different for different versions and
different platforms, that a routine that assembled the code from
chunks and picked the fastest combination for each instance might make
a big difference - this is roughly what FFTW and ATLAS do.

There are also some optimizations to be made at a higher level that
might give these optimizations more traction. For example:

A = randn(100*100)
A.shape = (100,100)
A*A

There's no reason the multiply ufunc couldn't flatten A and use a
single unstrided loop to do the multiplication.
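
A quick illustration of that equivalence (a sketch; real ufunc code
would first have to check that the operands are contiguous):

import numpy as np

A = np.random.randn(100, 100)   # freshly allocated, so C-contiguous
# ravel() returns views here, so this runs as one unstrided loop
flat = (A.ravel() * A.ravel()).reshape(A.shape)
assert np.array_equal(flat, A * A)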

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-22 Thread Anne Archibald
On 22/03/2008, Thomas Grill <[EMAIL PROTECTED]> wrote:

> I've experimented with branching the ufuncs into different constant
>  strides and aligned/unaligned cases to be able to use SSE using
>  compiler intrinsics.
>  I expected a considerable gain as i was using float32 with stride 1
>  most of the time.
>  However, profiling revealed that hardly anything was gained because of
>  1) non-alignment of the vectors - this _could_ be handled by
>  shuffled loading of the values though
>  2) the fact that my application used relatively large vectors that
>  wouldn't fit into the CPU cache, hence the memory transfer slowed down
>  the CPU.
>
>  I found the latter to be a real showstopper for most of my experiments
>  with SIMD. It's especially a problem for numpy because smaller vectors
>  have a lot of Python/numpy overhead, and larger ones don't really
>  benefit due to cache exhaustion.

This particular issue can sometimes be reduced by clever use of the
prefetching intrinsics. I'm not totally sure it's going to help inside
most ufuncs, though, since the runtime is so dominated by memory
reads. In a program I was writing I had time to do a 128-point real
FFT in the time it took to load the next 64 floats...

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy's future (1.1 and beyond): which direction(s) ?

2008-03-21 Thread Anne Archibald
On 21/03/2008, Gnata Xavier <[EMAIL PROTECTED]> wrote:

>  Something like http://idlastro.gsfc.nasa.gov/idl_html_help/TOTAL.html
>  (Thread Pool Keywords) would be nice.
>  A "total like" function could be a great pathfinder a put threads into
>  numpy keeping the things as simple as they should remain.
>  Not sure we need that in numpy 1.1 but IMHO we need that in a near
>  future (because every "array oriented" libs are now threaded).

There was some discussion of this recently. The most direct approach
to the problem is to annotate some or all of numpy's inner C loops
with OpenMP constructs, then provide some python functions to control
the degree of parallelism OpenMP uses. This would transparently
provide parallelism for many numpy operations, including sum(),
numpy's version of IDL's total(). All that is needed is for someone to
implement it. Nobody has stepped forward yet.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Improving Docs on Wiki

2008-03-21 Thread Anne Archibald
On 21/03/2008, Stéfan van der Walt <[EMAIL PROTECTED]> wrote:
> On Fri, Mar 21, 2008 at 2:47 PM, Anne Archibald
>  <[EMAIL PROTECTED]> wrote:
>  >  Is it perhaps possible to make all numpy functions accessible in
>  >  submodules (in addition to in numpy, for backwards compatibility) and
>  >  then promote accessing them that way? Are they already? If so how do I
>  >  find out what the submodules are?
>
> We should definately discuss and consider this proposal for 1.1.  Do
>  you have a suggested organisation in mind?

Not exactly. What do people think of the way I organized the "numpy
functions by category" page? Apart from the sore-thumb "other"
category, it does seem like the kind of grouping we might hope for.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Improving Docs on Wiki

2008-03-21 Thread Anne Archibald
On 21/03/2008, Sebastian Haase <[EMAIL PROTECTED]> wrote:

> Comment:  I have read the module- or directory-name "core" many times
>  on this list, however: Who really knows where a given function
>  belongs?  Isn't that mostly only the numpy svn committers?
>  In other words, using only the python side of numpy, someone (like
>  myself) would NOT know that sort is inside "core"!
>
>  Also: since >>> import numpy as N; N.sort  refers already to that same sort:
>  >>> N.core.sort
>  <function sort at 0x...>
>  >>> N.sort
>  <function sort at 0x...>
>
>  I would prefer not to require "core" sub-sub-page.
>  Instead, every name that is accessible as N.<name> should be
>  documented without extra sub-page.

I don't have a solution, but I would like to complain about numpy's
flat namespace. Perhaps we're stuck with it now, but it's very
difficult to find the right function. In scipy, I can find the right
numerical integration by importsing scipy.integrate and using tab
completion, But in numpy, everything is loaded into the base
namespace, so tab completion gets me an overwhelming 502
possibilities. That's why there's a "numpy functions by category" but
no "scipy functions by category" - scipy functions are already by
category.

Is it perhaps possible to make all numpy functions accessible in
submodules (in addition to in numpy, for backwards compatibility) and
then promote accessing them that way? Are they already? If so how do I
find out what the submodules are?

Thanks,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Inplace index suprise

2008-03-20 Thread Anne Archibald
On 20/03/2008, Gael Varoquaux <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 20, 2008 at 06:17:44PM +, James Philbin wrote:
>  > Hi,
>
>  > >  This cannot work, because the inplace operation does not
>  > >  take place as a for loop.
>  > Well, this would be fine if I was assigning the values to tempories as
>  > you suggest. However, the operation should be performed inplace and
>  > this is what I don't understand - why is there no for loop? I think
>  > the semantics of these inplace indexed operations is intuitively quite
>  > clear and numpy doesn't follow this intuition. Would there be any
>  > interest in changing this behaviour in numpy?
>
>
> I think this is technically impossible from the way numpy works. This
>  breaks the numpy model that every operation is global on the array.

It is quite reasonable from a least-surprise point of view.
Unfortunately it can't really be done because of the way python
implements augmented assignments.  If I'm not mistaken, numpy's
histogram function can be used to accomplish this particular thing.

>  > >  This is a FAQ.
>  > Sorry if i'm rehashing old ground, but the closest thing I could find
>  > to a numpy faq is here: http://www.scipy.org/FAQ. There seems to be no
>  > mention of this issue there.
>
>
> Sorry if I came across as harsh, I certainly didn't want to imply that you
>  were expecting something wrong, just that this thing was tricking people.

I added it to that FAQ, along with the explanation I got when I asked
the same question:
http://www.scipy.org/FAQ#head-1ed851e9aff803d41d3cded8657b2b15a888ebd5

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to build a series of arrays as I go?

2008-03-17 Thread Anne Archibald
On 17/03/2008, Alan G Isaac <[EMAIL PROTECTED]> wrote:
> > Alan suggested:
>
> >> 1. http://www.scipy.org/Numpy_Example_List_With_Doc
>
> On Mon, 17 Mar 2008, Chris Withers apparently wrote:
>
> > Yeah, read that, wood, trees, can't tell the...
>
> Oh, then you might want
> http://www.scipy.org/Tentative_NumPy_Tutorial
>  or the other stuff at
> http://www.scipy.org/Documentation
>  All in all, I've found the resources quite good.

Also, for the specific question of "how do I do X?" you can try
http://www.scipy.org/Numpy_Functions_by_Category

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy and OpenMP

2008-03-15 Thread Anne Archibald
On 15/03/2008, Damian Eads <[EMAIL PROTECTED]> wrote:
> Robert Kern wrote:
>  > Eric Jones tried to use multithreading to split the computation of
>  > ufuncs across CPUs. Ultimately, the overhead of locking and unlocking
>  > made it prohibitive for medium-sized arrays and only somewhat
>  > disappointing improvements in performance for quite large arrays. I'm
>  > not familiar enough with OpenMP to determine if this result would be
>  > applicable to it. If you would like to try, we can certainly give you
>  > pointers as to where to start.
>
> Perhaps I'm missing something. How is locking and synchronization an
>  issue when each thread is writing to a mutually exclusive part of the
>  output buffer?

The trick is to efficiently allocate these output buffers. If you
simply give each thread 1/n th of the job, if one CPU is otherwise
occupied it doubles your computation time. If you break the job into
many pieces and let threads grab them, you need to worry about locking
to keep two threads from grabbing the same piece of data. Plus,
depending on where things are in memory you can kill performance by
abusing the caches (maintaining cache consistency across CPUs can be a
challenge). Plus a certain amount of numpy code depends on order of
evaluation:

a[:-1] = 2*a[1:]

Correctly handling all this can take a lot of overhead, and require a
lot of knowledge about hardware. OpenMP tries to take care of some of
this in a way that's easy on the programmer.

To answer the OP's question, there is a relatively small number of C
inner loops that could be marked up with OpenMP #pragmas to cover most
matrix operations. Matrix linear algebra is a separate question, since
numpy/scipy prefers to use optimized third-party libraries - in these
cases one would need to use parallel linear algebra libraries (which
do exist, I think, and are plug-compatible). So parallelizing numpy is
probably feasible, and probably not too difficult, and would be
valuable. The biggest catch, I think, would be compilation issues - is
it possible to link an OpenMP-compiled shared library into a normal
executable?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] dimensions too large error

2008-03-14 Thread Anne Archibald
On 14/03/2008, Dinesh B Vadhia <[EMAIL PROTECTED]> wrote:

> For the following code:
>
> I = 18000
> J = 33000
> filename = 'ij.txt'
> A = scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int))
> for line in open(filename, 'r'):
> etc.
>
> The following message appears:
>
> Traceback (most recent call last):
> File "C:\...\py", line 362, in 
> A= scipy.asmatrix(numpy.empty((I,J), dtype=numpy.int))
> ValueError: dimensions too large.
>
> Is there a limit to array/matrix dimension sizes?

Yes. On 32-bit machines the hardware makes it exceedingly difficult
for one process to access more than two or three gigabytes of RAM, so
numpy's strides and sizes are all 32-bit integers. As a result you
can't make arrays bigger than about 2 GB. If you need this I'm afraid
you pretty much need a 64-bit machine.

> Btw, for numpy array's, ascontiguousarray() is available to set aside
> contiguous memory.  Is there an equivalent for scipy matrix ie. an
> ascontiguousmatrix()?

That's not exactly what ascontiguousarray() is for. Normally if you
create a fresh numpy array it will be allocated as a single contiguous
block of memory, and the strides will be arranged in that special way
that numpy calls "contiguous". However, if you take a subarray of that
array - every second element, say - the resulting data is not
contiguous (in the sense that there are gaps between the elements).
Normally this is not a problem. But a few functions - mostly
handwritten C or FORTRAN code - needs contiguous arrays. The function
ascontiguousarray() will return the original array if it is contiguous
(and in C order), or make a copy if it isn't.
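
For example (a small illustration):

import numpy as np

a = np.arange(10)
b = a[::2]                      # every second element: gaps between elements
print(b.flags['C_CONTIGUOUS'])  # False
c = np.ascontiguousarray(b)     # makes a contiguous copy
print(c.flags['C_CONTIGUOUS'])  # True
print(np.ascontiguousarray(a) is a)  # True: already contiguous, no copy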

Numpy matrices are just a slight redefinition of the operators on an
array, so you can always convert an array to a matrix without copying.
Thus there's little cost (for large arrays) to just using
matrix(ascontiguousarray()) if you need matrices.

Good luck,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Array assignment problem

2008-03-11 Thread Anne Archibald
On 11/03/2008, Dinesh B Vadhia <[EMAIL PROTECTED]> wrote:

> Hello!  I'm reading a text file with two numbers in str format on each line.
>  The numbers are converted into integers.  Each integer is then assigned to
> a 2-dimensional array ij (see code below).  The problem is that neither of
> the array assignments work ie. both ij[index, 0] = r and ij[index, 1] = c
> are always 0 (zero).  I've checked r and c and both are integers (>=0).
>
> import sys
> import os
> import numpy
>
> nnz = 120
> ij = numpy.array(numpy.empty((nnz, 2), dtype=int))
> index = 0
> filename = 'test_ij.txt'
> for line in open(filename, 'r'):
>     line = line.rstrip('\n')
>     r, c = map(str, line.split(','))
>     r = int(r)
>     c = int(c)
>     ij[index, 0] = r
>     ij[index, 1] = c
>     index = index + 1
>
>
> What am I doing wrong?

The first thing you're doing wrong is you're not using numpy.loadtxt:
In [35]: numpy.loadtxt('foo',delimiter=",",dtype=numpy.int)
Out[35]:
array([[1, 3],
       [4, 5],
       [6, 6]])
This removes the need for the rest of your code. To find useful
functions like this in future, you can try looking at
http://www.scipy.org/Numpy_Functions_by_Category

Stripping the newline off is unnecessary, since int("  17  \n")==17.
Also, since the results of line.split(,) are already strings, the
map(str, ...) doesn't do anything. Did you mean it to?

Otherwise, your code works fine for me.

I should point out that using empty(), you should expect your array to
be full of gibberish (rather than 0), so if you're seeing lots of
zeros, they're probably coming from the text file.

Good luck,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Slice and assign into new NDarray...

2008-03-08 Thread Anne Archibald
On 08/03/2008, Vince Fulco <[EMAIL PROTECTED]> wrote:

>  I have an ND array with shape (10,15) and want to slice or subset(?) the data
>  into a new 2D array with the following criteria:
>
>  1) Separate each 5 observations along axis=0 (row) and transpose them to
>  the new array with shape (50,3)
>
>
>  Col1 Col2 Col3
>
>  Slice1 Slice2 Slice3
>  ...
>  ...
>  ...
>
>  Slice1 should have the coordinates[0:5,0], Slice2[0:5,1] and so
>  on...I've tried initializing the target ND array with
>
>  D = NP.zeros((50,3), dtype='int') and then assigning into it with
>  something like:
>
>  for s in xrange(original_array.shape[0]):
>      D = NP.transpose([data[s,i:i+step] for i in range(0, data.shape[1], step)])
>
>  with step = 5 but I get errors i.e. IndexError: invalid index
>
>  Also tried various combos of explicitly referencing D coordinates but
>  to no avail.

You're not going to get a slice - in the sense of a view on the same
underlying array, and through which you can modify the original array
- but this is perfectly possible without for loops.

First set up the array:
In [12]: a = N.arange(150)

In [13]: a = N.reshape(a, (-1,15))

You can check that the values are sensible. Now reshape it so that you
can split up slice1, slice2, and slice3:
In [14]: b = N.reshape(a, (-1, 3, 5))

slice1 is b[:,0,:]. Now we want to flatten the first and third
coordinates together. reshape() doesn't do that, exactly, but if we
swap the axes around we can use reshape to put them together:
In [15]: c = N.reshape(b.swapaxes(1,2),(-1,3))

This reshape necessarily involves copying the original array. You can
check that it gives you the value you want.

I recommend reading
http://www.scipy.org/Numpy_Functions_by_Category for all those times
you know what you want to do but can't find the function to make numpy
do it.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argmin & min on ndarrays

2008-03-04 Thread Anne Archibald
On 04/03/2008, Pierre GM <[EMAIL PROTECTED]> wrote:
> Anne,
>
>  Thanks a lot for your suggestion. Something like
>
>  >>>if axis is None:
>  >>>return b.flat[a.argmin()]
>  >>>else:
>  >>>   return numpy.choose(a.argmin(axis),numpy.rollaxis(b,axis,0))
>
>  seems to do the trick fairly nicely indeed. The other solutions you suggested
>  would require too much ad hoc adaptation.
>  Thanks again !

Ah! "It ain't the things you don't know that'll get you, it's the
things you know that ain't so." I thought rollaxis rolled the axes
around cyclically. This is much more useful, but what a funny name for
what it actually does...

I should have provided the link before, but this is very useful for
answering this kind of question:
http://www.scipy.org/Numpy_Functions_by_Category

Good luck,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argmin & min on ndarrays

2008-03-04 Thread Anne Archibald
On 04/03/2008, Pierre GM <[EMAIL PROTECTED]> wrote:
> All,
>  Let a & b be two ndarrays of the same shape. I'm trying to find the elements
>  of b that correspond to the minima of a along an arbitrary axis.
>  The problem is trivial when axis=None or when a.ndim=2, but I'm getting
>  confused with higher dimensions: I came to the following solution that looks
>  rather ugly, and I'd need some ideas to simplify it
>
>  >>>a=numpy.arange(24).reshape(2,3,4)
>  >>>axis=-1
>  >>>b = numpy.rollaxis(a,axis,0)[a.argmin(axis)][tuple([0]*(a.ndim-1))]
>  >>>numpy.all(b == a.min(axis))
>  True
>
>  Thanks a lot in advance for any suggestions.

I couldn't find any nice way to make indexing do what you want, but
the function choose() can be persuaded to do it. Unfortunately it will
only choose along the first axis, so some transpose jiggery-pokery is
necessary:

def pick_argmin(a,b,axis):
    assert a.shape == b.shape
    t = range(len(b.shape))
    i = t[axis]
    del t[axis]
    t = [i] + t
    a = a.transpose(t)
    b = b.transpose(t)
    return N.choose(N.argmin(a,axis=0),b)

I did find a not-nice way to do what you want. The problem is that
numpy's fancy indexing is so general, it won't let you simply pick and
choose along one axis, you have to pick and choose along all axes. So
what you do is use indices() to generate arrays that index all the
*other* axes appropriately, and then use the argmin array to index the
axis you're interested in:

In [39]: c = N.indices((2,4))

In [40]: b[c[0],N.argmin(a,axis=1),c[1]]
Out[40]:
array([[-0.70659942, -0.997249  , -0.20028296, -0.05171191],
       [-1.28886394, -1.0610526 , -1.07193295,  0.05356948]])

In [42]: c[0]
Out[42]:
array([[0, 0, 0, 0],
       [1, 1, 1, 1]])

In [43]: c[1]
Out[43]:
array([[0, 1, 2, 3],
       [0, 1, 2, 3]])

Not only would this require similar jiggery-pokery, it creates the
potentially very large intermediate array c. I'd stick with choose().

A third option would be to transpose() and reshape() a and b down to
two dimensions, then reshape() the result back to the right shape.
More multiaxis jiggery-pokery, and the reshape()s may end up copying
the arrays.

Finally, you can always just write a python loop (over all axes except
the one of interest) using ndenumerate() and one-dimensional argmin().
If the dimension you're argmin()ing over is very large, the cost of
the python loop may be negligible.

Anne
P.S. feel free to use pick_argmin however you like, though error
handling would probably be a good idea... -A
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pickling and initializing

2008-03-03 Thread Anne Archibald
On 03/03/2008, Dinesh B Vadhia <[EMAIL PROTECTED]> wrote:

> When you pickle a numpy/scipy matrix does it have to be initialized by
> another program?  For example:

Most python objects do not need to be initialized. You just call a
function that makes the one you want:

>>> l = range(10)

This makes a list of length 10. You can now manipulate it, adding
elements or what have you.

Arrays are no different. (You use matrices in your example - the only
difference is that the multiplication operator behaves differently,
and you often need to use asarray to convert them back to arrays. I
never use them, even when I have to do some linear algebra.) You
simply call a function that makes the array you want:

>>> a = arange(10)

You can change its contents, but it's not really sensible to say that
arrays must be initialized before use. The function empty() is kind of
a peculiar aberration - it's for those rare cases when you end up
reassigning all the values in the array, and zeros() is too slow. (For
debugging it'd be nice to have NaNs()...)
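
Such a helper is easy to sketch (hypothetical, not an existing numpy
function):

import numpy as np

def nans(shape, dtype=float):
    # like zeros()/empty(), but filled with NaN so that reads of
    # never-assigned elements stand out immediately
    a = np.empty(shape, dtype)
    a.fill(np.nan)
    return a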

Perhaps you are thinking of statically-typed languages, where
variables must be initialized? In python variables do not have type,
so variables holding arrays are no different from variables holding
strings, integers, file objects, or whatever. Using a python variable
before it has been assigned a value does indeed raise an exception;
all that is needed is to assign a value to it.

Unpickling reads a file and constructs a "new" array from the data in
that file. The array value is returned; one often assigns this value
to a variable. The values in the array are filled in by the pickling
function. It is not possible to make the unpickler store its data in a
preallocated array.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.correlate with phase offset 1D data series

2008-03-03 Thread Anne Archibald
On 03/03/2008, Ray Schumacher <[EMAIL PROTECTED]> wrote:

>  Xie's 2D algorithm reduced to 1D works nicely for computing the
>  relative phase, but is it the fastest way? It might be, since some
>  correlation algorithms use FFTs as well. What does _correlateND use, in 
> scipy?

Which way will be the fastest really depends what you want to do.
Algorithmically, the direct way numpy.correlate operates is O(NM), and
the way FFT-based algorithms operate is (roughly) O((N+M)log(N+M)) (or
for a more sophisticated algorithm O(N log M) where M is less than N).
In practice what this means is that when one or both of the things
you're correlating is short (tens of samples or so), you should use a
direct method; when one or both are long you should use an FFT-based
method. (There are other approaches too, but I don't know of any in
wide use.)

In your case it sounds like you have two signals of equal, fairly large
length to compare. Some questions remain, though:

* What do you want to happen at the endpoints? Without padding, only a
small interval (the difference in lengths plus one) is valid.
Zero-padding works, but guarantees a fall-off at the ends. Circular
correlation is easy to implement but not appropriate most of the time.

* Do you care about sub-sample alignment? How much accuracy do you
really need? Direct methods really can't give you this information.
With Fourier methods, you can easily pad the spectrum with zeros and
inverse FFT, giving you a beautifully-interpolated signal. If you want
more accuracy, a quadratic fit to the three points around the peak of
the interpolated signal will get you very close. If you need more
accuracy, you can use numerical maximization, evaluating each point as
sum(a_k exp(2 pi i k x)).

The other common application is to have a template (that presumably
falls to zero at its endpoint) and to want to compute a running
correlation against a stream of data. This too can be done both ways,
depending on the size of the template; all that is needed is to think
carefully about overlaps.
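
For the zero-padded (linear) case, a bare-bones Fourier-domain sketch
might look like this (the function name is hypothetical, and the
normalization and lag convention need the care described above):

import numpy as np

def fft_xcorr(a, b):
    # pad to the full linear-correlation length to avoid circular
    # wrap-around
    n = len(a) + len(b) - 1
    A = np.fft.rfft(a, n)
    B = np.fft.rfft(b, n)
    return np.fft.irfft(A * np.conj(B), n)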

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.correlate with phase offset 1D data series

2008-03-03 Thread Anne Archibald
On 03/03/2008, Ray Schumacher <[EMAIL PROTECTED]> wrote:
>
>  I'm trying to figure out what numpy.correlate does, and what are people
> using to calculate the phase shift of 1D signals?

I use a hand-rolled Fourier-domain cross-correlation, but then, I'm
using a Fourier-domain representation of my signals.

>  (I coded one routine that uses rfft, conjugate, ratio, irfft, and argmax
> based on a paper by Hongjie Xie "An IDL/ENVI implementation of the FFT Based
> Algorithm for Automatic Image Registration" - but that seems more intensive
> than it could be.)

Sounds familiar. If you have a good signal-to-noise ratio, you can get
subpixel accuracy by oversampling the irfft, or better but slower, by
using numerical optimization to refine the peak you found with argmax.
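
The quadratic refinement is only a few lines; a sketch, assuming the
peak is not at either end of the sampled correlation:

import numpy as np

x = np.sin(np.linspace(0, 10, 200))
y = np.roll(x, 3)                    # x shifted by three samples
c = np.correlate(y, x, mode='full')  # sampled cross-correlation
k = np.argmax(c)
# fit a parabola through the three points around the peak; its vertex
# gives the sub-sample offset
delta = 0.5*(c[k-1] - c[k+1]) / (c[k-1] - 2*c[k] + c[k+1])
shift = (k + delta) - (len(x) - 1)   # zero lag sits at index len(x)-1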

>  In numpy, an identity:
>  import numpy
>  from pylab import *
>  l=[1,5,3,8,15,6,7,7,9,10,4]
>  c=numpy.correlate(l,l, mode='same')
>  plot(c)
>  peaks at the center, x=5, and is symmetric

You have revealed several flaws in numpy's correlate. First of all,
the docstring gives no indication of how to interpret the result:
neither the zero-shift position nor the direction of the result is at
all clear (if I shift the first vector to the left, does the
correlation peak shift left or right?). Second, the mode "same" gives
results which are rather difficult to understand. Third, there is no
way to get a "circular" correlation.

I would be inclined to use convolve (or scipy.ndimage.convolve, which
uses a Fourier-domain method), since it is somewhat better specified.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] contiguous true

2008-02-29 Thread Anne Archibald
On 01/03/2008, Charles R Harris <[EMAIL PROTECTED]> wrote:

> > On Fri, Feb 29, 2008 at 10:53 AM, John Hunter <[EMAIL PROTECTED]> wrote:
> > > I have a boolean array and would like to find the lowest index "ind"
> > > where N contiguous elements are all True.  Eg, if x is

[...]

> Oops, ind = arange(len(x)). I suppose nonzero would work as well.

I'm guessing you're alluding to the fact that diff(nonzero(x)) gives
you a list of the run lengths of Falses in x (except possibly for the
first one).

If you have a fondness for the baroque, you can try

numpy.where(numpy.convolve(x,[1,]*N,'valid')==N)

For large N this can even use Fourier-domain convolution (though you'd
then have to be careful about round-off error).

Silly, really, it's O(NM) or O(N log M) instead of O(N).

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Optimize speed of for loop using numpy

2008-02-26 Thread Anne Archibald
On 25/02/2008, Trond Kristiansen <[EMAIL PROTECTED]> wrote:

>  I have attached the function that the FOR loop is part of as a python file.
>  What I am trying to do is to create a set of functions that will read the
>  output files (NetCDF) from running the ROMS model (ocean model). The output
>  file is organized in xi (x-direction), eta (y-direction), and s
>  (z-direction) where the s-values are vertical layers and not depth. This
>  particular function (z_slice) will find the closest upper and lower s-layer
>  for a given depth in meters (e.g. -100). Then values from the two selcted
>  layers will be interpolated to create a new layer at the selected depth
>  (-100). The problem is that the s-layers follow the bathymetry and a
>  particular s-layer will therefore sometimes be above and sometimes below the
>  selected depth that we want to interpolate to. That's why I need a quick
>  script that searches all of the layers and find the upper and lower layers
>  for a given depth value (which is negative). The z_r is a 3D array
>  (s,eta,xi) that is created using the netcdf module.

If I understand what you're doing correctly, you have a 3D array of
depths, indexed by two unproblematic coordinates (eta and xi), and by
a third coordinate whose values are peculiar.   For each pair (eta,
xi), you want to find the value of the third coordinate that occurs at
a depth closest to some given depth.

Are you looking to do this for many values of depth? It seems like
what you're trying to do is (approximately) invert a function - you
have z as a function of s, and you want s as a function of z. Is the
mapping between depths and s values monotonic?

First of all, numpy.searchsorted() is the right tool for finding where
depth occurs in a list of z values. It gives you the position of the
next lower value; it's pretty straightforward to write down a linear
interpolation (you should be able to do it without any for loop at
all) and more sophisticated interpolation schemes may be
straightforward as well.
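
In one dimension the idea is just this sketch (toy numbers; z holds the
layer depths in increasing order and v the field on those layers):

import numpy as np

z = np.array([-500., -300., -150., -80., -20., 0.])
v = np.array([4.0, 4.5, 6.0, 9.0, 12.0, 14.0])
depth = -100.0

k = np.searchsorted(z, depth)        # z[k-1] < depth <= z[k]
frac = (depth - z[k-1]) / (z[k] - z[k-1])
v_at_depth = (1 - frac)*v[k-1] + frac*v[k]   # linear interpolation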

If you want a smoother interpolation scheme, you may want to look at
scipy.interpolate. It is not always as vectorial as you might wish,
but it implements splines (optionally with smoothing) quite
efficiently. I believe scipy.ndimage may well have some suitable
interpolation code as well.

In my opinion, the value of numpy/scipy comes from the tools that
allow you to express major parts of your algorithm as a line or two of
code. But it is often difficult to take advantage of this by "trying
to accelerate for loops". I find it really pays to step back and look
at my algorithm and think how to express it as uniform operations on
arrays. (Of course it helps to have a rough idea of what operations
are available; the numpy documentation is sometimes sadly lacking in
terms of letting you know what tools are there.)

You might find useful the new
http://www.scipy.org/Numpy_Functions_by_Category

Good luck,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1:1.0.4: numpy.average() returns the wrong result with weights

2008-02-22 Thread Anne Archibald
On 22/02/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:

> Is there a ticket on the NumPy trac for this?  We won't see it if there
>  isn't.  Thanks for pointing us to the bug.

It appears to be fixed in SVN (that was quick!). But the Debian bug
report also points out a peculiar unnecessary use of eval; the code is
also slower and uses more memory than it has to. Attached is a patch
to cure that.

Anne
Index: numpy/lib/function_base.py
===================================================================
--- numpy/lib/function_base.py	(revision 4819)
+++ numpy/lib/function_base.py	(working copy)
@@ -378,9 +378,9 @@
             ni = len(ash)
             r = [newaxis]*ni
             r[axis] = slice(None, None, 1)
-            w1 = eval("w["+repr(tuple(r))+"]*ones(ash, float)")
-            n = add.reduce(a*w1, axis)
-            d = add.reduce(w1, axis)
+            r = tuple(r)
+            n = add.reduce(a*w[r], axis)
+            d = add.reduce(w, axis)
         else:
             raise ValueError, 'averaging weights have wrong shape'
 
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Matching 0-d arrays and NumPy scalars

2008-02-21 Thread Anne Archibald
On 21/02/2008, Stefan van der Walt <[EMAIL PROTECTED]> wrote:

>  Could I ask that we also consider implementing len() for 0-d arrays?
>  numpy.asarray returns those as-is, and I would like to be able to
>  handle them just as I do any other 1-dimensional array.  I don't know
>  if a length of 1 would be valid, given a shape of (), but there must
>  be some consistent way of handling them.

Well, if the length of an array is the product of all its sizes, the
product of no things is customarily defined to be one... whether that
is actually a useful value is another question.
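
For reference, a quick check of what zero-dimensional arrays do today
(the error message is numpy's current one):

import numpy as np

a = np.asarray(3.0)   # zero-dimensional array
print(a.size)         # 1 - the empty product of the sizes
try:
    len(a)
except TypeError as e:
    print(e)          # "len() of unsized object"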

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proxy array class and units

2008-02-13 Thread Anne Archibald
On 13/02/2008, Dan Goodman
<[EMAIL PROTECTED]> wrote:

> Background: I'm writing a package to run simulations which make extensive use 
> of
> linear algebra, for which I'm using numpy. However - it is important to my
> package that quantities can have dimesions, so I've written a class Quantity
> subclassed from float which includes the physical dimensions and raises
> exceptions etc. if you try to add volts to kilograms or whatever. This seems 
> to
> work fine, but I also wanted arrays containing physical dimensions. I wrote a
> class qarray subclassed from numpy.ndarray which mostly just copies the
> functionality of an array with dtype=object. Here is the issue. The code for 
> my
> package internally doesn't use these qarray objects, it uses numpy arrays
> because it would obviously be terribly slow to be repeatedly checking if the
> dimensions were consistent all the time. However, I want to present a 
> (somewhat)
> consistent strict unit-checking front to the user of this package. As one part
> of this, I would like a class that appears to act exactly like a qarray (which
> itself is supposed to act very much like a numpy array) but maintains a
> connection with its numpy array counterpart in my package.

I'm not sure I understand all the different layers you're talking
about here. I think that I would attack your problem in the following
way:

Use ScientificPython's PhysicalQuantity (alone and in object arrays)
or "unum"s to represent quantities with units:
http://books.google.com/books?id=3nR75KSvsq4C&pg=PA169&lpg=PA169&dq=numpy+physicalquantity&source=web&ots=_cmXEC0qpx&sig=JBEz9BlegnIQJor-1BwnAdRTKLE

Object arrays are fairly horrible, but having one unit per array
shouldn't be very inefficient. It could cause some headaches, since
it's sometimes useful to have an array with mixed units (differential
equation solvers often require this when you convert a higher-order
problem to first-order, for example), but it should be quite efficient
computationally.

So I would pull together an array that held a collection of quantities
all in the same units. Type checking then becomes a once-per-array
operation, which is relatively cheap. The way I'd implement this is
(by preference) by grabbing a version off the net, or (if that fails)
as a subclass of ndarray. If I want to do some heavy-duty array
mangling without type checking getting in my way, I'll just convert to
an ndarray (either normalizing the units or saving them), do the heavy
lifting, and reapply the units afterwards.
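
To make that concrete, a minimal sketch of the subclass approach (the
class and its checking are hypothetical; a real implementation needs to
cover many more operations):

import numpy as np

class UnitArray(np.ndarray):
    """An ndarray carrying one unit string for the whole array (sketch)."""
    def __new__(cls, data, unit=None):
        obj = np.asarray(data).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # called on views and slices too, so the unit tags along
        self.unit = getattr(obj, 'unit', None)

    def __add__(self, other):
        if getattr(other, 'unit', self.unit) != self.unit:
            raise ValueError("unit mismatch: %s + %s" % (self.unit, other.unit))
        return UnitArray(np.add(np.asarray(self), np.asarray(other)), self.unit)

volts = UnitArray([1.0, 2.0], 'V')
more_volts = volts + UnitArray([0.5, 0.5], 'V')   # fine, stays in volts
# volts + UnitArray([1.0, 1.0], 'kg') raises ValueError;
# np.asarray(volts) drops the unit for the heavy lifting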

I don't really see why you're talking about proxy classes... Perhaps
your problem is actually two problems: implementing unit arrays, and
doing something I haven't understood. Is there any reason not to
separate the two, and use one of the existing solutions for unit
arrays?

Anne


Re: [Numpy-discussion] sort method raises unexpected error with axis=None

2008-02-12 Thread Anne Archibald
On 12/02/2008, Anne Archibald <[EMAIL PROTECTED]> wrote:
> An efficient way to handle in-place (or out-of-place, come to think of
> it) median along multiple axes is actually to take medians along all
> axes in succession. That saves you some sorting effort, and some
> programming effort, and doesn't require in-place multidimensional
> sorting:

Aargh. Sorry. No, that doesn't work:

In [28]: all_axes_median(N.reshape([1,5,6,7],(2,2)))
Out[28]: 4.75

Oops.

Anne


Re: [Numpy-discussion] sort method raises unexpected error with axis=None

2008-02-12 Thread Anne Archibald
On 12/02/2008, Matthew Brett <[EMAIL PROTECTED]> wrote:

> Is it possible, in fact, to do an inplace sort on an array with
> axis=None (ie flat sort)?

It is, sometimes; just make an array object to point to the flattened
version and sort that:

In [16]: b = a[:]

In [17]: b.shape = (16,)

In [18]: b.sort()

This is not always possible, depending on the arrangement of a in memory.

An efficient way to handle in-place (or out-of-place, come to think of
it) median along multiple axes is actually to take medians along all
axes in succession. That saves you some sorting effort, and some
programming effort, and doesn't require in-place multidimensional
sorting:

In [24]: def all_axes_median(a):
   ....:     if len(a.shape)>1:
   ....:         return all_axes_median(N.median(a))
   ....:     else:
   ....:         return N.median(a)
   ....:
   ....:

In [26]: all_axes_median(N.reshape(N.arange(32),(2,4,2,-1)))
Out[26]: 15.5

Anne


Re: [Numpy-discussion] Median advice

2008-02-12 Thread Anne Archibald
On 12/02/2008, Matthew Brett <[EMAIL PROTECTED]> wrote:

> Suggestion 1:
> def median(a, axis=0, out=None)
[...]
> Suggestion 2:
> def median(a, axis=0, scratch_input=False)

No reason not to combine the two. It's a pretty straightforward
modification to do the sorting in place, and it could make a lot more
difference to the runtime (and memory usage) than an output array.

Anne


Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-11 Thread Anne Archibald
On 11/02/2008, Matthew Brett <[EMAIL PROTECTED]> wrote:

> > I can also see that this could possibly be improved by using a for
> > loop to iterate over the output elements, so that there was no need to
> > duplicate the large input array, or perhaps a "blocked" iteration that
> > duplicated arrays of modest size would be better. But how can a single
> > float per data set whose median is being taken help?
>
> Sorry, you are right to call me on this very sloppy late-night
> phrasing - I only meant that it would be useful in due course to use a
> C implementation for median such as the ones you're describing, and
> that this could write the result directly into the in-place memory -
> in the same way that mean() does.  It's quite true that it's difficult
> to imagine the algorithm itself benefiting from the memory buffer.

My point was not to catch you in an error - goodness knows I make
enough of those, and not only late at night! - but to point out that
there may not really be much need for an output argument. Even with a
C code, for the median to be of much use, the output array can be at
most half the size of the input array. The extra storage space
required is not that big a concern, unlike a ufunc, and including an
output argument forces you to deal with all sorts of data conversion
issues.

On the other hand, there is something to be said for allowing the code
to destroy the input array. Perhaps *that* should be an optional
argument (defaulting to zero)?

Anne


Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-10 Thread Anne Archibald
On 10/02/2008, Matthew Brett <[EMAIL PROTECTED]> wrote:
> > Ah, I see. You definitely do not want to reassign the .data buffer in
> > this case. An out= parameter does not reassign the memory location
> > that the array object points to. It should use the allocated memory
> > that was already there. It shouldn't "copy" anything at all;
> > otherwise, "median(x, out=out)" is no better than "out[:] =
> > median(x)". Personally, I don't think that a function should expose an
> > out= parameter unless if it can make good on that promise of memory
> > efficency.
>
> I agree - but there are more efficient median algorithms out there
> which can make use of the memory efficiently.  I wanted to establish
> the call signature to allow that.  I don't feel strongly about it
> though.

This is a startling claim! Are there really median algorithms that are
faster for having the use of a single float as storage space? If it
were permissible to mutilate the original array in-place, I can
certainly see a good median algorithm (based on quicksort, perhaps)
being faster, but modifying the input array is a different question
from using an output array.

I can also see that this could possibly be improved by using a for
loop to iterate over the output elements, so that there was no need to
duplicate the large input array, or perhaps a "blocked" iteration that
duplicated arrays of modest size would be better. But how can a single
float per data set whose median is being taken help?

Anne


Re: [Numpy-discussion] Bug in numpy all() function

2008-02-06 Thread Anne Archibald
On 06/02/2008, Robert Kern <[EMAIL PROTECTED]> wrote:

> > I guess the all function doesn't know about generators?
>
> Yup. It works on arrays and things it can turn into arrays by calling the C 
> API
> equivalent of numpy.asarray(). There's a ton of magic and special cases in
> asarray() in order to interpret nested Python sequences as arrays. That magic
> works fairly well when we have sequences with known lengths; it fails utterly
> when given an arbitrary iterator of unknown length. So we punt. Unfortunately,
> what happens then is that asarray() sees an object that it can't interpret as 
> a
> sequence to turn into a real array, so it makes a rank-0 array with the 
> iterator
> object as the value. This evaluates to True.
>
> It's possible that asarray() should raise an exception for generators, but it
> would be a special case. We wouldn't be able to test for arbitrary iterables.

Would it be possible for asarray() to pull out the first element from
the iterable, make an array out of it, then assume that all other
values out of the iterable will have the same shape (raising an error,
of course, when they aren't)? I guess this has high foot-shooting
potential, but is it that much worse than numpy's shape-guessing
generally?

It would be handy to be able to use an iterable to fill an array, so
that you'd never need to store the values in anything else first:

a = N.array((sin(N.pi*x/n) for x in xrange(n)))

Anne


Re: [Numpy-discussion] Stride of 2 for correlate()

2008-02-05 Thread Anne Archibald
On 05/02/2008, Chris Finley <[EMAIL PROTECTED]> wrote:

> After searching the archives, I was unable to find a good method for
> changing the stride of the correlate or convolve routines. I am doing a
> Daubechies analysis of some sample data, say data = arange(0:80). The
> coefficient array or four floats (say daub_g2[0:4]) is correlated over
> the data. However, the product only needs to be calculated for every
> other data index.

I don't think that correlate or convolve can be made to behave this
way. You can of course just throw away half the values, but I imagine
you'd like it to be reasonably fast (do compare, though, speed
tradeoffs can be surprising, particularly in python). This sort of
thing comes up from time to time, so I took some old code I posted to
scipy-user and put it in the cookbook at
http://www.scipy.org/Cookbook/SegmentAxis . It works by converting
your input array into a (in this case) 4 by n matrix with overlapping
rows (without copying). You can then do what you like with the
resulting array; convolutions just become matrix products. It's
conceivable (though frankly unlikely) that the BLAS acceleration of
matrix multiplication might make this run faster than correlate().
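
For reference, a minimal sketch of the overlapping-rows trick, assuming
a numpy new enough to have numpy.lib.stride_tricks.as_strided (the
cookbook code does the same job by hand):

import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.arange(80.0)                         # stand-in sample data
daub_g2 = np.array([0.48, 0.84, 0.22, -0.13])  # stand-in coefficients

n = len(data) - len(daub_g2) + 1
s = data.strides[0]
# row i is data[i:i+4]; no data is copied here
windows = as_strided(data, shape=(n, len(daub_g2)), strides=(s, s))
result = windows[::2].dot(daub_g2)             # correlate at every other index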

Good luck,
Anne


Re: [Numpy-discussion] Unexpected behavior with numpy array

2008-02-03 Thread Anne Archibald
On 03/02/2008, Damian Eads <[EMAIL PROTECTED]> wrote:
> Good day,
>
> Reversing a 1-dimensional array in numpy is simple,
>
> A = A[::-1] .
>
> However A is a new array referring to the old one and is no longer
> contiguous.
>
> While trying to reverse an array in place and keep it contiguous, I
> encountered some weird behavior. The reason for keeping it contiguous is
> the array must be passed to an old C function I have, which expects the
> buffer to be in row major order and contiguous. I am using lots of
> memory so I want to minimize copying and allocation of new arrays.

The short answer is that reversing an array in-place requires a
certain amount of care, in C - you have to explicitly walk through
swapping element i and element n-i, using a temporary variable.
Getting numpy to exchange the elements in-place is going to be a real
pain. I suggest trying a copying method first, and only getting
fancier if it's too slow.

>  >>> A=numpy.arange(0,10)
>  >>> A
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>  >>> A[::-1]
> array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>  >>> A[:] = A[::-1]
>  >>> A
> array([9, 8, 7, 6, 5, 5, 6, 7, 8, 9])
>  >>>
>
> Is there any way to perform assignments of the following form
>
>   X[sliceI] = expression involving X
>   with dimensions that are
>   compatible with the left
>   hand side
>
> without causing the odd behavior I mentioned. If not, it might be
> helpful to throw an exception when both the LHS and the RHS of an
> assignment reference an array slice of the same variable?

This is, odd as it seems, intended behaviour. Sometimes it's really
useful to use the same array on the LHS and RHS: for example

A[:-1]=A[1:]

You just need to know how the operation is done under the hood (the
arrays are iterated over in index order). (Actually, I'm not totally
sure this is specified - under some circumstances numpy may iterate
over dimensions in an order based on stride and/or size; whether this
can affect the result of an operation like the above I'll have to
think about.) In any case, much code depends on this.

Tricks for getting the array backwards without copying... hmm. Well,
you might be able to fill in the array in an unconventional order:

A = N.arange(n-1,-1,-1)
A=A[::-1]
N.sin(A,A) # or whatever

Now the reversal of A is a perfectly normal C array (though numpy may
not realize this; don't trust the flags, check the strides and sizes).

Or you could just write a little C function inplace_reverse(A); if
you're already linking to C this shouldn't add too much complexity to
your project.

Or you could do it explicitly in numpy:
for i in xrange(n/2):
    t = A[i]
    A[i] = A[n-1-i]
    A[n-1-i] = t
In the likely case that this is too slow, you can do the copying in
blocks, small enough that memory consumption is moderate but large
enough that the python overhead is not too much.
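
A sketch of that blocked version (the block size is a tunable guess;
for odd n the middle element stays put):

def reverse_inplace(A, block=65536):
    # swap mirror-image blocks; temporary storage is at most `block` items
    n = len(A)
    for i in xrange(0, n // 2, block):
        j = min(i + block, n // 2)
        head = A[i:j].copy()
        A[i:j] = A[n - j:n - i][::-1]
        A[n - j:n - i] = head[::-1]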

Finally, I realize that digging around in legacy code can be
miserable, but it is often not really very difficult to make a C
function handle strided data - the whole principle of numpy is that
compiled code really just needs to know the start address, data type,
and the spacing and length of an array along each dimension.

Anne


Re: [Numpy-discussion] Can not update a submatrix

2008-01-30 Thread Anne Archibald
On 30/01/2008, Francesc Altet <[EMAIL PROTECTED]> wrote:
> On Wednesday 30 January 2008, Nadav Horesh wrote:
> > In the following piece of code:
> > >>> import numpy as N
> > >>> R = N.arange(9).reshape(3,3)
> > >>> ax = [1,2]
> > >>> R
> >
> > array([[0, 1, 2],
> >[3, 4, 5],
> >[6, 7, 8]])
> >
> > >>> R[ax,:][:,ax] = 100
> > >>> R
> >
> > array([[0, 1, 2],
> >[3, 4, 5],
> >[6, 7, 8]])
> >
> > Why R is not updated?
>
> Because R[ax] is not a view of R, but another copy of the original
> object (fancy indexing does return references to different objects).
> In order to get views, you must specify only a slice of the original
> array.  For example:

This is not exactly correct.

There are two kinds of fancy indexing in numpy, which behave
similarly. There is normal fancy indexing, which produces a new array,
not sharing data with the old array:

a = N.random.normal(size=10)
b = a[a>0]

There is also fancy indexing of lvalues (to use a C term):

a = N.random.normal(size=10)
a[a<0] *= -1

Here no copy is made; instead, the data is modified in-place. This
requires some clever hacks under the hood, since a[a<0] *can't* be a
view of a.

The problem the OP had is that they are going beyond what the clever
hacks can deal with. That's why

R[ax,:] = 100

works but

R[ax,:][:,ax] = 100

doesn't. The problem is that you have two indexing operations in the
second situation, and the inplace operators can't deal with that.

Solutions include working on a flattened version of the array (for
which a single indexing operation suffices), using the (not in-place)
where() construct, or using the ix_ function:

R[N.ix_(ax,ax)] = 100

N.ix_ is necessary because R[ax,ax] doesn't do what I, for one, would
have expected: R[[1,2],[3,4]] yields [R[1,3],R[2,4]], not
[[R[1,3],R[1,4]],[R[2,3],R[2,4]]]. But in any case, once you reduce it
to a single fancy-indexing operation, you can successfully use it as
an lvalue. Striding and views are not really the issue.
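
For completeness, the original example done the working way:

import numpy as N

R = N.arange(9).reshape(3, 3)
ax = [1, 2]
R[N.ix_(ax, ax)] = 100    # a single fancy-indexing operation, so it sticks
# R is now:
# [[  0   1   2]
#  [  3 100 100]
#  [  6 100 100]]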

Anne


Re: [Numpy-discussion] Vectorizing a function

2008-01-30 Thread Anne Archibald
On 30/01/2008, Scott Ransom <[EMAIL PROTECTED]> wrote:

> That works fine with arrays, scalars, or array/scalar mixes in the
> calling.  I do understand that more complicated functions might
> require vectorize(), however, I wonder if sometimes it is used
> when it doesn't need to be?

It certainly is! I would guess that it gets used for one of two reasons:
* People don't realize that it's just a for loop and assume it does
numpy magic to make things faster.
and
* People care more about convenience than speed.

I am usually (when I use it) in the second camp. I have some function
with a bunch of conditionals - patching up the phase of some inverse
trig function, say, or dealing with a few special cases, or cleanly
raising exceptions - and it matters more to get the thing written
quickly than to have it run quickly. So rather than stop and figure
out how to use where() or fancy indexing to handle it efficiently, I
just slap a vectorize() on a simple python function. I can always
accelerate it later. For the same reason, I often drop into list
comprehensions for a few lines, converting back to an array at the
end.
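
A contrived sketch of the pattern, with the where() rewrite I'd do
later if speed turned out to matter:

import numpy as N

def wrap_phase(x):
    # scalar function with a conditional: quick to write, loops when vectorized
    if x > N.pi:
        return x - 2 * N.pi
    else:
        return x

f = N.vectorize(wrap_phase)          # convenient, but just a for loop inside
x = N.linspace(0, 2 * N.pi, 101)
slow = f(x)
fast = N.where(x > N.pi, x - 2 * N.pi, x)   # the vectorized equivalent
assert N.allclose(slow, fast)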

IMHO, one of the primary values of numpy/scipy is that it lets you
express vector algorithms conveniently. When this makes it faster,
this is a bonus; when it makes it slower (as sometimes happens with
very small or very large arrays) it's a pain. But vectorize() lets me
use these vector expressions very readily with functions that are
written in a simple python style.

Nevertheless, I think (going by some of the questions we get here) a
lot of people start using numpy without appreciating that the *point*
of using it is that you can use vector expressions; they think, "well,
I'm using numbers in python, so I should use numpy", and then they
don't take advantage of numpy's speedups. So perhaps we should do some
more educating, in tutorials and docstrings, about the limitations of
vectorize().

Anyone want to start a wiki page "How to get rid of vectorize()"?

Anne


Re: [Numpy-discussion] Does float16 exist?

2008-01-08 Thread Anne Archibald
On 08/01/2008, Charles R Harris <[EMAIL PROTECTED]> wrote:

> Well, at a minimum people will want to read, write, print, and promote them.
> That would at least let people work with the numbers, and since my
> understanding is that the main virtue of the format is compactness for
> storage and communication, a basic need will be filled right there. One
> potential problem I see is handling +/-inf and nans, tests for these should
> probably be built into the type.

The el-cheapo solution to this simply provides two functions: take an
int16 array (which actually contains float16) and produce a float32
array, and vice versa. Then people do all their work in float32 (or
float64 if float32 doesn't have inf/nan, I don't remember) but can
read and write float16.
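
A sketch of those two functions, assuming a numpy with native float16
support (which arrived in later versions):

import numpy as np

def half_to_float32(raw):
    # reinterpret int16 storage as IEEE half, then promote for arithmetic
    return raw.view(np.float16).astype(np.float32)

def float32_to_half(a):
    # demote to half precision and reinterpret as int16 for compact storage
    return a.astype(np.float16).view(np.int16)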

Of course it would be nicer to use float16 natively, more or less, but
without all the math that's going to be a frustrating experience.

Anne


Re: [Numpy-discussion] Does float16 exist?

2008-01-08 Thread Anne Archibald
On 08/01/2008, Charles R Harris <[EMAIL PROTECTED]> wrote:

> I'm starting to get interested in implementing float16 support ;) My
> tentative program goes something like this:
>
> 1) Add the operators to the scalar type. This will give sorting, basic
> printing, addition, etc.
> 2) Add conversions to other types. This will allow promotion of data to
> currently supported types.
> 3) Unoptimized BLAS or ATLAS additions for the type. This would give array
> operations.
> 4) Figure out what the precedence relations are.
>
> Can you add some details to this roadmap? It also strikes me that this
> exercise can be repeated for quad precision floats, so might be a good
> preparation for future IEEE types like the proposed decimal floating types.
> Note that we are going to need some way to distinguish quads and extended
> precision types on 64bit platforms, as both will both register as float128
> in the current notation. Decimal floats can be added with the notation
> decimal64, etc.

How were you planning to implement the basic mathematics (addition,
multiplication, trig, what have you)? Were you planning to code all
that from scratch or is there a 16-bit float math library? Or were you
planning to make this work only on machines with 16-bit math hardware?
Or ship it out to a graphics card?

Without hardware support, people may be happier using 16-bit floats
only for on-disk storage...

Anne


Re: [Numpy-discussion] CASTABLE flag

2008-01-07 Thread Anne Archibald
On 07/01/2008, Timothy Hochberg <[EMAIL PROTECTED]> wrote:

> I'm fairly dubious about assigning float to ints as is. First off it looks
> like a bug magnet to me due to accidentally assigning a floating point value
> to a target that one believes to be float but is in fact integer. Second,
> C-style rounding is pretty evil; it's not always consistent across
> platforms, so relying on it for anything other than truncating already
> integral values is asking for trouble.

I'm not sure I agree that this is a bug magnet: if your array is
integer and you think it's float, you're going to get burned sooner
rather than later, whether you assign to it or not. The only case I
can see would be a problem would be if zeros() and ones() created
integers - which they don't.

Anne


Re: [Numpy-discussion] Nasty bug using pre-initialized arrays

2008-01-07 Thread Anne Archibald
On 07/01/2008, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
> One place where Numpy differs from MatLab is the way memory is handled.
> MatLab is always generating new arrays, so for efficiency it is worth
> preallocating arrays and then filling in the parts. This is not the case in
> Numpy where lists can be used for things that grow and subarrays are views.
> Consequently, preallocating arrays in Numpy should be rare and used when
> either the values have to be generated explicitly, which is what you see
> when using the indexes in your first example. As to assignment between
> arrays, it is a mixed question. The problem again is memory usage. For large
> arrays, it makes sense to do automatic conversions, as is also the case in
> functions taking output arrays, because the typecast can be pushed down into
> C where it is time and space efficient, whereas explicitly converting the
> array uses up temporary space. However, I can imagine an explicit typecast
> function, something like
>
> a[...] = typecast(b)
>
> that would replace the current behavior. I think the typecast function could
> be implemented by returning a view of b with a castable flag set to true,
> that should supply enough information for the assignment operator to do its
> job. This might be a good addition for Numpy 1.1.

This is introducing a fairly complex mechanism to cover, as far as I
can see, two cases: conversion of complex to real, and conversion of
float to integer. Conversion of complex to real can already be done
explicitly without a temporary:

a[...] = b.real

That leaves only conversion of float to integer.

Does this single case cause enough confusion to warrant an exception
and a way around it?

Anne


Re: [Numpy-discussion] fast iteration (I think I've got it)

2008-01-01 Thread Anne Archibald
On 01/01/2008, Neal Becker <[EMAIL PROTECTED]> wrote:
> This is a c-api question.
>
> I'm trying to get iterators that are both fast and reasonably general.  I
> did confirm that iterating using just the general PyArrayIterObject
> protocol is not as fast as using c-style pointers for contiguous arrays.

I'd like to point out that "contiguous" can be misleading as it is
used in numpy. An array is flagged contiguous if all the elements are
contiguous *and* they are arranged as a C-ordered multidimensional
array - i.e., the last index changes the fastest as you walk through
the array elements, and the next-to-last changes the next fastest, and
so on.

> Please confirm if my understanding is correct.  There are 2 cases to
> consider.  There are plain old dense arrays, which will be contiguous, and
> there are array 'views' or 'slices'.  The views will not be contiguous.
> However, in all cases, the strides between elements in one dimension are
> constant.  This last point is key.  As long as the stride in the last
> dimension is a constant, a fast iterator is possible.
>
> I put together a small test, which seems to work for the cases I tried.
> The idea is to use the general iterator (via PyArray_IterAllButAxis), but
> iteration over the last dimension is done with a c-style pointer.

This is not an unreasonable approach, but it can suffer badly if the
last dimension is "small", for example if you have an array of shape
(1000,1000,2). Such arrays can arise for example from working with
complex numbers as pairs of real numbers.

Where is the acceleration coming from in this code? You can walk
through a multidimensional array in pure C, just incrementing pointers
as needed, without too much difficulty.

If you want to save bookkeeping overhead, you can partially flatten
the array: if your last dimension has stride 7 and length 11, and your
second-to-last dimension has stride 77, you can flatten (using
reshape, in python, to get a view) the last two dimensions of the
array and use a nice quick iterator on the combined dimension. For a
"C contiguous" array, this means you just walk through the whole array
with a single layer of looping.

I suspect that the place you will see big slowdowns is where data is
accessed in an order that doesn't make sense from a cache point of
view. If four-byte chunks of data are spaced every eight bytes, then
the processor spends twice as much time waiting for cache loads as if
they are spaced every four bytes. And most numpy tasks are almost
certainly memory-bound - all the ufuncs I can think of almost
certainly are (arctan of a double almost certainly takes less time
than loading and storing a double). If you walk through an array in
the wrong order - changing the big strides fastest and the small
strides slowest - you'll almost certainly have to shuffle the whole
array through cache eight times or more. This is even ignoring cache
misses. (To cut down on cache misses you can use the GCC prefetch
instructions, or god-knows-what equivalent in other compilers.)

I seem to recall that numpy already reorders its walks through arrays
within ufuncs - does it do so with cache in mind? For that matter, is
information on the CPU's caches available to numpy?

Anne

P.S. Here is code to flatten an array as much as possible without
changing the order of flatiter:

def flatteniter(A):
    i = 0
    while i < len(A.shape) - 1:
        if A.strides[i] == A.shape[i+1] * A.strides[i+1]:
            # adjacent dimensions are contiguous with each other: merging
            # them keeps flatiter order, and reshape gives a view, not a copy
            newshape = A.shape[:i] + (A.shape[i] * A.shape[i+1],) + A.shape[i+2:]
            A = A.reshape(newshape)
        else:
            i += 1
    return A


Re: [Numpy-discussion] any better way to normalize a matrix

2007-12-28 Thread Anne Archibald
On 28/12/2007, Christopher Barker <[EMAIL PROTECTED]> wrote:
> Anne Archibald wrote:
> > Numpy provides ufuncs as general powerful tools for operating on
> > matrices. More can be added relatively easily, they provide not just
> > the basic "apply" operation but also "outer" and others. Adding
> > another way to accomplish the same operation just adds bulk to numpy.
>
> Maybe so, but if it were up to me, I'd have just the methods, and not
> the functions -- I like namespaces and OO design. I've always thought it
> was a spurious argument that these kinds of functions can operate on
> things that aren't numpy arrays -- it's true, but they all return
> arrays, so it's not like they really are universal.

It doesn't make a whole lot of sense to me to have n-ary methods
(though I don't think we currently have any ternary ufuncs). How would
you accommodate all the other operations ufuncs support?

The big problem with methods, it seems to me, is that if I want to add
a new ufunc, it's relatively easy. In fact, with vectorize() I do that
all the time (with all sorts of arities). But adding compiled ufuncs
in a new module is relatively straightforward; it's not at all clear
how a separate package could ever add new methods to the ndarray
object. This is not hypothetical - scipy.special introduces many
mathematical functions that either are or should be ufuncs.

As for whether this is "object-oriented", well, ufuncs are objects
with a variety of methods; one could productively inherit from them
(at least in principle, though I've never had occasion to).
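
For instance, every ufunc carries the same family of methods:

import numpy as N

N.add.reduce(N.arange(5))          # 10, like sum()
N.add.accumulate(N.arange(5))      # [ 0  1  3  6 10], like cumsum()
N.add.outer([1, 2], [10, 20])      # [[11 21]
                                   #  [12 22]]
N.maximum.reduce([3, 1, 4, 1, 5])  # 5, like amax()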

Anyway, I don't know that we need to rehash the old discussion. I just
wanted to point out the value of ufuncs as objects.

Anne


Re: [Numpy-discussion] any better way to normalize a matrix

2007-12-28 Thread Anne Archibald
On 28/12/2007, Christopher Barker <[EMAIL PROTECTED]> wrote:

> I like the array methods a lot -- is there any particular reason there
> is no ndarray.abs(), or has it just not been added?

Here I have to disagree with you.

Numpy provides ufuncs as general powerful tools for operating on
matrices. More can be added relatively easily, they provide not just
the basic "apply" operation but also "outer" and others. Adding
another way to accomplish the same operation just adds bulk to numpy.
I don't even like myarray.max(), preferring numpy.amax() or
numpy.maximum.reduce.

I realize we can't remove any array methods, but I don't think we
should add an array method for every unary function - surely you don't
want to see myarray.arctan()?

Anne


Re: [Numpy-discussion] any better way to normalise a matrix

2007-12-27 Thread Anne Archibald
On 27/12/2007, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> in my code i am trying to normalise a matrix as below
>
> mymatrix=matrix(..# items are of double type..can be negative
> values)
> numrows,numcols=mymatrix.shape
>
> for i in range(numrows):
> temp=mymatrix[i].max()
> for j in range(numcols):
> mymatrix[i,j]=abs(mymatrix[i,j]/temp)
>
> # i am using abs() to make neg vals positive
>
> this way the code takes too much time to run for a case of
> numrows=25, and numcols=8100 etc..
> being a beginner in numpy and python ,i would like to know if this can
> be done more efficiently..can anyone advise?

Yes. A major advantage - really the reason for existence - of numpy is
that you can do operations to whole matrices at once in a single
instruction, without loops. This is convenient from a coding point of
view, and it also allows numpy to greatly accelerate calculations.
Writing code the way you have above takes no advantage of numpy's
machinery, and it would probably run faster using python lists than
numpy arrays. I strongly recommend you read some numpy documentation;
try starting with the tutorial:
http://www.scipy.org/Tentative_NumPy_Tutorial

For example, to extract an array containing the maxima of each row of
mymatrix, you can use the amax() function:

temp = numpy.amax(mymatrix, axis=1)

Multiplying a whole matrix by a constant can be done simply as well:

7*mymatrix
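
Putting those together, the whole double loop collapses to two lines;
a sketch on stand-in data:

import numpy

mymatrix = numpy.random.randn(25, 8100)      # stand-in for the real data
temp = numpy.amax(mymatrix, axis=1)          # one maximum per row, shape (25,)
# newaxis makes temp a column, so the division broadcasts across each row
result = numpy.abs(mymatrix / temp[:, numpy.newaxis])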

A final comment is that numpy is designed for computations with
multidimensional data. Its basic data type is the array. It has a
specialized data type called "matrix" which provides slightly more
convenient matrix multiplication (in the sense of linear algebra). I
recommend you do not use numpy's matrix class until you are more
accustomed to the software.

Read the documentation. Start with the tutorial. Good luck.
Anne


Re: [Numpy-discussion] RAdian <--> degres conversion

2007-12-16 Thread Anne Archibald
On 16/12/2007, Hans Meine <[EMAIL PROTECTED]> wrote:

> (*: It's similar with math.hypot, which I have got to know and appreciate
> nowadays.)

I'd like to point out that math.hypot is a nontrivial function which
is easy to get wrong:

In [6]: x=1e200; y=1e200;

In [7]: math.hypot(x,y)
Out[7]: 1.414213562373095e+200

In [8]: math.sqrt(x**2+y**2)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)

/home/peridot/<ipython console> in <module>()

OverflowError: (34, 'Numerical result out of range')
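
If you ever have to roll your own, the standard cure is to rescale
first; a sketch:

import math

def safe_hypot(x, y):
    # factor out the larger magnitude so neither square can overflow
    x, y = abs(x), abs(y)
    if x < y:
        x, y = y, x
    if x == 0.0:
        return 0.0
    r = y / x
    return x * math.sqrt(1.0 + r * r)

# safe_hypot(1e200, 1e200) -> about 1.4142e+200, with no overflow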


Anne


Re: [Numpy-discussion] OT: A Way to Approximate and Compress a 3D Surface

2007-11-20 Thread Anne Archibald
On 20/11/2007, Geoffrey Zhu <[EMAIL PROTECTED]> wrote:

> I have N tabulated data points { (x_i, y_i, z_i) } that describes a 3D
> surface. The surface is pretty "smooth." However, the number of data
> points is too large to be stored and manipulated efficiently. To make
> it easier to deal with, I am looking for an easy method to compress
> and approximate the data. Maybe the approximation can be described by
> far fewer number of coefficients.
>
> If you can give me some hints about possible numpy or non-numpy
> solutions or let me know where is better to ask this kind of question,
> I would really appreciate it.

This is an important task in computer graphics, in particular, in the
field of multiresolution modelling. If you look up "surface
simplification" you'll find many references to articles. I don't know
of a library offhand that does it, let alone one that is accessible
from python, but you could try looking at a toolkit that does
isosurface visualization - these are surfaces that can often be
simplified enormously. In particular it looks like VTK might be able
to do what you want.

Anne


Re: [Numpy-discussion] numpy : your experiences?

2007-11-16 Thread Anne Archibald
On 16/11/2007, Rahul Garg <[EMAIL PROTECTED]> wrote:

> It would be awesome if you guys could respond to some of the following
> questions :
> a) Can you guys tell me briefly about the kind of problems you are
> tackling with numpy and scipy?
> b) Have you ever felt that numpy/scipy was slow and had to switch to
> C/C++/Fortran?
> c) Do you use any form of parallel processing? Multicores? SMPs?
> Clusters? If yes how did u utilize them?
>
> If you feel its not relevant to the list .. feel free to email me personally.
> I would be very interested in talking about these issues.

I think it would be interesting and on-topic to hear a few words from
people to see what they do with numpy.

a) I use python/numpy/scipy to work with astronomical observations of
pulsars. This includes a range of tasks including: simple scripting to
manage jobs on our computation cluster; minor calculations (like a
better scientific calculator, though frink is sometimes better because
it keeps track of units); gathering and plotting results; prototyping
search algorithms and evaluating their statistical properties;
providing tools for manipulating photon data; various other tasks. We
also use the software package PRESTO for much of the heavy lifting;
much of it is written in python.

b) I have projects for which python is too slow, yes. Pulsar surveys
are extremely compute-intensive (I estimated one that I'm involved
with at two or three mega-core-hours), so any software that is going
in the pipeline should have its programming-time/runtime tradeoff
carefully examined. I wrote a search code in C, with bits in embedded
assembler. All the driver code to shuffle files around and supply
correct parameters is in python, though. PRESTO has a similar pattern,
with more C code because it does more of the hard work. In most cases
the communication between the heavy-lifting code and the python code
is through the UNIX environment (the heavy-lifting code gets run as a
separate executable) but PRESTO makes many functions available in
python modules. On the other hand, I often write quick Monte Carlo
simulations that run for a day or more, but since writing them takes
about as long as running them, it's not worth writing them in a
language that would run faster.

c) Our problems tend to be embarrassingly parallel, so we tend not to
use clever parallelism toolkits. For the survey I am working on, I
wrote one (python) script to process a single beam on a single node
(which takes about thirty hours), and another to keep the batch queue
filled with an appropriate number of such jobs. I have thought about
writing a more generic tool for managing this kind of job queue, but
haven't invested the time yet.



Anne


Re: [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

2007-11-08 Thread Anne Archibald
On 08/11/2007, David Cournapeau <[EMAIL PROTECTED]> wrote:

> For copy and array creation, I understand this, but for element-wise
> operations (mean, min, and max), this is not enough to explain the
> difference, no ? For example, I can understand a 50 % or 100 % time
> increase for simple operations (by simple, I mean one element operation
> taking only a few CPU cycles), because of copies, but a 5 fold time
> increase seems too big, no (mayb a cache problem, though) ? Also, the
> fact that mean is slower than min/max for both cases (F vs C) seems a
> bit counterintuitive (maybe cache effects are involved somehow ?).

I have no doubt at all that cache effects are involved: for an int
array, each data element is four bytes, but typical CPUs need to load
64 bytes at a time into cache. If you read (or write) the rest of
those bytes in the next iterations through the loop, the (large) cost
of a memory read is amortized. If you jump to the next row of the
array, some large number of bytes away, those 64 bytes basically need
to be purged to make room for another 64 bytes, of which you'll use 4.
If you're reading from a FORTRAN-order array and writing to a C-order
one, there's no way around doing this on one end or another: you're
effectively doing a transpose, which is pretty much always expensive.
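
A quick way to see the effect (absolute numbers vary by machine and
numpy version, but the mixed-order case is consistently slower):

import timeit
import numpy as np

a_c = np.ones((2000, 2000))        # C order
a_f = np.asfortranarray(a_c)       # same values, Fortran order

# both operands walked in memory order:
print(timeit.timeit(lambda: a_c + a_c, number=20))
# mixed orders force transpose-like, cache-hostile access on one operand:
print(timeit.timeit(lambda: a_c + a_f, number=20))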

Is there any reason not to let ufuncs pick whichever order for
newly-allocated arrays they want? The natural choice would be the same
as the bigger input array, if it's a contiguous block of memory
(whether or not the contiguous flags are set). Failing that, the same
as the other input array (if any); failing that, C order's as good a
default as any. How difficult would this be to implement?

Anne


Re: [Numpy-discussion] numpy FFT memory accumulation

2007-10-31 Thread Anne Archibald
On 31/10/2007, Ray S <[EMAIL PROTECTED]> wrote:
> I am using
> fftRes = abs(fft.rfft(data_array[end-2**15:end]))
> to do running analysis on streaming data. The N never changes.
> It sucks memory up at ~1MB/sec with 70kHz data rate and 290 ffts/sec.
> (Interestingly, Numeric FFT accumulates much slower..)
> (Commenting out that one line stops memory growth.)
>
> What can one do to alleviate this?
> Can I del() some numpy object or such?
> It's a bit of an issue for a program that needs to run for weeks.
>
> It's purpose is simply to argmax() the largest bin, which always
> falls within a small range - do I have another, better option than
> fft?

If the range is *really* small, you can try using a DFT - sometimes
that is fast enough, and gives you just the bins you're curious about.
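
A sketch of the direct evaluation for a handful of bins (k and n as in
the usual DFT definition):

import numpy as np

def dft_bins(x, bins):
    # evaluate only the requested bins; O(len(x)) work per bin
    n = len(x)
    k = np.asarray(bins, dtype=float)[:, np.newaxis]  # column of bin numbers
    t = np.arange(n)[np.newaxis, :]                   # row of sample indices
    return np.exp(-2j * np.pi * k * t / n).dot(x)

x = np.random.randn(2 ** 15)
peak_region = dft_bins(x, range(1000, 1010))  # ten bins instead of 32768
# agrees with np.fft.fft(x)[1000:1010] to rounding error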

If the range is bigger than that, but still a small fraction of the
FFT size, you can do some tricks where you band-pass filter the data
(with a FIR filter, say), downsample (which aliases frequencies
downward), and then FFT. Concretely, if your data is sampled at
80ksamp/s and you're taking 160ksamp FFTs, you're getting the spectrum
up to 40 kHz with a resolution of 0.5 Hz. If all you care about is the
spectral region from 5 kHz to 6 kHz but you still need the 0.5 Hz
spectral resolution, you can run a band-pass filter to throw out
everything outside (say) 4 to 7 kHz, then downsample to 8 ksamp/s; the
4 to 7 kHz region will appear mirrored in the 1 to 4 kHz region of the
new FFTs, which will each be a tenth the size.

There are also "zoom fft" and "chirp-z" techniques which are supposed
to give you only part of the FFT, but the wisdom is that unless you
want less than a few percent of the data points you're better just
FFTing and throwing lots of data away.

Anne


Re: [Numpy-discussion] fortran array storage question

2007-10-26 Thread Anne Archibald
On 26/10/2007, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:

> There is an optimization where-in the inner-loops are done over the
> dimension with the smallest stride.
>
> What other cache-coherent optimizations do you recommend?

That sounds like a very good first step. I'm far from an expert on
this sort of thing, but here are a few ideas at random:
* internally flattening arrays when this doesn't affect the result
(e.g. ones((10,10))+ones((10,10)))
* prefetching memory: in a C application I recently wrote, explicitly
prefetching data for interpolation cut my runtime by 30%. This
includes telling the processor when you're done with data so it can be
purged from the cache.
* aligning (some) arrays to 8- 16- 32- or 64-byte boundaries so that
they divide nicely into cache lines
* using MMX/SSE instructions when available
* combining ufuncs so that computations can keep the CPU busy while it
waits for data to come in from main RAM (I realize that this is
properly the domain of numexpr)
* using ATLAS- or FFTW-style autotuning to determine the fastest ways
to structure computations (again more relevant for significant
expressions rather than simple ufuncs)
* reducing use of temporaries in the interest of reducing traffic to main memory
* openmp parallel operations when this actually speeds up calculation

I realize most of these are a lot of work, and some of them are
probably in numpy already. Moreover without using an expression parser
it's probably not feasible to implement others. But an array language
offers the possibility that a runtime can implement all sorts of
optimizations without effort on the user's part.

Anne


Re: [Numpy-discussion] fortran array storage question

2007-10-26 Thread Anne Archibald
On 26/10/2007, Georg Holzmann <[EMAIL PROTECTED]> wrote:

> if in that example I also change the strides:
>
>int s = tmp->strides[1];
>tmp->strides[0] = s;
>tmp->strides[1] = s * dim0[0];
>
> Then I get in python the fortran-style array in right order.

This is the usual way. More or less, at least. numpy is designed from
the start to handle arrays with arbitrary striding; this is how slices
are implemented, for example. There will be no major performance hit
from numpy code itself. The actual organization of data in memory will
of course affect the speed at which your code runs. The flags, as you
discovered, are just a performance optimization, so that code that
needs arrays organized as C- or FORTRAN-standard doesn't need to check
the strides every time.
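
For instance, viewed from Python (a sketch; strides are in bytes):

import numpy as np

a = np.arange(6).reshape(2, 3)   # C order
print(a.strides)                 # (24, 8) for 8-byte integers
b = a.T                          # Fortran-order view of the same memory
print(b.strides)                 # (8, 24): no data moved, only strides swapped
print(b.flags['F_CONTIGUOUS'])   # True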

I don't think numpy's loops - for example in ones((100,100))+eye(100)
- are smart about doing operations in an order that makes
cache-coherent use of memory. The important exception is the loops
that use ATLAS, which I think is mostly the dot() function.

Anne


Re: [Numpy-discussion] appending extra items to arrays

2007-10-10 Thread Anne Archibald
On 11/10/2007, Robert Kern <[EMAIL PROTECTED]> wrote:

> Appending to a list then converting the list to an array is the most
> straightforward way to do it. If the performance of this isn't a problem, I
> recommend leaving it alone.

Just a speculation:

Python strings have a similar problem - they're immutable, and so are
even more resistant to growth than numpy arrays. For those situations
where you really really want to grow a srting, python provides
StringIO, where you keep efficiently adding to the string, then
finalize it to get the real string out. Would something analogous be
interesting for arrays?

Anne

P.S. A (somewhat) slow python implementation would be fairly easy to write. -A
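
P.P.S. For instance, something like this (names made up):

import numpy as np

class ArrayIO:
    """StringIO-style builder for 1-d arrays: amortized O(1) appends (sketch)."""
    def __init__(self, dtype=float):
        self._buf = np.empty(16, dtype=dtype)
        self._n = 0

    def append(self, value):
        if self._n == len(self._buf):               # full: double the capacity
            new = np.empty(2 * len(self._buf), dtype=self._buf.dtype)
            new[:self._n] = self._buf
            self._buf = new
        self._buf[self._n] = value
        self._n += 1

    def getvalue(self):
        return self._buf[:self._n].copy()           # finalize to a real array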


[Numpy-discussion] bug in vectorize? (was: Re: Casting a float array into a string array)

2007-10-05 Thread Anne Archibald
On 05/10/2007, Christopher Barker <[EMAIL PROTECTED]> wrote:

> I don't know how to generalize this to n-d though -- maybe numpy.vectorize?

Oops! Looks like there's a bug somewhere:

In [1]: from numpy import *

In [2]: vectorize(lambda x: "%5.3g" % x)(ones((2,2,2)))
Out[2]:
array([[[' ', '\xc1'],
[' ', '\xc1']],

   [[' ', '\xc1'],
[' ', '\xc1']]],
  dtype='|S1')

In [3]: vectorize?
Segmentation fault (core dumped)

In particular, vectorize creates a string array (rather than an object
array), and some unsafe operation is done on that array.

Anne


Re: [Numpy-discussion] Casting a float array into a string array

2007-10-05 Thread Anne Archibald
On 05/10/2007, Matthieu Brucher <[EMAIL PROTECTED]> wrote:
> I'd like to have the '2.', because if the number is negative, only '-' is
> returned, not the real value.

For string arrays you need to specify the length of the string as part
of the data type (and it defaults to length 1):

In [11]: random.normal(size=(2,3)).astype("|S20")
Out[11]:
array([['-0.223407464804', '1.1952765837', '-2.24073805089'],
   ['1.36604738338', '-0.197059880309', '2.15648625075']],
  dtype='|S20')

In [12]: random.normal(size=(2,3)).astype("|S2")
Out[12]:
array([['-0', '1.', '-0'],
   ['0.', '0.', '1.']],
  dtype='|S2')

In [13]: random.normal(size=(2,3)).astype("str")
Out[13]:
array([['-', '1', '1'],
   ['-', '-', '0']],
  dtype='|S1')

Anne

