Re: [Numpy-discussion] Topical documentation

2008-06-26 Thread Anne Archibald
2008/6/26 Stéfan van der Walt [EMAIL PROTECTED]:
 Hi all,

 We are busy writing several documents that address general NumPy
 topics, such as indexing, broadcasting, testing, etc.  I would like
 for those documents to be available inside of NumPy, so that they
 could be accessed as docstrings:

 help(np.doc.indexing)

 or

 In [15]: np.doc.indexing?

 If I had to implement it right now, I'd probably

 - Have a directory full of topic.txt files.
 - Upon importing the doc sub-module, setup doc.topic and add docstring
 accordingly.

 Does anyone have comments or suggestions regarding such a framework?

I think making them available inside np.doc is a good idea. Should
np.doc.indexing contain anything but the single docstring?

It's also worth making some of the examples work as doctests, both to
get better testing - some of the indexing operations don't work, or at
least, don't do anything I can understand - and to keep the examples
in sync with the code. To make this happen, might it not be better to
create doc/indexing.py, containing
__doc__=...?

The topical documentation would presumably also be ReST. Any further
conventions it should follow?
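
For concreteness, here is a minimal sketch of the directory-of-topic.txt-files
scheme, written as a hypothetical numpy/doc/__init__.py (the file names and
module path are only illustrative):

import imp, os, sys

_doc_dir = os.path.dirname(__file__)            # hypothetical numpy/doc/ directory

for _fn in os.listdir(_doc_dir):
    if not _fn.endswith('.txt'):
        continue
    _name = _fn[:-4]                            # e.g. "indexing"
    _mod = imp.new_module('numpy.doc.' + _name)
    _mod.__doc__ = open(os.path.join(_doc_dir, _fn)).read()
    sys.modules['numpy.doc.' + _name] = _mod    # so "import numpy.doc.indexing" works
    globals()[_name] = _mod                     # so np.doc.indexing and help() work

Either that, or the doc/indexing.py route above; the loader is just a sketch.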

Anne


Re: [Numpy-discussion] ndarray methods vs numpy module functions

2008-06-24 Thread Anne Archibald
2008/6/24 Bob Dowling [EMAIL PROTECTED]:
 There is not supposed to be a one-to-one correspondence between the
 functions in numpy and the methods on an ndarray. There is some
 duplication between the two, but that is not a reason to make more
 duplication.

 I would make a plea for consistency, to start with.

 Those of us who write in an OO style are required to switch backwards
 and forwards between OO and not-OO, or to abandon OO altogether in our
 NumPy code.  Neither is an attractive option.

 The reason I tripped over this is that I am currently writing a course
 which introduces students to NumPy.  I am going to be asked this
 question from the audience.  As yet I don't have any answer except
 history.

As a rule, I personally avoid methods whenever reasonable. The primary
reason is that the function versions generally work fine on lists,
giving my functions some extra genericity for free. I generally make
an exception only for attributes - X.shape, for example, rather than
np.shape(X).
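
A small illustration of the genericity point: the module-level functions
happily accept plain lists, while the method spelling needs an ndarray.

import numpy as np

data = [[1, 2], [3, 4]]       # a plain nested list, not an ndarray
print np.sum(data)            # works: 10
print np.shape(data)          # works: (2, 2)
# data.sum()                  # AttributeError: lists have no sum() method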

It's true, it's a mess, but I'd just set down some simple rules that
work, and mention that some functions exist in other forms. There's
something to be said for rationalizing numpy's rules - or at least
writing them down! - but there's no need to use every version of every
function. And I believe they're all accessible as module-level
functions.

Anne


Re: [Numpy-discussion] Code samples in docstrings mistaken as doctests

2008-06-23 Thread Anne Archibald
2008/6/23 Alan McIntyre [EMAIL PROTECTED]:
 On Mon, Jun 23, 2008 at 3:21 PM, Stéfan van der Walt [EMAIL PROTECTED] 
 wrote:
 Another alternative is to replace +SKIP with something like +IGNORE.
 That way, the statement is still executed, we just don't care about
 its outcome.  If we skip the line entirely, it often affects the rest
 of the tests later on.

 Ugh.  That just seems like a lot of unreadable ugliness to me.  If
 this comment magic is the only way to make that stuff execute properly
 under doctest, I think I'd rather just skip it in favor of clean,
 uncluttered, non-doctestable code samples in the docstrings.  If the
 code that's currently in docstrings needs to be retained as test code,
 I'll gladly take the time to put it into a test_ module where it
 doesn't get in the way of documentation.  I'll defer to the consensus,
 though.

I think doctests are valuable: it's very hard for the documentation to
get out of sync with the code, and it makes it very easy to write
tests, particularly in light of the wiki documentation framework. But
I think encrusting examples with weird comments will be a pain for
documentors and off-putting to users. Perhaps doctests can be
positively marked, in some relatively unobtrusive way?

Anne


Re: [Numpy-discussion] Code samples in docstrings mistaken as doctests

2008-06-23 Thread Anne Archibald
2008/6/23 Michael McNeil Forbes [EMAIL PROTECTED]:
 On 23 Jun 2008, at 12:37 PM, Alan McIntyre wrote:
 Ugh.  That just seems like a lot of unreadable ugliness to me.  If
 this comment magic is the only way to make that stuff execute
 properly
 under doctest, I think I'd rather just skip it in favor of clean,
 uncluttered, non-doctestable code samples in the docstrings.

 Another perspective: doctests ensure that documentation stays up to
 date (if the behaviour or interface changes, then tests will fail
 indicating that the documentation also needs to be updated.)

 Thus, one can argue that all examples should also be doctests.  This
 generally makes things a little more ugly, but much less ambiguous.

This is a bit awkward. How do you give an example for a random-number
generator? Even if you are willing to include a seed in each
statement, misleading users into thinking it's necessary, the value
returned for a given seed is not necessarily part of the interface a
random-number generator agrees to support.

I do agree that as many examples as possible should be doctests, but I
don't think we should restrict the examples we are allowed to give to
only those that can be made to serve as doctests.
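
For what it's worth, the standard doctest directives already give a couple of
escape hatches; a small sketch (nothing numpy-specific, and the array values in
the skipped example are of course made up):

def example():
    """
    >>> import numpy as np
    >>> np.random.rand(3)              # doctest: +SKIP
    array([ 0.645,  0.437,  0.891])

    >>> np.random.rand(3).shape        # or test only what is stable
    (3,)
    """

Whether asking documenters to remember these directives is acceptable is, I
suppose, the real question.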

Anne


Re: [Numpy-discussion] Code samples in docstrings mistaken as doctests

2008-06-23 Thread Anne Archibald
2008/6/23 Stéfan van der Walt [EMAIL PROTECTED]:
 2008/6/24 Stéfan van der Walt [EMAIL PROTECTED]:
 It should be fairly easy to execute the example code, just to make
 sure it runs.  We can always work out a scheme to test its validity
 later.

 Mike Hansen just explained to me that the Sage doctest system sets the
 random seed before executing each test.  If we address

 a) Random variables
 b) Plotting representations and
 c) Endianness

 we're probably halfway there.

I agree (though I have reservations about how they are to be
addressed). But in the current setting, halfway there is still a
problem - it seems to me we need, now and later, a way to deal with
generic examples that are not doctests. There may not be many of them,
and most may be dealt with by falling into categories a, b, and c
above, but it is important that we not make it difficult to write new
examples even if they can't readily be made into doctests. In
particular, we don't want some documentor saying well, I'd like to
write an example, but I don't remember the arcane syntax to prevent
this failing a doctest, so I'm not going to bother.

Anne


Re: [Numpy-discussion] Code samples in docstrings mistaken as doctests

2008-06-23 Thread Anne Archibald
2008/6/23 Michael Abshoff [EMAIL PROTECTED]:
 Charles R Harris wrote:


 On Mon, Jun 23, 2008 at 5:58 PM, Michael Abshoff
 [EMAIL PROTECTED]
 wrote:

 Stéfan van der Walt wrote:
   2008/6/24 Stéfan van der Walt [EMAIL PROTECTED]:
   It should be fairly easy to execute the example code, just to make
   sure it runs.  We can always work out a scheme to test its validity
   later.

 Hi,

   Mike Hansen just explained to me that the Sage doctest system
 sets the
   random seed before executing each test.  If we address
  
   a) Random variables

 we have some small extensions to the doctesting framework that allow us
 to mark doctests as #random so that the result is not checked. Carl
 Witty wrote some code that makes the random number generator in a lot of
 the Sage components behave consistently on all supported platforms.

 Hi,


 But there is more than one possible random number generator. If you do
 that you are tied into one kind of generator and one kind of
 initialization implementation.

 Chuck


 Correct, but so far Carl has hooked into six out of the many random
 number generators in the various components of Sage. This way we can set
 a global seed and also more easily reproduce issues with algorithms
 where randomness plays a role without being forced to be on the same
 platform. There are still doctests in Sage where the randomness comes
 from sources not in randgen (Carl's code), but sooner or later we will
 get around to all of them.

Doesn't this mean you can't change your implementation of random
number generators (for example choosing a different implementation of
generation of normally-distributed random numbers, or replacing the
Mersenne Twister) without causing countless doctests to fail
meaninglessly?

Anne


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Anne Archibald
2008/6/21 Dag Sverre Seljebotn [EMAIL PROTECTED]:
 Dag wrote:
 General feedback is welcome; in particular, I need more opinions about
 what syntax people would like. We seem unable to find something that we
 really like; this is the current best candidate (cdef is the way you
 declare types on variables in Cython):

 cdef int i = 4, j = 6
 cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
 arr[i, j] = 1
 ...


 Some more important points:
 - There will likely be a compiler flag (and perhaps per-block or
 per-variable pragmas) controlling whether bounds-checking is done or not
 - It doesn't look like negative indices will be supported in this mode, as
 it adds another branch in potentially very tight loops. Opinions on this
 are welcome though. In safe mode, bounds checking will catch it, while in
 unsafe mode it will at best segfault and at worst corrupt data.

 The negative indices thing is potentially confusing to new users. Pex uses
 a different syntax (arr{i, j} for efficient array lookups) partly to make
 this fact very explicit. Thoughts?

I am very worried about the negative numbers issue. It's the sort of
thing that will readily lead to errors, and that produces a
significant difference between cython and python. I understand the
performance issues that motivate it, but cython really needs to be
easy to use or we might as well just use C. As I understand it, the
ultimate goal should be for users to be able to compile arbitrary
python code for tiny speed improvements, then add type annotations for
key variables to obtain large speed improvements. Forbidding negative
indices now prevents that goal.

My suggestion is this: allow negative indices, accepting the cost in
tight loops. (If bounds checking is enabled, the cost will be
negligible anyway.) Provide a #pragma allowing the user to assert that
a certain piece of code uses no negative indices. Ultimately, the
cython compiler will often be able to deduce that negative indices
will not be used within a particular loop and supply the pragma
itself, but in the meantime this solution allows people to use the
extremely-convenient negative indices without causing mysterious
problems, and it sets things up for the future. (Other compiler-based
cleverness can deal with the most common case by converting explicit
negative indices.)



Anne


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Anne Archibald
2008/6/21 Robert Kern [EMAIL PROTECTED]:
 On Sat, Jun 21, 2008 at 17:08, Anne Archibald [EMAIL PROTECTED] wrote:
 My suggestion is this: allow negative indices, accepting the cost in
 tight loops. (If bounds checking is enabled, the cost will be
 negligible anyway.) Provide a #pragma allowing the user to assert that
 a certain piece of code uses no negative indices.

 Instead of a #pragma, you could rely on the type of the index. If it
 is unsigned, you can do the fast path; if it is signed, you need to
 check for and handle potential negatives.

Cute! And then it's easy to make for i in range(n) produce unsigned
results with no effort on the part of the user.

Anne


Re: [Numpy-discussion] Strange behavior for argsort() and take()

2008-06-19 Thread Anne Archibald
2008/6/18 Charles R Harris [EMAIL PROTECTED]:


 On Wed, Jun 18, 2008 at 11:48 AM, Anne Archibald [EMAIL PROTECTED]
 wrote:

 2008/6/18 Stéfan van der Walt [EMAIL PROTECTED]:
  2008/6/18 Anne Archibald [EMAIL PROTECTED]:
  In [7]: x.take(x.argsort())
  Out[7]: array([ 0. ,  0.1,  0.2,  0.3])
 
  If you would like to think of it more mathematically, when you feed
  np.argsort() an array that represents a permutation of the numbers
  0,1,...,n-1, you get back the inverse permutation. When you pass a
  permutation as the argument to x.take(), you apply the permutation to
  x. (You can also invert a permutation by feeding it as indices to
  arange(n)).
 
  I have been tempted to write some support functions for manipulating
  permutations, but I'm not sure how generally useful they would be.
 
  Do we have an easy way of grabbing the elements out of an array that
  correspond to
 
  argmax(x, axis) ?
 
  `take` won't do the job; there `axis` has a different meaning.

 Well, here's a quick hack:

 def take_axis(X, ix, axis):
XX = np.rollaxis(X,axis)
s = XX.shape
return XX[(ix,)+tuple(np.indices(s[1:]))]

 And a version that works for argsort()ing:

 def take_axis_argsort(X, ix, axis):
XX = np.rollaxis(X,axis)
ixix = np.rollaxis(ix,axis)
s = XX.shape
return np.rollaxis(XX[(ixix,)+tuple(np.indices(s)[1:])],0,axis+1)

 I'm always a little bit leery of using indices(), since it can produce
 very large arrays:

 And fancy indexing too, as it tends to be slow. The axis=None case also
 needs to be handled. I think this is best done in C where the code stacks
 for argsort and arg{max,min} are good starting points. The possibility of
 different integer types for the index array adds a bit of complication.

Actually, apart from the sizes of the index arrays, I don't really
think fancy indexing can be beat. I suspect that one could write
tuple_indices that used broadcasting to get the index arrays, if you
were willing to return a tuple. That would bring these up to nearly C
speed, I think.

Of course they're not general routines, but I think with a bit of work
they might serve.

Anne


Re: [Numpy-discussion] Find groups of true values or sequence of values

2008-06-19 Thread Anne Archibald
2008/6/18 bevan [EMAIL PROTECTED]:
 Hello,

  I am looking for some pointers that will hopefully get me around an issue I
 have hit.

 I have a timeseries of river flow and would like to carry out some analysis on
 the recession periods.  That is anytime the values are decreasing.  I would
 like to restrict (mask) my data to any values that are in a continuous 
 sequence
 of 3 or more (in the example below), that are decreasing in value.

 Hopefully this example helps:

 import numpy as np

 Flow = np.array([15.4,20.5,19.4,18.7,18.6,35.5,34.8,25.1,26.7])
 FlowDiff = np.diff(Flow)
  boolFlowdiff = FlowDiff > 0
 MaskFlow = np.ma.masked_array(Flow[1:],boolFlowdiff)

 print MaskFlow
 [-- 19.4 18.7 18.6 -- 34.8 25.1 --]

 The output I would like is
 [-- 19.4 18.7 18.6 -- -- -- --]
 Where the second groups is blanked because the sequence only has 2 members.

I would tackle this in steps: find the decreasing pairs, then find
places where two of them occur in a row, then construct your mask.

Finding decreases:
d = diff(X) < 0
finding two decreases in a row:
t = d[1:] & d[:-1]
creating the right mask:
m = np.zeros(n,dtype=np.bool)
m[:n-2] |= t
m[1:n-1] |= t
m[2:n] |= t

For finding longer runs you would want other tricks.
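
Put together for the Flow example from the original post (a sketch; this is one
reading of the "3 or more" requirement, namely a run of at least three
consecutive decreasing steps, which reproduces the output asked for):

import numpy as np

Flow = np.array([15.4, 20.5, 19.4, 18.7, 18.6, 35.5, 34.8, 25.1, 26.7])
runlen = 3                              # minimum number of consecutive decreases

d = np.diff(Flow) < 0                   # True where the flow decreases
nt = len(d) - runlen + 1
t = d[:nt].copy()                       # t[i]: a run of runlen decreases starts at i
for k in range(1, runlen):
    t &= d[k:k + nt]

keep = np.zeros(len(d), dtype=bool)     # keep[i]: step i lies inside such a run
for k in range(runlen):
    keep[k:k + nt] |= t

MaskFlow = np.ma.masked_array(Flow[1:], mask=~keep)
print MaskFlow                          # [-- 19.4 18.7 18.6 -- -- -- --]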

Anne


Re: [Numpy-discussion] Strange behavior for argsort() and take()

2008-06-18 Thread Anne Archibald
2008/6/18 Thomas J. Duck [EMAIL PROTECTED]:

  I have found what I think is some strange behavior for argsort
 () and take().  First, here is an example that works as expected:

   >>> x = numpy.array([1,0,3,2])
   >>> x.argsort()
  array([1, 0, 3, 2])

 argsort() returns the original array, which is self-indexing for
 numbers 0 through 3...

   >>> x.take(x.argsort())
  array([0, 1, 2, 3])

 ...and take() provides the correct sort with ascending order.

 Now check out what happens when we swap the middle two numbers:

   >>> x = numpy.array([1,3,0,2])
   >>> x.argsort()
  array([2, 0, 3, 1])

 argsort() appears to have indexed the numbers in descending order
 (which was not expected)

   >>> x.take(x.argsort())
  array([0, 1, 2, 3])

 ... but the take() operation puts them in ascending order anyway
 (which was really not expected)!

  Can anyone shed some light on what is happening here for me?  I
 have tested this on both my OS X/Fink and Debian/Lenny systems with
 the same result.

This is the intended behaviour. The array returned by np.argsort() is
designed to be used as an index into the original array (take() does
almost the same thing as indexing). In particular, when x.argsort()
returns array([2,0,3,1]), that means that the sorted array is x[2],
x[0], x[3], x[1]. It might be clearer to sort a different array:

In [3]: x = np.array([0.1,0.3,0.0,0.2])

In [5]: x.argsort()
Out[5]: array([2, 0, 3, 1])

In [7]: x.take(x.argsort())
Out[7]: array([ 0. ,  0.1,  0.2,  0.3])

If you would like to think of it more mathematically, when you feed
np.argsort() an array that represents a permutation of the numbers
0,1,...,n-1, you get back the inverse permutation. When you pass a
permutation as the argument to x.take(), you apply the permutation to
x. (You can also invert a permutation by feeding it as indices to
arange(n)).
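
A small concrete example of the permutation point (p sorts x, and argsorting p
again gives the inverse permutation):

import numpy as np

x = np.array([0.1, 0.3, 0.0, 0.2])
p = x.argsort()          # array([2, 0, 3, 1]): the permutation that sorts x
print x[p]               # [ 0.   0.1  0.2  0.3], same as x.take(p)

inv = p.argsort()        # the inverse permutation
print p[inv]             # [0 1 2 3]
print x[p][inv]          # back in the original order of x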

I have been tempted to write some support functions for manipulating
permutations, but I'm not sure how generally useful they would be.

Anne


Re: [Numpy-discussion] nose changes checked in

2008-06-18 Thread Anne Archibald
2008/6/17 Alan McIntyre [EMAIL PROTECTED]:
 On Tue, Jun 17, 2008 at 8:15 PM, Anne Archibald
 [EMAIL PROTECTED] wrote:
 Uh, I assumed NumpyTestCase was public and used it. I'm presumably not
 alone, so perhaps a deprecation warning would be good. What
 backward-compatible class should I use? unittest.TestCase?

 Yes, unittest.TestCase seemed to be completely adequate for all the
 existing tests in NumPy.  Unless you need some functionality
 explicitly implemented in NumpyTestCase (like the 'measure' or
 'rundocs' methods), you can just replace it with TestCase.   TestCase
 can be imported from numpy.testing, although (now that I think about
 it) it's probably less cryptic to just import it from unittest.

 For module-level doctests, you can place something like this in the module:

 from numpy.testing import *
 def test():
return rundocs()

 You can replace ParametricTest with generators, as described here:
 http://scipy.org/scipy/scipy/wiki/TestingGuidelines

Hmm. This won't work with the current version of numpy, will it? That
is, it needs nose. (I run much code on the work machines, which I
(thankfully) do not administer, and which are running a variety of
ancient versions of numpy (some even have only Numeric, to support
horrible quasi-in-house C extensions).) So I'd like to write my tests
in a way that they can run on both new and old versions of numpy.

 I will document all this more completely once the test setup isn't
 changing on a daily basis, honest.

 I'm assuming this experience should tell me that any item bar that I
 can get by issuing from numpy.foo import bar should be considered
 the public API and therefore deprecated instead of removed?  ;)

Well, probably. But more so for those that are used widely throughout
numpy itself, since many of us learn how to write code using numpy by
reading numpy source. (Yes, this means that internal conventions
like numpy.core.whatever get used by people who aren't writing
numpy.)

Anne


Re: [Numpy-discussion] Strange behavior for argsort() and take()

2008-06-18 Thread Anne Archibald
2008/6/18 Stéfan van der Walt [EMAIL PROTECTED]:
 2008/6/18 Anne Archibald [EMAIL PROTECTED]:
 In [7]: x.take(x.argsort())
 Out[7]: array([ 0. ,  0.1,  0.2,  0.3])

 If you would like to think of it more mathematically, when you feed
 np.argsort() an array that represents a permutation of the numbers
 0,1,...,n-1, you get back the inverse permutation. When you pass a
 permutation as the argument to x.take(), you apply the permutation to
 x. (You can also invert a permutation by feeding it as indices to
 arange(n)).

 I have been tempted to write some support functions for manipulating
 permutations, but I'm not sure how generally useful they would be.

 Do we have an easy way of grabbing the elements out of an array that
 correspond to

 argmax(x, axis) ?

 `take` won't do the job; there `axis` has a different meaning.

Well, here's a quick hack:

def take_axis(X, ix, axis):
XX = np.rollaxis(X,axis)
s = XX.shape
return XX[(ix,)+tuple(np.indices(s[1:]))]

And a version that works for argsort()ing:

def take_axis_argsort(X, ix, axis):
XX = np.rollaxis(X,axis)
ixix = np.rollaxis(ix,axis)
s = XX.shape
return np.rollaxis(XX[(ixix,)+tuple(np.indices(s)[1:])],0,axis+1)
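
For what it's worth, a quick sanity check of the two hacks above (assuming
numpy is imported as np and the functions are defined as written):

X = np.random.rand(4, 5)

ix = X.argmax(axis=1)
print np.all(take_axis(X, ix, 1) == X.max(axis=1))                   # True

srt = X.argsort(axis=1)
print np.all(take_axis_argsort(X, srt, 1) == np.sort(X, axis=1))     # True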

I'm always a little bit leery of using indices(), since it can produce
very large arrays:

In [60]: np.indices((2,3,4)).shape
Out[60]: (3, 2, 3, 4)

Anne

P.S. rollaxis() saves so very much pain; I do wonder if unrollaxis
would be useful, even though it's just
def unrollaxis(X, axis):
return rollaxis(X, 0, axis+1)
-A


Re: [Numpy-discussion] nose changes checked in

2008-06-18 Thread Anne Archibald
2008/6/18 Stéfan van der Walt [EMAIL PROTECTED]:
 2008/6/18 Anne Archibald [EMAIL PROTECTED]:
 Well, probably. But more so for those that are used widely throughout
 numpy itself, since many of us learn how to write code using numpy by
 reading numpy source. (Yes, this means that internal conventions
 like numpy.core.whatever get used by people who aren't writing
 numpy.)

 People shouldn't be using code from `numpy.core` directly.  When we
 refactor code behind the scenes, we need a workspace to do it in, and
 if people start putting their hands in the engine we can't protect
 them from getting hurt.  A bigger warning sign may be appropriate (or,
 to take it to the extreme, rename `core` to `_core`).

I'm not arguing users should be using numpy.core (and I don't use it),
but given the nature of numpy and its documentation, many users look
into the code, see numpy.core and copy that. I think unless we want
to actually expose sub-namespaces - which was actively discussed - we
probably should put an underscore.

Not that we can without breaking the code of all those people who are
already using numpy.core.

Anne


Re: [Numpy-discussion] nose changes checked in

2008-06-17 Thread Anne Archibald
2008/6/17 Alan McIntyre [EMAIL PROTECTED]:
 On Tue, Jun 17, 2008 at 9:26 AM, David Huard [EMAIL PROTECTED] wrote:
 I noticed that NumpyTest and NumpyTestCase disappeared, and now I am
 wondering whether these classes part of the public interface or were they
 reserved for internal usage ?

 In the former, it might be well to deprecate them before removing them.

 ParametricTestCase is gone too.  There was at least one person using
 it that said he didn't mind porting to the nose equivalent, but I
 expect that's an indication there's more people out there using them.
 If there's a consensus that they need to go back in and get marked as
 deprecated, I'll put them back.

Uh, I assumed NumpyTestCase was public and used it. I'm presumably not
alone, so perhaps a deprecation warning would be good. What
backward-compatible class should I use? unittest.TestCase?

Thanks,
Anne


Re: [Numpy-discussion] Polyfit

2008-06-16 Thread Anne Archibald
2008/6/16 Chandler Latour [EMAIL PROTECTED]:
 I believe I'm bound to python.
 In terms of forcing the regression through the origin, the purpose is partly
 for visualization but it also should fit the data.  It would not make sense
 to model the data with an initial value other than 0.

Polyfit is just a thin wrapper around numpy.linalg.lstsq. You can do
something like:
A = np.power.outer(xs,np.arange(1,n))
outputs = np.linalg.lstsq(A,ys)
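
Spelled out a little more, as a sketch (xs, ys and the degree are made-up
sample data; the coefficients returned are for x, x**2, ..., with no constant
term, so the fit is forced through the origin):

import numpy as np

xs = np.linspace(0.0, 4.0, 20)
ys = 2.0 * xs + 0.5 * xs**2 + 0.01 * np.random.randn(20)   # roughly 2x + 0.5x^2

n = 3                                       # fit powers x**1 .. x**(n-1)
A = np.power.outer(xs, np.arange(1, n))     # design matrix, one column per power
coeffs, resid, rank, sv = np.linalg.lstsq(A, ys)
print coeffs                                # approximately [ 2.   0.5]
fitted = np.dot(A, coeffs)                  # model values at xs; zero at x = 0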

Anne


Re: [Numpy-discussion] Large symmetrical matrix

2008-06-11 Thread Anne Archibald
2008/6/10 Simon Palmer [EMAIL PROTECTED]:
 Hi I have a problem which involves the creation of a large square matrix
 which is zero across its diagonal and symmetrical about the diagonal i.e.
 m[i,j] = m[j,i] and m[i,i] = 0.  So, in fact, it is a large triangular
 matrix.  I was wondering whether there is any way of easily handling a
 matrix of this shape without either incurring a memory penalty or a whole
 whack of proprietary code?

 To get through this I have implemented a 1D array which has ((n-1)^2)/2
 elements inside a wrapper class which manipulates the arguments of array
 accessors with some arithmetic to return the appropriate value. To be honest
 I'd love to throw this away, but I haven't yet come across a feasible
 alternative.

 Any ideas?

Well, there's a risk of shooting yourself in the foot, but here's a
way to get at the elements quickly and conveniently: store the
coefficients in a 1D array (as you do already). Then if your matrix is
M by M, element (i,j) is at position (M-2)*j+i-1, assuming i > j. Of
course, you can simply use this expression every time you want to look
an element up, as you do now, but that's no fun. So now create a
view A of this array having offset -1, shape (M,M), and strides
(1,(M-2)). (This kind of weird view can be created with the ndarray
constructor.) Then A[i,j] addresses the right element. A[j,i] is of
course totally wrong. But as long as you avoid i < j, you can use all
numpy's magical indexing tricks. Unfortunately you can't really use
ufuncs, since they'll waste time operating on the useless lower half.

Personally, I would live with the factor-of-two memory hit.
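
For comparison, a minimal version of the packed-storage wrapper described in
the question (a sketch using a simple row-by-row packing and its own index
formula, not the strided view above):

import numpy as np

class SymZeroDiag(object):
    """Symmetric matrix with zero diagonal; stores only the strict lower triangle."""
    def __init__(self, M):
        self.M = M
        self.data = np.zeros(M * (M - 1) // 2)
    def _pos(self, i, j):
        # element (i, j) with i > j; rows i = 1 .. M-1 are packed in order
        return i * (i - 1) // 2 + j
    def __getitem__(self, (i, j)):
        if i == j:
            return 0.0
        if i < j:
            i, j = j, i
        return self.data[self._pos(i, j)]
    def __setitem__(self, (i, j), value):
        if i == j:
            raise ValueError("the diagonal is fixed at zero")
        if i < j:
            i, j = j, i
        self.data[self._pos(i, j)] = value

x = SymZeroDiag(5)
x[3, 1] = 2.5
print x[1, 3]         # 2.5, by symmetry

Of course this gives up all of numpy's fancy indexing, which is the real cost.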

Anne


Re: [Numpy-discussion] Coverting ranks to a Gaussian

2008-06-10 Thread Anne Archibald
2008/6/9 Keith Goodman [EMAIL PROTECTED]:
 Does anyone have a function that converts ranks into a Gaussian?

 I have an array x:

 import numpy as np
 x = np.random.rand(5)

 I rank it:

 x = x.argsort().argsort()
 x_ranked = x.argsort().argsort()
 x_ranked
   array([3, 1, 4, 2, 0])

 I would like to convert the ranks to a Gaussian without using scipy.
 So instead of the equal distance between ranks in array x, I would
 like the distance been them to follow a Gaussian distribution.

 How far out in the tails of the Gaussian should 0 and N-1 (N=5 in the
 example above) be? Ideally, or arbitrarily, the areas under the
 Gaussian to the left of 0 (and the right of N-1) should be 1/N or
 1/2N. Something like that. Or a fixed value is good too.

I'm actually not clear on what you need.

If what you need is for rank i of N to be the 100*i/N th percentile in
a Gaussian distribution, then you should indeed use scipy's functions
to accomplish that; I'd use scipy.stats.norm.ppf().

Of course, if your points were drawn from a Gaussian distribution,
they wouldn't be exactly 1/N apart, there would be some distribution.
Quite what the distribution of (say) the maximum or the median of N
points drawn from a Gaussian is, I can't say, though people have
looked at it. But if you want typical values, just generate N points
from a Gaussian and sort them:

V = np.random.randn(N)
V = np.sort(V)

return V[ranks]

Of course they will be different every time, but the distribution will be right.
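
Putting both options together as a sketch (the 0.5/N offset in the second
variant is just one common choice of plotting position, not the only one):

import numpy as np

x = np.random.rand(5)
ranks = x.argsort().argsort()        # rank of each element, 0 .. N-1
N = len(ranks)

# option 1: typical values, freshly drawn each time
V = np.sort(np.random.randn(N))
gauss_typical = V[ranks]

# option 2 (needs scipy): a deterministic percentile mapping
from scipy.stats import norm
gauss_fixed = norm.ppf((ranks + 0.5) / N)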

Anne
P.S. why the no scipy restriction? it's a bit unreasonable. -A


Re: [Numpy-discussion] Inplace shift

2008-06-07 Thread Anne Archibald
2008/6/7 Keith Goodman [EMAIL PROTECTED]:
 On Fri, Jun 6, 2008 at 10:46 PM, Anne Archibald
 [EMAIL PROTECTED] wrote:
 2008/6/6 Keith Goodman [EMAIL PROTECTED]:
 I'd like to shift the columns of a 2d array one column to the right.
 Is there a way to do that without making a copy?

 This doesn't work:

 import numpy as np
 x = np.random.rand(2,3)
 x[:,1:] = x[:,:-1]
 x

 array([[ 0.44789223,  0.44789223,  0.44789223],
   [ 0.80600897,  0.80600897,  0.80600897]])

 As a workaround you can use backwards slices:
 In [40]: x = np.random.rand(2,3)

 In [41]: x[:,:0:-1] = x[:,-2::-1]

 In [42]: x
 Out[42]:
 array([[ 0.20183084,  0.20183084,  0.08156887],
   [ 0.30611585,  0.30611585,  0.79001577]])

 Neat. It makes sense to go backwards. Thank you.

 Less painful for numpy developers but more painful for users is to
 warn them about the status quo: operations on overlapping slices can
 happen in arbitrary order.

 Now I'm confused. Could some corner case of memory layout cause numpy
 to work from right to left, breaking the workaround? Or can I depend
 on the workaround working with numpy 1.0.4?

I'm afraid so. And it's not such a corner case as that: if the array
is laid out in C contiguous order, you have to go backwards, while
if the array is laid out in FORTRAN contiguous order you have to go
forwards.
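
If you would rather not depend on the copy order at all, the safe (if slightly
less in-place) spelling is to copy the overlapping right-hand side explicitly;
a sketch:

import numpy as np

x = np.random.rand(2, 3)
x[:, 1:] = x[:, :-1].copy()    # explicit copy: independent of memory layout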

Anne


Re: [Numpy-discussion] array vs matrix

2008-06-07 Thread Anne Archibald
2008/6/7 Robert Kern [EMAIL PROTECTED]:
 On Sat, Jun 7, 2008 at 14:37, Ondrej Certik [EMAIL PROTECTED] wrote:
 Hi,

 what is the current plan with array and matrix with regard of calculating

 sin(A)

 ? I.e. elementwise vs sin(A) = Q*sin(D)*Q^T? Is the current approach
 (elementwise for array and Q*sin(D)*Q^T for matrix) the way to go?

 I don't believe we intend to make numpy.matrix any more featureful. I
 don't think it's a good idea for you to base sympy.Matrix's
 capabilities in lock-step with numpy.matrix, though. There are very
 different constraints at work. Please, do what you think is best for
 sympy.Matrix.

Let me elaborate somewhat:

We recently ran across some painful quirks in numpy's handling of
matrices, and they spawned massive discussion. As it stands now, there
is significant interest in reimplementing the matrix object from
scratch, with different behaviour. So emulating its current behaviour
is not a win.

For consistency, it makes a certain amount of sense to have sin(A)
compute a matrix sine, since A**n computes a matrix power. But looking
at the matrix exponential, I see that we have several implementations,
none of which is satisfactory for all matrices. I would expect the
matrix sine to be similar - particularly when faced with complex
matrices - so perhaps needing an explicit matrix sine is a good thing?
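
For a symmetric (or Hermitian) A the Q*sin(D)*Q^T recipe from the question is
easy enough to spell out; a sketch, not a general matrix sine (non-normal
matrices need something more careful, just as with the matrix exponential):

import numpy as np

def sinm_sym(A):
    """Matrix sine of a real symmetric matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(A)             # A = Q * diag(w) * Q.T
    return np.dot(Q * np.sin(w), Q.T)    # Q * diag(sin(w)) * Q.T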

Also worth noting is that this problem can be evaded with namespaces;
matrix sin could be scipy.matrixmath.sin, abbreviated perhaps to
mm.sin, as opposed to np.sin.

Anne


Re: [Numpy-discussion] A memory problem: why does mmap come up in numpy.inner?

2008-06-04 Thread Anne Archibald
2008/6/4 Dan Yamins [EMAIL PROTECTED]:

 So, I have three questions about this:
 1) Why is mmap being called in the first place?  I've written to Travis
 Oliphant, and he's explained that numpy.inner does NOT directly do any
 memory
 mapping and shouldn't call mmap.  Instead, it should just operate with
 things in
 memory -- in which case my 8 GB should allow the computation to go through
 just
 fine.  What's going on?

 2) How can I stop this from happening?  I want to be able to leverage
 large
 amounts of ram on my machine to scale up my computations and not be
 dependent on
 the limitations of the address space size.  If the mmap is somehow being
 called
 by the OS, is there some option I can set that will make it do things in
 regular
 memory instead?  (Sorry if this is a stupid question.)

I don't know much about OSX, but I do know that many malloc()
implementations take advantage of a modern operating system's virtual
memory when allocating large blocks of memory. For small blocks,
malloc uses memory arenas, but if you ask for a large block malloc()
will request a whole bunch of pages from the operating system. This
way when the memory is freed, free() can easily return the chunk of
memory to the OS. On some systems, one way to get such a big hunk of
memory from the system is with an anonymous mmap(). I think that's
what's going on here. So I don't think you want to stop malloc() from
using mmap().

You do, of course, want the memory allocation to succeed, and I'm
afraid I don't have any idea why it can't. Under Linux, I know that
you can run a 64-bit processor in 32-bit mode, which gives you the
usual 4 GB address space limit. I've no idea what OSX does.

Good luck,
Anne


Re: [Numpy-discussion] A memory problem: why does mmap come up in numpy.inner?

2008-06-04 Thread Anne Archibald
2008/6/4 Dan Yamins [EMAIL PROTECTED]:

 Anne, thanks so much for your help.  I am still a little confused.  If your
 scenario about how the memory allocation is working is right, does that mean
 that even if I put a lot of ram on the machine, e.g.  16GB, I still can't
 request it in blocks larger than the limit imposed by the processor
 architecture (e.g. 4 GB for 32, 8 GB for 64-bit)?What I really want is
 to be able to have my ability to request memory limited just by the amount
 of memory on the machine, and not have it depend on something about
 paging/memory mapping limits.  Is this a stupid/naive thing?

No, that's a perfectly reasonable thing to want, and it *should* be
possible with your hardware. Maybe I'm barking up the wrong tree and
the problem is something else. But if you'll bear with me:

For the last ten years or so, normal computers have used 32 bits to
address memory. This means  that any given process can only get at
2**32 different bytes of memory, which is 4 GB. The magic of virtual
memory means you can possibly have a number of different processes
addressing different 4 GB chunks, some of which may temporarily be
stored on disk at any given time. If you want to get at a chunk of
memory bigger than 4 GB, you need all of these things:

* Your CPU must be 64-bit, or else it has no hope of being able to
access more than 4 GB (in fact more than 3 GB for complicated reasons)
of physical memory.
* Your operating system must be 64-bit (64-bit PC CPUs have a 32-bit
compatibility mode they often find themselves running in) so that it
can know about all the real RAM you have installed and so that it can
support 64-bit programs.
* Your program must be 64-bit, so that it can access huge chunks of RAM.

For compatibility reasons, many 64-bit machines have most of their
software compiled in 32-bit mode. This isn't necessarily a problem -
the programs probably even run faster - but they can't access more
than 4 GB each. So it's worth finding out whether your python
interpreter is compiled as a 64-bit program (and whether that's even
possible under your version of OSX). Unfortunately I  don't know how
to find that out in your case.
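
One quick check that should work anywhere, as a sketch (4 means a 32-bit
interpreter, 8 means 64-bit; sys.maxint tells a similar story on most
platforms):

import struct, sys

print struct.calcsize("P") * 8    # size of a pointer in bits: 32 or 64
print sys.maxint                  # 2**31 - 1 on 32-bit builds, 2**63 - 1 on 64-bit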

Anne


Re: [Numpy-discussion] A memory problem: why does mmap come up in numpy.inner?

2008-06-04 Thread Anne Archibald
2008/6/4 Dan Yamins [EMAIL PROTECTED]:




 Try

 In [3]: numpy.dtype(numpy.uintp).itemsize
 Out[3]: 4

 which is the size in bytes of the integer needed to hold a pointer. The
 output above is for 32 bit python/numpy.

 Chuck

 Check, the answer is 4, as you got for the 32-bit.   What would the answer
 be on a 64-bit architecture?  Why is this diagnostic?

In a 64-bit setting, a pointer needs to be 64 bits long, that is,
eight bytes, not four.

What Charles pointed out was that while the inner product is very big,
it seems to fit into memory on his 32-bit Linux machine; is it
possible that OSX is preventing your python process from using even
the meager 2-3 GB that a 32-bit process ought to get? In particular,
try running Charles' script in a fresh python interpreter and see if
it works; it may be that other arrays you had allocated are taking up
some of the space that this one could.

You will probably still want a 64-bit python, though, in order to have
a little elbow room.

Anne


Re: [Numpy-discussion] linked variables?

2008-06-03 Thread Anne Archibald
2008/6/3 Robert Kern [EMAIL PROTECTED]:

 Python does not copy data when you assign something to a new variable.
 Python simply points the new name to the same object. If you modify
 the object using the new name, all of the other names pointing to that
 object will see the changes. If you want a copy, you will need to
 explicitly make one. For numpy arrays, the best way to do this is to
 use array().

  m_i = array(m_o)

Is array() really the best way to copy an array? I would have thought
m_i = m_o.copy() would be better - requiring less of array's clever
guessing, and preserving subtypes. On the downside it doesn't work if
you're given a list, but perhaps np.copy() would be better (though it
loses subtype information)? Certainly it's clearer.
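
To make the alternatives concrete, a small sketch with a subclass:

import numpy as np

m_o = np.matrix([[1, 2], [3, 4]])

a = np.array(m_o)     # a copy, but downcast to a plain ndarray
b = m_o.copy()        # a copy that stays a numpy.matrix
c = np.copy(m_o)      # also a copy, also a plain ndarray
print type(a), type(b), type(c)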

Anne


[Numpy-discussion] How does Boolean indexing work?

2008-06-03 Thread Anne Archibald
Hi,

I thought it might be useful to summarize the different ways to use
numpy's indexing, slice and fancy. The document so far is here:

http://www.scipy.org/Cookbook/Indexing

While writing it I ran into some puzzling issues. The first of them
is, how is Boolean indexing supposed to work when given more than one
array? What I mean is, suppose we have the following array:

In [3]: A = np.arange(9)

In [4]: B = np.reshape(A,(3,3))

In [5]: B
Out[5]:
array([[0, 1, 2],
   [3, 4, 5],
   [6, 7, 8]])

Then we can use boolean indexing to select some rows or some columns:

In [12]: B[np.array([True,False,True]),:]
Out[12]:
array([[0, 1, 2],
   [6, 7, 8]])

In [13]: B[:,np.array([True,False,True])]
Out[13]:
array([[0, 2],
   [3, 5],
   [6, 8]])

But if you supply Boolean arrays in both row and column places, rather
than select the corresponding rows and columns, you get something
basically mysterious:
In [14]: B[np.array([True,False,True]),np.array([True,True,False])]
Out[14]: array([0, 7])

In [15]: B[np.array([True,False,True]),np.array([True,False,False])]
Out[15]: array([0, 6])

In [16]: B[np.array([True,False,True]),np.array([True,True,True])]
---
<type 'exceptions.ValueError'>          Traceback (most recent call last)

/home/peridot/<ipython console> in <module>()

<type 'exceptions.ValueError'>: shape mismatch: objects cannot be
broadcast to a single shape

What is going on here?

Thanks,
Anne


Re: [Numpy-discussion] Slice assignment and overlapping views (was: Strange behavior in setting masked array values in Numpy 1.1.0)

2008-06-01 Thread Anne Archibald
2008/5/31 Pauli Virtanen [EMAIL PROTECTED]:

 The reason for the strange behavior of slice assignment is that when the
 left and right sides in a slice assignment are overlapping views of the
 same array, the result is currently effectively undefined. Same is true
 for ndarrays:

  >>> import numpy
  >>> a = numpy.array([1, 2, 3, 4, 5])
  >>> a[::-1]
  array([5, 4, 3, 2, 1])
  >>> a[:] = a[::-1]
  >>> a
  array([5, 4, 3, 4, 5])

I think that the current rule is, slices are walked from low index to
high index. This doesn't help with multidimensional arrays, where the
order of the axes is (and should be) determined by efficiency
considerations.

Unfortunately there's really no good way to warn about overlapping
copies. Remember that this is a frequent operation, so it has to be
fast for small arrays.

I think changing base so that it points to the real base and not the
parent would help (and clear up a memory leak: try while True: A =
A[::-1] some time) eliminate some cases where overlap cannot occur,
but what about the following cases?

A[:5] = A[-5:]
A[::2] = A[1::2]
A[1:] = A[-1:]

The last is actually fairly common (I've needed it), and relies on
numpy's ordering of copies. The middle one is very common, and the
first one would be a royal pain to code around if the slices were not
allowed to overlap.

I think I at one point wrote an algorithm that would detect many cases
where overlap could not occur (maybe in the smarter reshape code?) but
I came to the conclusion that detecting all the cases was infeasible.
It's a number-theoretic problem - can you write the same number as
the sum of multiples of these two lists of numbers with nonnegative
integer coefficients less than these other two lists of numbers? -
and I suspect it's NP-complete. (Ah, yes, you can make a knapsack out
of it - take an array with strides a_1, ... a_n and shape (2,...,2),
and a single-element array starting at position N.) In any case it's
infeasible to solve it every time somebody tries to do a slice
assignment.

In any case, many users need nearly-overlapping slices, and some need
really-overlapping slices. Preventing problems is going to have to
happen at a higher level.

Anne


Re: [Numpy-discussion] New documentation web application

2008-05-29 Thread Anne Archibald
2008/5/29 Jarrod Millman [EMAIL PROTECTED]:
 On Thu, May 29, 2008 at 3:28 PM, Stéfan van der Walt [EMAIL PROTECTED] 
 wrote:
 The NumPy documentation project has taken another leap forward!  Pauli
 Virtanen has, in a week of superhuman coding, produced a web
 application that enhances the work-flow and editing experience of
 NumPy docstrings on the web.

 Excellent.  Thanks to everyone who is working on this.  The
 documentation work that you are doing is going to make a huge
 difference in increasing NumPy adoption and will benefit the community
 in a major way.

Absolutely. It is also, already, improving the reliability and
consistency of numpy's code as people carefully go over all the
functions that have been neglected for years. This is a great project.

Anne


Re: [Numpy-discussion] proposal on ufunc shift operators.

2008-05-28 Thread Anne Archibald
2008/5/28 Charles R Harris [EMAIL PROTECTED]:
 Hi All,

 Currently we have:

 In [2]: ones(1,dtype=int8) << ones(1,dtype=uint8)
 Out[2]: array([2], dtype=int16)

 In [4]: ones(1,dtype=int64) << ones(1,dtype=uint64)
 Out[4]: array([2], dtype=object)

 Note the increased size in the first case and the return of a Python long
 integer object in the second. I propose that these operators should preserve
 the type of the first argument, although this is not easy to do with the
 current ufunc setup. It is impossible to use a type of sufficient size for
 all shift values and preserving the type of the first argument is what I
 think most folks would expect.

That sounds eminently sensible to me. Of course this will complicate
explaining type rules for ufuncs. An alternative is to give it some
reasonably simple non-exceptional rule and let users specify the
output dtype if they care. Still, I think in the interest of minimal
surprise, getting rid of the conversion to an object array of longs,
at least, would be a good idea.

Do we have a cyclic-shift ufunc? That is, uint8(1)<<8 == 1?
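
In the meantime a rotate can be composed from the existing shift ufuncs; a
sketch for uint8 data (the helper name is made up):

import numpy as np

def rol8(x, n):
    """Rotate uint8 values left by n bits, 1 <= n <= 7."""
    x = np.asarray(x, dtype=np.uint8)
    return ((x << n) | (x >> (8 - n))).astype(np.uint8)

print rol8(np.uint8(0x81), 1)     # 3, i.e. 0x81 rotated left by one bit is 0x03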

Anne


Re: [Numpy-discussion] List of function-like things with an 'out' parameter

2008-05-28 Thread Anne Archibald
2008/5/28 Alan McIntyre [EMAIL PROTECTED]:
 On Wed, May 28, 2008 at 3:34 PM, Charles R Harris wrote:
 I wonder if this
 is something that ought to be looked at for all
 functions with an out parameter?  ndarray.compress also had problems
 with array type mismatch (#789); I can't imagine that it's safe to
 assume only these two functions were doing it incorrectly. (Unless of
 course somebody has recently looked at all of them)

 I think that is an excellent idea! A good start would be to list all the
 functions with the out parameter and then write some tests. The current
 behavior is inconsistent and we not only need to specify the behavior, but
 fix all the places that don't follow the rules.

 Here's a list of things in numpy that have an 'out' argument (and
 their arguments); I think I eliminated all the duplicated items (that
 are imported from a subpackage into one of the main packages, for
 example).  There's stuff that's missing, probably; I think the
 C-implemented functions don't have argument lists programmatically
 available so I may parse their docstrings or something, but this is a
 start.

Nice!

One noticeable absence is all the ufuncs. (Partly this is because it's
not actually called out, or in fact anything at all; it's just the
last parameter if there are enough.) You might also check things like
objects returned by vectorize() and frompyfunc().
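
For the ufuncs the output array is simply the last positional argument; a
small sketch:

import numpy as np

a = np.arange(5.0)
b = np.ones(5)
out = np.empty(5)
np.add(a, b, out)     # the result is written into out, which is also returned
print out             # [ 1.  2.  3.  4.  5.]

so any audit should treat that trailing argument as an honorary out parameter.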

Does it make sense to put this list on the Wiki somewhere, so that
people who come across new things that take output parameters (however
named) can post them?

Anne


Re: [Numpy-discussion] Is this a bug?

2008-05-28 Thread Anne Archibald
2008/5/27 Robert Kern [EMAIL PROTECTED]:

 Can we make it so that dtype('c') is preserved instead of displaying
 '|S1'? It does not behave the same as dtype('|S1') although it
 compares equal to it.

It seems alarming to me that they should compare equal but behave
differently. Is it possible to change more than just the way it
prints?

Anne


Re: [Numpy-discussion] Buildbot errors.

2008-05-24 Thread Anne Archibald
2008/5/24 Jarrod Millman [EMAIL PROTECTED]:
 On Thu, May 22, 2008 at 11:25 PM, Charles R Harris
 [EMAIL PROTECTED] wrote:
 On Fri, May 23, 2008 at 12:10 AM, Charles R Harris
 [EMAIL PROTECTED] wrote:

 The python 2.6 buildbots are showing 5 failures that are being hidden by
 valgrind.

 They seem to have fixed themselves,  they were probably related to the API
 addition I made, then moved. However, it is a bad thing that the errors were
 covered up by Valgrind and not reported.

 I didn't understand what you meant by this yesterday, but I see what
 you are talking about now:
 http://buildbot.scipy.org/builders/Linux_x86_64_Fedora/builds/486/steps/shell_2/logs/stdio
 http://buildbot.scipy.org/builders/Linux_x86_Fedora_Py2.6/builds/461/steps/shell_2/logs/stdio

 You said they fixed themselves, but the failures are in the most
 recent buildbot reports.  This is the only thing I am concerned about
 before branching, so hopefully someone can look at this and let me
 know whether the failures are indeed fixed.

They do not appear on my machine (a pentium-M running Ubuntu). I
should point out that they are only actually three distinct errors,
because one of the test suites is being run twice. They are not exactly
subtle tests; they're checking that seterr induces the raising of
exceptions from np.arange(3)/0, np.sqrt(-np.arange(3)), and
np.array([1.])/np.array([0.]).

The tests pass on the x86_64 machine I have access to (a
multiprocessor Opteron running Knoppix of all things). That is, it
only has python 2.4 (don't ask) so the tests can't be run, but running
the same tests by hand produces the expected results. This particular
feature - seterr - is the sort of thing an overaggressive optimizer
can easily butcher, though, so it could easily be the result of the
particular configuration on the buildbot machine.

I think somebody with access to the buildbot machine needs to see
what's going on. In particular: does a manually-compiled numpy exhibit
the problem? How clean does the buildbot make its environment? Do the
functions behave correctly from an interactive session? Do other
seterr conditions have the same problem?
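
For reference, the behaviour those tests check is easy to reproduce by hand on
the buildbot machine; a sketch:

import numpy as np

old = np.seterr(divide='raise', invalid='raise')
try:
    np.array([1.]) / np.array([0.])     # should raise FloatingPointError
except FloatingPointError:
    print "divide correctly raised"
np.seterr(**old)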

Anne
P.S. Please ignore the alarming-looking buildbot failure; it is due to
operator headspace. -A


Re: [Numpy-discussion] Buildbot errors.

2008-05-24 Thread Anne Archibald
2008/5/24 Charles R Harris [EMAIL PROTECTED]:

 I take that back,  I got confused looking through the output. The errors are
 the same and only seem to happen when valgrind runs the tests.

Sounds like maybe valgrind is not IEEE clean:


As of version 3.0.0, Valgrind has the following limitations in its
implementation of x86/AMD64 floating point relative to IEEE754.


Numeric exceptions in FP code: IEEE754 defines five types of numeric
exception that can happen: invalid operation (sqrt of negative number,
etc), division by zero, overflow, underflow, inexact (loss of
precision).

For each exception, two courses of action are defined by IEEE754:
either (1) a user-defined exception handler may be called, or (2) a
default action is defined, which fixes things up and allows the
computation to proceed without throwing an exception.

Currently Valgrind only supports the default fixup actions. Again,
feedback on the importance of exception support would be appreciated.

When Valgrind detects that the program is trying to exceed any of
these limitations (setting exception handlers, rounding mode, or
precision control), it can print a message giving a traceback of where
this has happened, and continue execution. This behaviour used to be
the default, but the messages are annoying and so showing them is now
disabled by default. Use --show-emwarns=yes to see them.


Anne


Re: [Numpy-discussion] 1.1.x branch and 1.1.0 tag imminent

2008-05-24 Thread Anne Archibald
2008/5/24 Jarrod Millman [EMAIL PROTECTED]:
 On Fri, May 23, 2008 at 11:16 PM, David Cournapeau
 [EMAIL PROTECTED] wrote:
 Where do those errors appear ? I don't see them on the builbot. Are they
 2.6 specific ? If yes, I would say ignore them, because 2.6 is not
 released yet, and is scheduled for september.

 You can see them here:
 http://buildbot.scipy.org/builders/Linux_x86_64_Fedora/builds/486/steps/shell_2/logs/stdio
 http://buildbot.scipy.org/builders/Linux_x86_Fedora_Py2.6/builds/461/steps/shell_2/logs/stdio

 You have to search for them, because as Chuck pointed out it seems
 valgrind seems to be hiding them.  We should figure out how to avoid
 this at some point down the road as well.

 I am not sure whether or not this is 2.6 specific or not.

No. These are a known limitation of valgrind when dealing with
floating-point. Other than emailing the valgrind developers to tell
them that yes, somebody cares about this, I think they can be safely
ignored (thank goodness).

Anne


Re: [Numpy-discussion] first recarray steps

2008-05-21 Thread Anne Archibald
2008/5/21 Vincent Schut [EMAIL PROTECTED]:
 Christopher Barker wrote:

 Also, if you image data is rgb, usually, that's a (width, height, 3)
 array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
 array, then that's rrr... Some image libs
 may give you that, I'm not sure.

 My data is. In fact, this is a simplification of my situation; I'm
 processing satellite data, which usually has more (and other) bands than
 just rgb. But the data is definitely in shape (bands, y, x).

You may find your life becomes easier if you transpose the data in
memory. This can make a big difference to efficiency. Years ago I was
working with enormous (by the standards of the day) MATLAB files on
disk, storing complex data. The way (that version of) MATLAB
represented complex data was the way you describe: matrix of real
parts, matrix of imaginary parts. This meant that to draw a single
pixel, the disk needed to seek twice... depending on what sort of
operations you're doing, transposing your data so that each pixel is
all in one place may improve cache coherency as well as making the use
of record arrays possible.
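
Concretely, the rearrangement is cheap to express; a sketch assuming data of
shape (bands, y, x) as described (the sizes are made up):

import numpy as np

data = np.zeros((7, 200, 300), dtype=np.float32)   # (bands, y, x)

pixels = data.transpose(1, 2, 0)         # a view with shape (y, x, bands)
pixels = np.ascontiguousarray(pixels)    # actually move the data so each pixel is contiguous
print pixels.shape                       # (200, 300, 7)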

Anne


Re: [Numpy-discussion] ANN: NumPy/SciPy Documentation Marathon 2008

2008-05-21 Thread Anne Archibald
2008/5/21 LB [EMAIL PROTECTED]:
 This is really a great news, and it seems very promising according to
 the first pages of the Wiki that I've seen.

 It's perhaps not the right place to say this, but I was wondering what
  you would think about adding labels or categories to the descriptions
  of each function?  I think it would really help newcomers to search
  among the 400 functions of numpy if they could use some labels to
  narrow down their search.

You're right that some kind of categorization would be very valuable.
I think that this is planned, but the doc marathon organizers can
presumably give a more detailed answer.

  Until now, the most efficient way of finding the numpy function that
  fits my needs was to search for some words in the numpy example
  list page. This was not always very fast, and labels like array
  creation, shape manipulation, index operation,
  arithmetic, etc., could simplify this kind of search.

In the short term, there's the numpy functions by category page on the wiki:
http://www.scipy.org/Numpy_Functions_by_Category
Of course, this page is already incomplete, and nobody is
systematically updating it. Really the right solution is what you
propose, annotating each function and then automatically generating
such a list.

Anne


Re: [Numpy-discussion] question on NumPy NaN

2008-05-20 Thread Anne Archibald
2008/5/20 Vasileios Gkinis [EMAIL PROTECTED]:

 I have a question concerning nan in NumPy.
  Let's say I have an array of sample measurements
 a = array((2,4,nan))
 in NumPy calculating the mean of the elements in array a looks like:

 a = array((2,4,nan))
 a
 array([  2.,   4.,  NaN])
 mean(a)
 nan

  What if I simply don't want nan to propagate and get something that would
 look like:

  >>> a = array((2,4,nan))
  >>> a
  array([  2.,   4.,  NaN])
  >>> mean(a)
  3.

For more elaborate handling of missing data, look into masked
arrays, in numpy.ma. They are designed to deal with exactly this sort
of thing.
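
For the example given, a sketch:

import numpy as np

a = np.array([2.0, 4.0, np.nan])
am = np.ma.masked_array(a, mask=np.isnan(a))
print am.mean()      # 3.0; the nan is simply ignored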

Anne


Re: [Numpy-discussion] dimension aligment

2008-05-20 Thread Anne Archibald
2008/5/20 Thomas Hrabe [EMAIL PROTECTED]:

 given a 3d array
 a =
 numpy.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]],[[13,14,15],[16,17,18]],[[19,20,21],[22,23,24]]])
 a.shape
 returns (4,2,3)

 so I assume the first digit is the 3rd dimension, second is 2nd dim and
 third is the first.

 how is the data aligned in memory now?
 according to the strides it should be
 1,2,3,4,5,6,7,8,9,10,...
 right?

 if I had an array of more dimensions, the first digit returned by shape
 should always be the highest dim.

You are basically right, but this is a surprisingly subtle issue for numpy.

A numpy array is basically a block of memory and some description. One
piece of that description is the type of data it contains (i.e., how
to interpret each chunk of memory) for example int32, float64, etc.
Another is the sizes of all the various dimensions. A third piece,
which makes many of the things numpy does possible, is the strides.
The way numpy works is that basically it translates

A[i,j,k]

into a lookup of the item in the memory block at position

i*strides[0]+j*strides[1]+k*strides[2]

This means, if you have an array A and you want every second element
(A[::2]), all numpy needs to do is hand you back a new array pointing
to the same data block, but with strides[0] doubled. Similarly if you
want to transpose a two-dimensional array, all it needs to do is
exchange strides[0] and strides[1]; no data need be moved.

This means, though, that if you are handed a numpy array, the elements
can be arranged in memory in quite a complicated fashion. Sometimes
this is no problem - you can always use the strides to find it all.
But sometimes you need the data arranged in a particular way. numpy
defines two particular ways: C contiguous and FORTRAN contiguous.

C contiguous arrays are what you describe, and they're what numpy
produces by default; they are arranged so that the rightmost index has
the smallest stride. FORTRAN contiguous arrays are arranged the
other way around; the leftmost index has the smallest stride. (This is
how FORTRAN arrays are arranged in memory.)

There is also a special case: the reshape() function changes the shape
of the array. It has an order argument that describes not how the
elements are arranged in memory but how you want to think of the
elements as arranged in memory for the reshape operation.
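
To make the stride bookkeeping concrete (element size is 8 bytes here):

import numpy as np

a = np.arange(24, dtype=float).reshape(4, 2, 3)   # C contiguous by default
# a.strides == (48, 24, 8): the rightmost index has the smallest stride.

b = a[::2]                 # b.strides == (96, 24, 8): same data, stride doubled
c = a.T                    # c.strides == (8, 24, 48): transpose swaps strides
f = np.asfortranarray(a)   # f.strides == (8, 32, 64): leftmost stride smallest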

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Slicing a numpy array and getting the complement

2008-05-19 Thread Anne Archibald
2008/5/19 Orest Kozyar [EMAIL PROTECTED]:
 Given a slice, such as s_[..., :-2:], is it possible to take the
 complement of this slice?  Specifically, s_[..., ::-2].  I have a
 series of 2D arrays that I need to split into two subarrays via
 slicing where the members of the second array are all the members
 leftover from the slice.  The problem is that the slice itself will
 vary, and could be anything such as s_[..., 1:4:] or s_[..., 1:-4:],
 etc, so I'm wondering if there's a straightforward idiom or routine in
 Numpy that would facilitate taking the complement of a slice?  I've
 looked around the docs, and have not had much luck.

If you are using boolean indexing, of course complements are easy
(just use ~). But if you want slice indexing, so that you get views,
sometimes the complement cannot be expressed as a slice: for example:

A = np.arange(10)
A[2:4]

The complement of A[2:4] is np.concatenate((A[:2],A[4:])). Things
become even more complicated if you start skipping elements.

If you don't mind fancy indexing, you can convert your index arrays
into boolean form:
complement = A==A
complement[idx] = False
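
Spelled out (np.ones(..., dtype=bool) does the same job as A==A):

import numpy as np

A = np.arange(10)
idx = np.s_[2:4]                       # the slice whose complement we want

keep = np.ones(A.shape, dtype=bool)
keep[idx] = False
rest = A[keep]                         # array([0, 1, 4, 5, 6, 7, 8, 9])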

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Slicing a numpy array and getting the complement

2008-05-19 Thread Anne Archibald
2008/5/19 Orest Kozyar [EMAIL PROTECTED]:
 If you don't mind fancy indexing, you can convert your index arrays
 into boolean form:
 complement = A==A
 complement[idx] = False

 This actually would work perfectly for my purposes.  I don't really
 need super-fancy indexing.

Heh. Actually fancy indexing is numpy-speak for indexing with
anything that's not an integer or a slice. In this case, indexing with
a boolean array is fancy indexing. The reason we make this
distinction is that with slices, the new array you get is actually
just a reference to the original array (so you can modify the original
array through it). With fancy indexing, the new array you get is
actually a copy. (Assigning to fancy-indexed arrays is handled
specially in __setitem__, so it works.)
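
A quick illustration of the difference:

import numpy as np

A = np.arange(6)
v = A[2:4]            # slice: a view onto A's memory
v[:] = -1             # A is now [0, 1, -1, -1, 4, 5]

c = A[A > 3]          # boolean (fancy) indexing: a copy
c[:] = 99             # A is unchanged

A[A > 3] = 99         # but assignment through __setitem__ does modify A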

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Quick Question about Optimization

2008-05-19 Thread Anne Archibald
2008/5/19 James Snyder [EMAIL PROTECTED]:

 First off, I know that optimization is evil, and I should make sure
 that everything works as expected prior to bothering with squeezing
 out extra performance, but the situation is that this particular block
 of code works, but it is about half as fast with numpy as in matlab,
 and I'm wondering if there's a better approach than what I'm doing.

 I have a chunk of code, below, that generally iterates over 2000
 iterations, and the vectors that are being worked on at a given step
 generally have ~14000 elements in them.

With arrays this size, I wouldn't worry about python overhead - things
like range versus xrange or self lookups.

 Is there anything in practice here that could be done to speed this
 up?  I'm looking more for general numpy usage tips, that I can use
 while writing further code and not so things that would be obscure or
 difficult to maintain in the future.

Try using a profiler to find which steps are using most of your time.
With such a simple function it may not be very informative, but it's
worth a try.

 Also, the results of this are a binary array, I'm wondering if there's
 anything more compact for expressing than using 8 bits to represent
 each single bit.  I've poked around, but I haven't come up with any
 clean and unhackish ideas :-)

There's a tradeoff between compactness and speed here. The *fastest*
is probably one boolean per 32-bit integer. It sounds awful, I know,
but most modern CPUs have to work harder to access bytes individually
than they do to access them four at a time. On the other hand, cache
performance can make a huge difference, so compactness might actually
amount to speed. I don't think numpy has a packed bit array data type
(which is a shame, but would require substantial implementation
effort).
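
If storage is the main concern, numpy's packbits/unpackbits will at least
pack the flags eight to a byte, at the cost of a conversion step (a sketch):

import numpy as np

spikes = np.random.rand(14000) > 0.9            # one byte per boolean
packed = np.packbits(spikes.astype(np.uint8))   # eight flags per byte
restored = np.unpackbits(packed)[:spikes.size].astype(bool)
assert (restored == spikes).all()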

 I can provide the rest of the code if needed, but it's basically just
 filling some vectors with random and empty data and initializing a few
 things.

It would kind of help, since it would make it clearer what's a scalar
and what's an array, and what the dimensions of the various arrays
are.

for n in range(0,time_milliseconds):
self.u  =  self.expfac_m  *  self.prev_u +
 (1-self.expfac_m) * self.aff_input[n,:]
self.v = self.u + self.sigma *
 np.random.standard_normal(size=(1,self.naff))

You can use scale to rescale the random numbers on creation; that'll
save you a temporary.

self.theta = self.expfac_theta * self.prev_theta -
 (1-self.expfac_theta)

idx_spk = np.where(self.v>=self.theta)

You can probably skip the where; the result of the expression
self.v>=self.theta is a boolean array, which you can use directly for
indexing.

self.S[n,idx_spk] = 1
self.theta[idx_spk] = self.theta[idx_spk] + self.b

+= here might speed things up, not just in terms of temporaries but by
saving a fancy-indexing operation.

self.prev_u = self.u
self.prev_theta = self.theta
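
Putting those suggestions together, roughly (the sizes and parameter values
below are stand-ins, not the ones from the original code):

import numpy as np

time_milliseconds, naff = 200, 1000     # toy sizes
expfac_m = expfac_theta = 0.9
sigma, b = 0.1, 0.05
aff_input = np.random.rand(time_milliseconds, naff)
S = np.zeros((time_milliseconds, naff), dtype=np.uint8)
prev_u = np.zeros(naff)
prev_theta = np.ones(naff)

for n in range(time_milliseconds):
    u = expfac_m * prev_u + (1 - expfac_m) * aff_input[n, :]
    v = u + np.random.normal(scale=sigma, size=naff)   # scaled on creation
    theta = expfac_theta * prev_theta - (1 - expfac_theta)
    spk = v >= theta              # boolean mask instead of np.where
    S[n, spk] = 1
    theta[spk] += b               # in-place add on the selected elements
    prev_u, prev_theta = u, theta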



Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpify this?

2008-05-18 Thread Anne Archibald
2008/5/18 Matt Crane [EMAIL PROTECTED]:
 On Sun, May 18, 2008 at 8:52 PM, Robert Kern [EMAIL PROTECTED] wrote:
 Are there repeats?
 No, no repeats in the first column.

 I'm going to go get a cup of coffee before I forget to leave out any
 potentially vital information again. It's going to be a long day.

It can be done, though I had to be kind of devious. My solution might
not even be O(n log n), depending on how mergesort is implemented:

def find_matches(A,B):
    """Find positions A_idx and B_idx so that A[A_idx]==B[B_idx]

    A and B should be sorted arrays with no repeats; A_idx and
    B_idx will be all the places they agree.

    >>> import numpy as np
    >>> A = np.array([1,2,4,5,7])
    >>> B = np.array([1,3,5,9])
    >>> find_matches(A,B)
    (array([0, 3]), array([0, 2]))
    """
    AB = np.concatenate((A,B))
    idx = np.argsort(np.concatenate((A,B)), kind="mergesort")
    sorted = AB[idx]
    pairs = np.where(np.diff(sorted)==0)[0]
    A_pairs = idx[pairs]
    B_pairs = idx[pairs+1]-len(A)

    if np.any(A_pairs >= len(A)):
        raise ValueError, "second array contains a repeated element"
    if np.any(B_pairs < 0):
        raise ValueError, "first array contains a repeated element"

    return A_pairs, B_pairs

The idea is that diff is not a bad way to find repeats in a sequence;
so concatenate the two sequences and sort them (if you're lucky
mergesort will be fast because they're individually sorted, but in any
case we need stability). Of course, you have to take the pairs in the
sorted sequence and figure out where they came from, but fortunately
you don't even need to invert the permutation returned by argsort (do
we have a tool to invert permutations, or should one just use argsort
again?).
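
For what it's worth, the inverse permutation can also be built directly,
without a second sort:

import numpy as np

idx = np.argsort(np.array([30, 10, 20]))   # e.g. array([1, 2, 0])
inv = np.empty_like(idx)
inv[idx] = np.arange(len(idx))             # inv is the inverse permutation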

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] checking element types in array

2008-05-17 Thread Anne Archibald
2008/5/17 Zoho Vignochi [EMAIL PROTECTED]:
 hello:

 I am writing my own version of a dot product. Simple enough this way:

 def dot_r(a, b):
return sum( x*y for (x,y) in izip(a, b) )

 However if both a and b are complex we need:

 def dot_c(a, b):
return sum( x*y for (x,y) in izip(a.conjugate(), b) ).real

As you probably realize, this will be vastly slower (ten to a hundred
times) than the built-in function dot() in numpy.

 I would like to combine these so that I need only one function which
 detects which formula based on argument types. So I thought that
 something like:

 def dot(a,b):
if isinstance(a.any(), complex) and isinstance(b.any(), complex):
return sum( x*y for (x,y) in izip(a.conjugate(), b) ).real
else:
return sum( x*y for (x,y) in izip(a, b) )

 And it doesn't work because I obviously have the syntax for checking
 element types incorrect. So my real question is: What is the best way to
 check if any of the elements in an array are complex?

numpy arrays are efficient, among other reasons, because they have
homogeneous types. So all the elements in an array are the same type.
(Yes, this means if you have an array of numbers only one of which
happens to be complex, you have to represent them all as complex
numbers whose imaginary part happens to be zero.) So if A is an array
A.dtype is the type of its elements.

numpy provides two convenience functions for checking whether an array
is complex, depending on what you want:

iscomplex checks whether each element has a nonzero imaginary part and
returns an array representing the element-by-element answer; so
any(iscomplex(A)) will be true if any element of A has a nonzero
imaginary part.

iscomplexobj checks whether the array has a complex data type. This is
much much faster, but of course it may happen that all the imaginary
parts happen to be zero; if you want to treat this array as real, you
must use iscomplex.
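
So one way to write the combined function, leaning on numpy's own dot for
speed (just a sketch):

import numpy as np

def dot(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    if np.iscomplexobj(a) and np.iscomplexobj(b):
        # conjugate the first argument, as in dot_c above
        return np.dot(np.conjugate(a), b).real
    return np.dot(a, b)

np.iscomplexobj(np.array([1.+0j]))        # True: the dtype is complex
np.any(np.iscomplex(np.array([1.+0j])))   # False: no nonzero imaginary parts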

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] question about optimizing

2008-05-17 Thread Anne Archibald
2008/5/17 Charles R Harris [EMAIL PROTECTED]:


 On Sat, May 17, 2008 at 9:52 AM, Alan G Isaac [EMAIL PROTECTED] wrote:

 On Fri, 16 May 2008, Anne Archibald apparently wrote:
  storing actual python objects in an array is probably not
  a good idea

 I have been wondering what people use object arrays for.
 I have been guessing that it is for indexing convenience?
 Are there other core motivations?

 You can always define an object array of matrices, which solves Tim's
 problem of matrix stacks, albeit in not the most efficient manner and not
 the easiest thing to specify due to the current limitations of array(...). I
 do think it would be nice to have arrays of arrays, but this needs one more
 array type so that one can do something like array(list_of_arrays,
 dtype=matrix((2,3),float)), i.e., we could use a fancier dtype.

I think if you're going to be writing code to do this, it would be
better not to use object arrays. After all, there no reason the
underlysing storage for an array of matrices shouldn't be one big
block of memory rather than a lot of scattered python objects.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] question about optimizing

2008-05-17 Thread Anne Archibald
2008/5/17 Brian Blais [EMAIL PROTECTED]:

 at least for me, that was the motivation.  I am trying to build a simulation
 framework for part of the brain, which requires connected layers of nodes.
  A layer is either a 1D or 2D structure of nodes, with each node a
 relatively complex beast.  Rather than reinvent the indexing (1D, 2D,
 slicing, etc...), I just inherited from ndarray.  I thought, after the fact,
 that some numpy functions on arrays would help speed up the code, which
 consists mostly of calling an update function on all nodes, passing each
 them an input vector.  I wasn't sure if there would be any speed up for
 this, compared to
 for n in self.flat:
n.update(input_vector)
 From the response, the answer seems to be no, and that I should stick with
 the python loops for clarity.  But also, the words of Anne Archibald, makes
 me think that I have made a bad choice by inheriting from ndarray, although
 I am not sure what a convenient alternative would be.

Well, it doesn't exist yet, but a handy tool would be a factory
function ArrayOf; you would pass it a class, and it would produce a
subclass of ndarray designed to contain that class. That is, the
underlying storage would be a record array, but the getitem and
setitem would automatically handle conversion to and from the class
you supplied it, where appropriate.

myarray = ArrayOf(Node,dtype=...)
A = myarray.array([Node(...), Node(...), Node(...)])
n = A[1]
A[2] = Node(...)
A.C.update() # python-loop-based update of all elements

You could also design it so that it was easy to derive a class from
it, since that's probably the best way to handle vectorized methods:

class myarray(ArrayOf(Node, dtype=...)):
    def update(self):
        self.underlying["node_attribute"] += 1


I should say, if you can get away with treating your nodes more like C
structures and writing (possibly vectorized) functions to act on them,
you can avoid all this mumbo jumbo:

node_dtype = [("node_attribute", np.int), ("weight", np.float)]
A = np.zeros(10, dtype=node_dtype)

def nodes_update(A):
    A["node_attribute"] += 1

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Ruby benchmark -- numpy is slower.... was: Re: Ruby's NMatrix and NVector

2008-05-16 Thread Anne Archibald
2008/5/16 David Cournapeau [EMAIL PROTECTED]:
 Sebastian Haase wrote:
 Hi,
 can someone comment on these timing numbers ?
 http://narray.rubyforge.org/bench.html.en

 Is the current numpy faster ?


 It is hard to know without getting the same machine or having the
 benchmark sources. But except for add, all other operations rely on
 underlying blas/lapack (only matrix operations do if you have no cblas),
 so I am a bit surprised by the results.

 FWIW, doing 100 x c = a + b with 1e6 elements on a PIV prescott @ 3.2
 Ghz is about 2 sec, and I count numpy start:

 import numpy as np

 a = np.random.randn(1e6)
 b = np.random.randn(1e6)

 for i in range(100):
a + b

 And np.dot(a, b) for 3 iterations and 500x500 takes 0.5 seconds (again
 taking into account numpy import), but what you really do here is
 benchmarking your underlying BLAS (if numpy.dot does use BLAS, again,
 which it does at least when built with ATLAS).

There are four benchmarks: add, multiply, dot, and solve. dot and
solve use BLAS, and for them numpy, ruby, and octave are comparable. Add
and multiply are much slower in numpy, but they are implemented in
numpy itself.

Exactly why add and multiply are slower is an interesting question -
loop overhead? striding? cache behaviour?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.arccos(numpy.inf)????

2008-05-16 Thread Anne Archibald
2008/5/16 Stuart Brorson [EMAIL PROTECTED]:
 Hi --

 Sorry to be a pest with corner cases, but I found another one.

 In this case, if you try to take the arccos of numpy.inf in the
 context of a complex array, you get a bogus return (IMO).  Like this:

 In [147]: R = numpy.array([1, numpy.inf])

 In [148]: numpy.arccos(R)
 Warning: invalid value encountered in arccos
 Out[148]: array([  0.,  NaN])

 In [149]: C = numpy.array([1+1j, numpy.inf])

 In [150]: numpy.arccos(C)
 Warning: invalid value encountered in arccos
 Out[150]: array([ 0.90455689-1.06127506j, NaN   -Infj])

 The arccos(numpy.inf) in the context of a real array is OK, but taking
 arcocs(numpy.inf) in the context of a complex array should return
 NaN + NaNj, IMO.

 Thoughts?

Is this so bad?

lim x -> inf arccos(x) = -j inf

Granted this is only when x approaches infinity along the positive real axis...

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] question about optimizing

2008-05-16 Thread Anne Archibald
2008/5/16 Brian Blais [EMAIL PROTECTED]:

 I have a custom array, which contains custom objects (I give a stripped down
 example below), and I want to loop over all of the elements of the array and
 call a method of the object.  I can do it like:
 a=MyArray((5,5),MyObject,10)

 for obj in a.flat:
 obj.update()
 but I was wondering if there is a faster way, especially if obj.update is a
 cython-ed function.  I was thinking something like apply_along_axis, but
 without having an input array at all.
 Is there a better way to do this?  While I'm asking, is there a better way
 to overload ndarray than what I am doing below?  I tried to follow code I
 found online, but the examples of this are few and far between.

Unfortunately, the loop overhead isn't very big compared to the
overhead of method dispatch, so there's no way you're going to make
things fast. For convenience you can do something like

update = vectorize(lambda object: object.update())

and then later

update(a)

But if you really want things to go quickly, I think the best approach
is to take advantage of the main feature of arrays: they hold
homogeneous items efficiently. So use the array to store the contents
of your object, and put any special behaviour in the array object. For
example, if I wanted to efficiently compute with arrays of numbers
carrying units, I would attach a unit to the array as a whole, and
have the array store doubles. With some cleverness, you can even have
accesses to the array return a freshly-created object whose contents
are based on the values you looked up in the array. But storing actual
python objects in an array is probably not a good idea.
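
As a sketch of what I mean by attaching the extra information to the array
as a whole (a made-up minimal class, not anything that exists in numpy):

import numpy as np

class UnitArray(np.ndarray):
    """Stores plain doubles; one unit string describes the whole array."""
    def __new__(cls, data, unit=''):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.unit = unit
        return obj
    def __array_finalize__(self, obj):
        self.unit = getattr(obj, 'unit', '')

lengths = UnitArray([1.0, 2.5, 4.0], unit='m')
total = lengths.sum()    # arithmetic stays fast: it is ordinary float math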

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] let's use patch review

2008-05-15 Thread Anne Archibald
2008/5/15 Francesc Alted [EMAIL PROTECTED]:

 I don't need to say that this procedure was not used for small or
 trivial changes (that were fixed directly), but only when the issue was
 important enough to deserve the attention of the mate.

I think here's the rub: when I hear patch review system it sounds to
me like an obstacle course for getting code into the software. Maybe
it's justified, but I think at the moment there are many many things
that are just awaiting a little bit of attention from someone who
knows the code. A patch review system overapplied would multiply that
number by rather a lot.

How about a purely-optional patch review system? I've submitted
patches I wanted reviewed before they went in the trunk. As it was, I
didn't have SVN access, so I just posted them to trac or emailed them
to somebody, who then pondered and committed them. But a patch review
system - provided people were promptly reviewing patches - would have
fit the bill nicely.

How frequently does numpy receive patches that warrant review? The
zillion little doc fixes don't, even moderate-sized patches from
experienced developers probably don't warrant review. So in the last
while, what are changes that needed review, and what happened to them?

* financial code - email to the list, discussion, eventual inclusion
* matrix change - bug filed, substantial discussion, confusion about
consensus committed, further discussion, consensus committed, users
complain, patch backed out and issue placed on hold
* MA change - developed independently, discussed on mailing list, committed
* histogram change - filed as bug, discussed on mailing list, committed
* median change - discussed on mailing list, committed
* .npy file format - discussed and implemented at a sprint, committed

Did I miss any major ones? Of course, svn log will give you a list of
minor fixes in the last few months.

It seems to me like the review process at the moment is just discuss
it on the mailing list. Tools to facilitate that would be valuable;
it would be handy to be able to point to a particular version of the
code somewhere on the Web (rather than just in patches attached to
email), for example.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Going toward time-based release ?

2008-05-12 Thread Anne Archibald
2008/5/12 Jarrod Millman [EMAIL PROTECTED]:
 On Mon, May 12, 2008 at 12:05 PM, Eric Firing [EMAIL PROTECTED] wrote:
  As one who pushed for the MA transition, I appreciate your suggestion.
  It may have one unintended consequence, however, that may cause more
  trouble than it saves: it may lead to more situations where the ma
  versions are unintentionally mixed.  This will probably work if an
  old_ma array ends up as input for a new_ma function, but the reverse
  often will not work correctly.

 Good point.  I now agree that it is best to avoid having the old code
 accessible via np.core.ma.  If the old code is only accessible via
 np.oldnumeric, then I don't see any reason to add a deprecation
 warning since it is kind of implied by placing it in np.oldnumeric.

Does it make sense to make the *new* code available in numpy.core.ma,
optionally with a DeprecationWarning? I realize it was supposedly
never advertised as being there, but given the state of numpy
documentation many users will have found it there; at the least
matplotlib did.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Writing new ufuncs

2008-05-11 Thread Anne Archibald
Hi,

Suppose I have a C function,
double logsum(double a, double b);
What is needed to produce a ufunc object whose elementwise operation
is done by calling this function?

Also, is there a way to take a python function and automatically make
a ufunc out of it? (No, vectorize doesn't implement reduce(),
accumulate(), reduceat(), or outer().)

Thanks,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Writing new ufuncs

2008-05-11 Thread Anne Archibald
2008/5/11 Alan McIntyre [EMAIL PROTECTED]:
 On Sun, May 11, 2008 at 2:04 PM, Anne Archibald
 [EMAIL PROTECTED] wrote:
  Also, is there a way to take a python function and automatically make
  a ufunc out of it? (No, vectorize doesn't implement reduce(),
  accumulate(), reduceat(), or outer().)

 I've not used it, but have a look at numpy.frompyfunc; its docstring
 suggests it turns arbitrary Python functions into ufuncs.

Ah, thanks. Shame it produces object arrays though.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Writing new ufuncs

2008-05-11 Thread Anne Archibald
2008/5/11 Robert Kern [EMAIL PROTECTED]:

 Basically, you need 3 arrays: functions implementing the type-specific
 inner loops, void* extra data to pass to these functions, and an array
 of arrays containing the type signatures of the ufunc. In numpy, we
 already have generic implementations of the loop functions for common
 combinations of types. In your case, for a binary function taking two
 doubles and returning a double, we have PyUFunc_dd_d(). As its extra
 void* data, it takes a function pointer that actually implements the
 element-wise operation. So lets start making the arrays:

Great! Thanks!

Is it possible to provide a specialized implementation of reduce()?
(Since reduce() can be implemented more efficiently than doing it
pairwise.)

PyUFunc_None,  // The identity element for reduction.
   // No good one to use for this function,
   // unfortunately.

Is it safe to use minus infinity, or is this going to give people all
kinds of warnings if they have seterr() set?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Writing new ufuncs

2008-05-11 Thread Anne Archibald
2008/5/11 Robert Kern [EMAIL PROTECTED]:

 Perhaps, but ufuncs only allow 0 or 1 for this value, currently.

That's a shame, minus infinity is the identity for maximum too.

 Also, I was wrong about using PyUFunc_ff_f. Instead, use PyUFunc_ff_f_As_dd_d.

Hmm. Well, I tried implementing logsum(), and it turns out neither
PyUFunc_dd_d nor any of the other functions - mostly flagged as being
part of the ufunc API and described in the automatically-generated
ufunc_api.txt - are exported. All are static and not described in any
header I could find. That said, I know scipy defines its own ufuncs,
so it must either reimplement these or have some way I hadn't thought
of to get at them.

I've attached a patch to add the ufunc logsum to numpy. It's a bit
nasty for several reasons:
* I wasn't sure where to put it, so I put it in _compiled_base.c in
numpy/lib/src. I was very hesitant to touch umathmodule.c.src with its
sui generis macro language.
* I'm not sure what to do about log1p - it seems to be available in
spite of HAVE_LOG1P not being defined. In any case if it doesn't exist
it seems crazy to implement it again here.
* since PyUFunc_dd_d does not seem to be exported, I just cut-n-pasted
its code here. Obviously not a solution, but what are extension
writers supposed to do?

Do we want a ufunc version of logsum() in numpy? I got the impression
that in spite of some people's doubts as to its utility there was a
strong contingent of users who do want it. This plus a logdot() would
cover most of the operations one might want to do on numbers in log
representation. (Maybe subtraction?). It could be written in pure
python, for the most part, and reduce() would save a few exps and logs
at the cost of a temporary, but accumulate() remains a challenge. (I
don't expect many people feel the need for reduceat()).

Anne

P.S. it was surprisingly difficult to persuade numpy to find my tests.
I suppose that's what the proposed transition to nose is for? -A
Index: numpy/lib/src/_compiled_base.c
===
--- numpy/lib/src/_compiled_base.c	(revision 5155)
+++ numpy/lib/src/_compiled_base.c	(working copy)
@@ -1,6 +1,7 @@
 #include "Python.h"
 #include "structmember.h"
 #include "numpy/noprefix.h"
+#include "numpy/ufuncobject.h"
 
 static PyObject *ErrorObject;
 #define Py_Try(BOOLEAN) {if (!(BOOLEAN)) goto fail;}
@@ -835,16 +836,68 @@
 return;
 }
 
+#if !defined(HAVE_LOG1P) && 0
+/* logsum ufunc */
+static double log1p(double x) /* is this really necessary? */
+{
+double u = 1. + x;
+if (u == 1.0) {
+return x;
+} else {
+return log(u) * x / (u-1.);
+}
+}
+#endif HAVE_LOG1P
+
+
+static double logsum(double x, double y) {
+    if (x > y) {
+        return x + log1p(exp(y-x));
+    } else {
+        return y + log1p(exp(x-y));
+    }
+}
+static void
+logsum_dd_d(char **args, intp *dimensions, intp *steps)
+{
+    intp n = dimensions[0];
+    intp is1 = steps[0];
+    intp is2 = steps[1];
+    intp os = steps[2];
+    char *ip1 = args[0];
+    char *ip2 = args[1];
+    char *op = args[2];
+    intp i;
+
+    for (i = 0; i < n; i++, ip1 += is1, ip2 += is2, op += os) {
+        double *in1 = (double *)ip1;
+        double *in2 = (double *)ip2;
+        double *out = (double *)op;
+
+        *out = logsum(*in1, *in2);
+    }
+}
+
+
+static char logsum_sigs[] = {
+   NPY_FLOAT64, NPY_FLOAT64, NPY_FLOAT64
+};
+static PyUFuncGenericFunction logsum_functions[] = {logsum_dd_d};
+static void* logsum_data[] = {logsum};
+static char logsum_doc[] = "Compute log(exp(x)+exp(y)) without numeric overflows and underflows";
+
+
 /* Initialization function for the module (*must* be called init<name>) */
 
 PyMODINIT_FUNC init_compiled_base(void) {
-PyObject *m, *d, *s;
+PyObject *m, *d, *s, *f;
 
 /* Create the module and add the functions */
 m = Py_InitModule("_compiled_base", methods);
 
 /* Import the array objects */
 import_array();
+import_umath();
 
 /* Add some symbolic constants to the module */
 d = PyModule_GetDict(m);
@@ -857,6 +910,23 @@
 PyDict_SetItemString(d, "error", ErrorObject);
 Py_DECREF(ErrorObject);
 
+/* set up logsum ufunc */
+   f = PyUFunc_FromFuncAndData(
+   logsum_functions,
+   logsum_data,
+   logsum_sigs,
+   1,  // The number of type signatures.
+   2,  // The number of inputs.
+   1,  // The number of outputs.
+   PyUFunc_None,  // The identity element for reduction.
+  // No good one to use for this function,
+  // unfortunately.
+   "logsum",  // The name of the ufunc.
+   logsum_doc,
+   0  // Dummy for API backwards compatibility.
+   );
+   PyDict_SetItemString(d, "logsum", f);
+   Py_DECREF(f);
 
 /* define PyGetSetDescr_Type and PyMemberDescr_Type */
 define_types();
Index: numpy/lib/tests/test_ufunclike.py

Re: [Numpy-discussion] Uncomfortable with matrix change

2008-05-10 Thread Anne Archibald
2008/5/10 Jarrod Millman [EMAIL PROTECTED]:
 On Sat, May 10, 2008 at 8:14 AM, Keith Goodman [EMAIL PROTECTED] wrote:
 If these are backed out, will some kind of deprecation
 warning be added for scalar indexing, as Travis suggested?
 Robert's request seems in accord with this.

 Shouldn't a deprecation warning explain what the future behavior will
 be? Is there a firm consensus on what that behavior will be?

 I agree that a deprecation warning needs to explain the future
 behavior and don't believe we have agreed on what it should be yet.

 I don't personally have an opinion on the what the new behavior should
 be at this point.  But I don't think it makes sense to add deprecation
 warnings at this point--unless we know exactly what it is that we will
 be doing in the future.  So while it could be argued that it would be
 useful to say use x[0,:] instead of x[0] to return row 0 as a matrix
 in the 1.1 release, my sense is that we will still need to replace
 that DeprecationWarning in 1.2 with a new DeprecationWarning saying
 something like in 1.3 x[0] will return  and if you want  you
 will need to call x[0,:].  I don't think we should have two different
 DeprecationWarnings in back to back releases and believe that we
 should wait to include the warning until we decide what the new
 behavior will be in 1.2.

Regrettably I have to agree that we can't introduce a
DeprecationWarning until 1.2. I don't agree that x[0,:] should return
a matrix! It should be the one-dimensional object it looks like. See
Keith Goodman's recent message -
http://projects.scipy.org/pipermail/numpy-discussion/2008-May/033726.html
- for an example of why representing 1D objects as 2D objects is
trouble. I am of the opinion that matrix indexing should be identical
to array indexing, insofar as that is possible.

I don't expect my opinion to prevail, but the point is that we do not
even have enough consensus to agree on a recommendation to go in the
DeprecationWarning. Alas.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Uncomfortable with matrix change

2008-05-10 Thread Anne Archibald
2008/5/10 Nathan Bell [EMAIL PROTECTED]:
 On Sat, May 10, 2008 at 3:05 PM, Anne Archibald
 [EMAIL PROTECTED] wrote:

 I don't expect my opinion to prevail, but the point is that we do not
 even have enough consensus to agree on a recommendation to go in the
 DeprecationWarning. Alas.

 Would you object to raising a general Warning with a message like the 
 following?

 matrix indexing of the form x[0] is ambiguous, consider the explicit
 format x[0,:]

Well, since I at least have proposed changing the latter as well, the
warning doesn't actually help with deprecation. It does reduce the
amount of beginner foot-shooting (as in x[0][0]), I guess, so I'm not
opposed to it.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Uncomfortable with matrix change

2008-05-10 Thread Anne Archibald
2008/5/10 Timothy Hochberg [EMAIL PROTECTED]:
 On Sat, May 10, 2008 at 1:37 PM, Anne Archibald [EMAIL PROTECTED]
 wrote:

 2008/5/10 Nathan Bell [EMAIL PROTECTED]:
  On Sat, May 10, 2008 at 3:05 PM, Anne Archibald
  [EMAIL PROTECTED] wrote:
 
  I don't expect my opinion to prevail, but the point is that we do not
  even have enough consensus to agree on a recommendation to go in the
  DeprecationWarning. Alas.
 
  Would you object to raising a general Warning with a message like the
  following?
 
  matrix indexing of the form x[0] is ambiguous, consider the explicit
  format x[0,:]

 Well, since I at least have proposed changing the latter as well, the
 warning doesn't actually help with deprecation. It does reduce the
 amount of beginner foot-shooting (as in x[0][0]), I guess, so I'm not
 opposed to it.

 Please, let's just leave the current matrix class alone. Any change
 sufficient to make matrix not terrible, will break everyone's code. Instead,
 the goal should be build a new matrix class (say newmatrix) where we can
 start from scratch. We can build it and keep it alpha for a while so we can
 break the interface as needed while we get experience with it and then,
 maybe, transition matrix to oldmatrx and newmatrix to matrix over a couple
 of releases. Or just choose a reasonable name for matrix to begin with
 (ndmatrix?) and then deprecate the current matrix class when the new one is
 fully baked.

 Trying to slowly morph matrix into something usable is just going to break
 code again and again and just be a giant headache in general.

+1

I think something along the lines of the Numerical Ruby code you
pointed to some time ago is a good idea, that is, an array of
matrices object and an array of vectors object. There will be
considerable challenges - distinguishing the two kinds of index
manipulations on arrays of matrices (those that transpose every matrix
in the array, for example, and those that rearrange the array of
matrices). Since teaching is one of the main current applications of
matrices, these should be presented in as non-painful a way as
possible.

Anyway, I think I am in favour of returning the 1.1 matrix behaviour
to its 1.0 form and releasing the thing.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Uncomfortable with matrix change

2008-05-09 Thread Anne Archibald
2008/5/9 Travis Oliphant [EMAIL PROTECTED]:

 After Nathan Bell's recent complaints, I'm a bit more uncomfortable with
 the matrix change to scalar indexing.   It does and will break code in
 possibly hard-to-track down ways.   Also, Nathan has been a *huge*
 contributor to the Sparse matrix in scipy and so I value his opinion
 about the NumPy matrix.  One of my goals is to have those two objects
 work together a bit more seamlessly.

 So, I think we need to:

 1) Add a warning to scalar access
 2) Back-out the change and fix all the places where NumPy assumes
 incorrectly that the number of dimensions reduce on PySequence_GetItem.

 Opinions?

This is certainly the conservative approach.

How much code is broken by this, compared to (say) the amount broken
by the disappearance of numpy.core.ma? Is this our biggest single API
breakage?

I do agree that we should be paying attention to people who are
actually using matrices, so I won't enter a vote.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in oldnumeric.ma

2008-05-09 Thread Anne Archibald
2008/5/9 Eric Firing [EMAIL PROTECTED]:
 Stefan, (and Jarrod and Pierre)

  (Context for anyone new to the thread: the subject is slightly
  misleading, because the bug is/was present in both oldnumeric.ma and
  numpy.ma; the discussion of fix pertains to the latter only.)

  Regarding your objections to r5137: good point.  I wondered about that.
   I think the function should look like this (although it might be
  possible to speed up the implementation for the most common case):
[...]
  md = make_mask((fb != fb.astype(int)) & (fa < 0), shrink=True)

Unfortunately this isn't quite the right condition:

In [18]: x = 2.**35; numpy.array([-1.])**x; numpy.array(x).astype(int)==x
Out[18]: array([ 1.])
Out[18]: False

Switching to int64 seems to help:

In [27]: x = 2.**62; numpy.array([-1.])**x;
numpy.array(x).astype(numpy.int64)==x
Out[27]: array([ 1.])
Out[27]: True

This seems to work, but may be platform-dependent: 2**62+1 cannot be
represented as an IEEE float, so whether pow() successfully deals with
it may be different for machines that don't work with 80-bit
floating-point internally.

A suspenders-and-belt approach would check for NaNs and add them to
the mask, but this still doesn't cover the case where the user has
numpy set to raise exceptions any time NaNs are generated.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Power domain (was Re: bug in oldnumeric.ma)

2008-05-09 Thread Anne Archibald
2008/5/9 Eric Firing [EMAIL PROTECTED]:

 It seems like some strategic re-thinking may be needed in the long run,
 if not immediately.  There is a wide range of combinations of arguments
 that will trigger invalid results, whether Inf or NaN.  The only way to
 trap and mask all of these is to use masked_invalid after the
 calculation, and this only works if the user has not disabled nan
 output.  I have not checked recently, but based on earlier strategy
 discussions, I suspect that numpy.ma is already strongly depending on
 the availability of nan and inf output to prevent exceptions being
 raised upon invalid calculations.  Maybe this should simply be
 considered a requirement for the use of ma.

I think in principle the right answer is to simply run whatever
underlying function, and mask any NaNs or Infs in the output. This may
be a problem when it comes to seterr - is the behaviour of seterr()
even defined in a multithreaded context? Can it be made thread-local?
Surely hardware must support thread-local floating-point error flags.
If seterr() works this way, then surely the right answer in ma is to
use a try/finally block to turn off exceptions and warnings, and clean
up NaNs after the fact.
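
Something along these lines, ignoring the threading question for the moment
(a sketch):

import numpy as np
import numpy.ma as ma

def masked_power(a, b):
    old = np.seterr(all='ignore')     # silence invalid-value warnings/errors
    try:
        raw = np.power(a, b)
    finally:
        np.seterr(**old)              # always restore the caller's settings
    return ma.masked_invalid(raw)     # mask whatever came out as NaN or Inf

masked_power(np.array([-1., 2., 4.]), 0.5)   # first entry comes back masked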

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 David Cournapeau [EMAIL PROTECTED]:
 On Thu, May 8, 2008 at 10:20 PM, Charles R Harris
  [EMAIL PROTECTED] wrote:
  
  
   Floating point numbers are essentially logs to base 2, i.e., integer
   exponent and mantissa between 1 and 2. What does using the log buy you?

  Precision, of course. I am not sure I understand the notation base =
  2, but doing computation in the so called log-domain is a must in many
  statistical computations. In particular, in machine learning with
  large datasets, it is common to have some points whose pdf is
  extremely small, and well below the precision of double. Typically,
  internally, the computation of my EM toolbox are done in the log
  domain, and use the logsumexp trick to compute likelihood given some
  data:

I'm not sure I'd describe this as precision, exactly; it's an issue of
numerical range. But yes, I've come across this while doing
maximum-likelihood fitting, and a coworker ran into it doing Bayesian
statistics. It definitely comes up.

Is logarray really the way to handle it, though? it seems like you
could probably get away with providing a logsum ufunc that did the
right thing. I mean, what operations does one want to do on logarrays?

add -> logsum
subtract -> ?
multiply -> add
mean -> logsum/N
median -> median
exponentiate to recover normal-space values -> exp
str -> ?

I suppose numerical integration is also valuable, so it would help to
have a numerical integrator that was reasonably smart about working
with logs. (Though really it's just a question of rescaling and
exponentiating, I think.)

A logarray type would help by keeping track of the fact that its
contents were in log space, and would make expressions a little less
cumbersome, I guess. How much effort would it take to write it so that
it got all the corner cases right?
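
For reference, the elementwise operation itself is tiny; a sketch in terms
of existing ufuncs (it does not try to handle corner cases like two -inf
inputs):

import numpy as np

def logsum(x, y):
    # log(exp(x) + exp(y)) computed without overflow or underflow
    hi = np.maximum(x, y)
    lo = np.minimum(x, y)
    return hi + np.log1p(np.exp(lo - hi))

logsum(-1000.0, -1001.0)   # about -999.69; the naive form would give -inf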

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 Charles R Harris [EMAIL PROTECTED]:

 What realistic probability is in the range exp(-1000) ?

Well, I ran into it while doing a maximum-likelihood fit - my early
guesses had exceedingly low probabilities, but I needed to know which
way the probabilities were increasing.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 Charles R Harris [EMAIL PROTECTED]:

 On Thu, May 8, 2008 at 10:56 AM, Robert Kern [EMAIL PROTECTED] wrote:
 
  When you're running an optimizer over a PDF, you will be stuck in the
  region of exp(-1000) for a substantial amount of time before you get
  to the peak. If you don't use the log representation, you will never
  get to the peak because all of the gradient information is lost to
  floating point error. You can consult any book on computational
  statistics for many more examples. This is a long-established best
  practice in statistics.

 But IEEE is already a log representation. You aren't gaining precision, you
 are gaining more bits in the exponent at the expense of fewer bits in the
 mantissa, i.e., less precision. As I say, it may be convenient, but if cpu
 cycles matter, it isn't efficient.

Efficiency is not the point here. IEEE floats simply cannot represent
the difference between exp(-1000) and exp(-1001). This difference can
matter in many contexts.

For example, suppose I have observed a source in X-rays and I want to
fit a blackbody spectrum. I have, say, hundreds of spectral bins, with
a number of photons in each. For any temperature T and amplitude A, I
can compute the distribution of photons in each bin (Poisson with mean
depending on T and A), and obtain the probability of obtaining the
observed number of photons in each bin. Even if a blackbody with
temperature T and amplitude A is a perfect fit I should expect this
number to be very small, since the chance of obtaining *exactly* this
sequence of integers is quite small. But when I start the process I
need to guess a T and an A and evauate the probability. If my values
are far off, the probability will almost certainly be lower than
exp(-1000). But I need to know whether, if I increase T, this
probability will increase or decrease, so I cannot afford to treat it
as zero, or throw up my hands and say This is smaller than one over
the number of baryons in the universe! My optimization problem doesn't
make any sense!.

I could also point out that frequently when one obtains these crazy
numbers, one is not working with probabilities at all, but probability
densities. A probability density of exp(-1000) means nothing special.

Finally, the fact that *you* don't think this is a useful technique
doesn't affect the fact that there is a substantial community of users
who use it daily and who would like some support for it in scipy.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 Charles R Harris [EMAIL PROTECTED]:

 David, what you are using is a log(log(x)) representation internally. IEEE
 is *not* linear, it is logarithmic.

As Robert Kern says, yes, this is exactly what the OP and all the rest
of us want.

But it's a strange thing to say that IEEE is logarithmic - 2.3*10**1
is not exactly logarithmic, since the 2.3 is not the logarithm of
anything. IEEE floats work the same way, which is important, since it
means they can exactly represent integers of moderate size. For
example, 257 is represented as

sign 0, exponent 135, mantissa (implied leading 1).00000001b.

The exponent is indeed the integer part of the log base 2 of the
value, up to some fiddling, but the mantissa is not a logarithm of any
kind.
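
The bit pattern is easy to inspect (single precision shown, matching the
exponent of 135 above):

import numpy as np

bits = int(np.array([257.0], dtype=np.float32).view(np.uint32)[0])
hex(bits)                 # '0x43808000'
(bits >> 23) & 0xff       # 135: the biased exponent field
bits & 0x7fffff           # 0x8000, i.e. mantissa bits .00000001b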

Anyway, all this is immaterial. The point is, in spite of the fact
that floating-point numbers can represent a very wide range of
numbers, there are some important contexts in which this range is not
wide enough. One could in principle store an additional power of two
in an accompanying integer, but this would be less efficient in terms
of space and time, and more cumbersome, when for the usual situations
where this is applied, simply taking the logarithm works fine.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Log Arrays

2008-05-08 Thread Anne Archibald
2008/5/8 T J [EMAIL PROTECTED]:
 On 5/8/08, Anne Archibald [EMAIL PROTECTED] wrote:
   Is logarray really the way to handle it, though? it seems like you
   could probably get away with providing a logsum ufunc that did the
   right thing. I mean, what operations does one want to do on logarrays?
  
    add -> logsum
    subtract -> ?
    multiply -> add
    mean -> logsum/N
    median -> median
    exponentiate to recover normal-space values -> exp
    str -> ?
  

  That's about it, as far as my usage goes.  Additionally, I would also
  benefit from:

logdot
logouter

  In addition to the elementwise operations, it would be nice to have

logsum along an axis
logprod along an axis
cumlogsum
cumlogprod

  Whether these are through additional ufuncs or through a subclass is
  not so much of an issue for me---either would be a huge improvement to
  the current situation.  One benefit of a subclass, IMO, is that it
  maintains the feel of non-log arrays.  That is, when I want to
  multiply two logarrays, I just do x*y, rather than x+y, but I can
  understand arguments that this might not be desirable.

Well, how about a two-step process? Let's come up with a nice
implementation of each of the above, then we can discuss whether a
subclass is warranted (and at that point each operation on the
subclass will be very simple to implement).

Most of them could be implemented almost for free by providing a ufunc
that did logsum(); then logsum along an axis becomes logsum.reduce(),
and cumlogsum becomes logsum.accumulate(). logprod is of course just
add. logouter is conveniently and efficiently written as
logprod.outer. logdot can be written in terms of logsum() and
logprod() at the cost of a sizable temporary, but should really have
its own implementation.

It might be possible to test this by using vectorize() on a
single-element version of logsum(). This would be slow but if
vectorize() makes a real ufunc object should provide all the usual
handy methods (reduce, accumulate, outer, etcetera). Alternatively,
since logsum can be written in terms of existing ufuncs, it should be
possible to implement a class providing all the ufunc methods by hand
without too too much pain.
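
For instance, a quick way to try it out (frompyfunc gives a true ufunc, with
the object-array caveat mentioned earlier; just a sketch):

import numpy as np

def _logsum(x, y):
    hi, lo = max(x, y), min(x, y)
    return hi + np.log1p(np.exp(lo - hi))

logsum = np.frompyfunc(_logsum, 2, 1)   # real ufunc, but object results

a = np.array([-1000.0, -1000.5, -999.0])
total = logsum.reduce(a)          # log(sum(exp(a))), about -998.54
running = logsum.accumulate(a)    # the cumulative version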

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Why are ufunc docstrings useless?

2008-05-08 Thread Anne Archibald
Hi,

I frequently use functions like np.add.reduce and np.add.outer, but
their docstrings are totally uninformative. Would it be possible to
provide proper docstrings for these ufunc methods? They need not be
specific to np.add; just an explanation of what arguments to give (for
example) reduce() (presumably it takes an axis= argument? what is the
default behaviour?) and what it does would help.

Thanks,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Why are ufunc docstrings useless?

2008-05-08 Thread Anne Archibald
2008/5/8 Robert Kern [EMAIL PROTECTED]:
 On Thu, May 8, 2008 at 5:28 PM, Anne Archibald
 [EMAIL PROTECTED] wrote:
 Hi,

 I frequently use functions like np.add.reduce and np.add.outer, but
 their docstrings are totally uninformative. Would it be possible to
 provide proper docstrings for these ufunc methods? They need not be
 specific to np.add; just an explanation of what arguments to give (for
 example) reduce() (presumably it takes an axis= argument? what is the
 default behaviour?) and what it does would help.

 Sure. The place to add them would be in the PyMethodDef ufunc_methods
 array on line 3953 of numpy/core/src/ufuncobject.c. Just extend each
 of the rows with a char* containing the docstring contents. A literal
 string will suffice.

Thanks! Done add, reduce, outer, and reduceat. What about __call__?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Why are ufunc docstrings useless?

2008-05-08 Thread Anne Archibald
2008/5/8 Robert Kern [EMAIL PROTECTED]:
 On Thu, May 8, 2008 at 9:52 PM, Anne Archibald
 [EMAIL PROTECTED] wrote:

 Thanks! Done add, reduce, outer, and reduceat. What about __call__?

 If anyone knows enough to explicitly request a docstring from
 __call__, they already know what it does.

How exactly are they to find out? It does take additional arguments,
for example dtype and out - I think.

Also, help(np.add) displays the the object, its docstring, its
methods, and all their docstrings. So it provides a way to get a
docstring out of __call__ without having to know what it is.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Why are ufunc docstrings useless?

2008-05-08 Thread Anne Archibald
2008/5/8 Robert Kern [EMAIL PROTECTED]:
 On Thu, May 8, 2008 at 10:39 PM, Anne Archibald
 [EMAIL PROTECTED] wrote:
 2008/5/8 Robert Kern [EMAIL PROTECTED]:

 If anyone knows enough to explicitly request a docstring from
 __call__, they already know what it does.

 How exactly are they to find out? It does take additional arguments,
 for example dtype and out - I think.

 That should be recorded in the ufunc's main docstring, e.g.
 numpy.add.__doc__, since that is what people will actually be calling.
 No one will explicitly call numpy.add.__call__(x,y).

 Also, help(np.add) displays the the object, its docstring, its
 methods, and all their docstrings. So it provides a way to get a
 docstring out of __call__ without having to know what it is.

 Meh. All it can usefully say is Refer to the main docstring. Which
 is more or less what it currently says.

So is the custom that double-underscore methods get documented in the
class docstring and normal methods get documented in their own
docstrings?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Status Report for NumPy 1.1.0

2008-05-07 Thread Anne Archibald
2008/5/7 Jarrod Millman [EMAIL PROTECTED]:
 I have just created the 1.1.x branch:
  http://projects.scipy.org/scipy/numpy/changeset/5134
  In about 24 hours I will tag the 1.1.0 release from the branch.  At
  this point only critical bug fixes should be applied to the branch.
  The trunk is now open for 1.2 development.

I committed Travis' matrix indexing patch (plus a basic test) to
1.1.x. Hope that's okay. (All tests pass.)

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in oldnumeric.ma

2008-05-07 Thread Anne Archibald
2008/5/7 Eric Firing [EMAIL PROTECTED]:
 Charles Doutriaux wrote:
   The following code works with numpy.ma but not numpy.oldnumeric.ma,

  No, this is a bug in numpy.ma also; power is broken:

While it's tempting to just call power() and mask out any NaNs that
result, that's going to be a problem if people have their environments
set to raise exceptions on the production of NaNs. Is it an adequate
criterion to check (a<0) & (round(b)==b)? We have to be careful:

In [16]: np.array([-1.0])**(2.0**128)
Warning: invalid value encountered in power
Out[16]: array([  nan])

2.0**128 cannot be distinguished from nearby non-integral values, so
this is reasonable behaviour (and a weird corner case), but

In [23]: np.round(2.0**128) == 2.0**128
Out[23]: True

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in oldnumeric.ma

2008-05-07 Thread Anne Archibald
2008/5/7 Pierre GM [EMAIL PROTECTED]:
 All,
  Yes, there is a problem with ma.power: masking negative data should be
  restricted to the case of an exponent between -1. and 1. only, don't you
  think ?

No, there's a problem with any fractional exponent (with even
denominator): x**(3/2) == (x**3)**(1/2). And of course in a
floating-point world, you can't really ask whether the denominator is
even or not. So any non-integer power is trouble.

The draconian approach would be to simply disallow negative numbers to
be raised to exponents of type float, but that's going to annoy a lot
of people who have integers which happen to be represented in floats.
(Representing integers in floats is exact up to some fairly large
value.) So the first question is how do we recognize floats with
integer values?. Unfortunately that's not the real problem. The real
problem is how do we predict when power() is going to produce a NaN?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in oldnumeric.ma

2008-05-07 Thread Anne Archibald
2008/5/7 Jonathan Wright [EMAIL PROTECTED]:

  Is there a rule against squaring away the negatives?

  def not_your_normal_pow( x, y ): return exp( log( power( x, 2) ) * y / 2 )

  Which still needs some work for x==0.

Well, it means (-1.)**(3.) becomes 1., which is probably not what the
user expected...

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Status Report for NumPy 1.1.0

2008-05-07 Thread Anne Archibald
2008/5/7 Charles R Harris [EMAIL PROTECTED]:

 On Wed, May 7, 2008 at 11:44 AM, Anne Archibald [EMAIL PROTECTED]
 wrote:
  2008/5/7 Jarrod Millman [EMAIL PROTECTED]:
 
   I have just created the 1.1.x branch:
http://projects.scipy.org/scipy/numpy/changeset/5134
In about 24 hours I will tag the 1.1.0 release from the branch.  At
this point only critical bug fixes should be applied to the branch.
The trunk is now open for 1.2 development.
 
  I committed Travis' matrix indexing patch (plus a basic test) to
  1.1.x. Hope that's okay. (All tests pass.)
 

 You should put it in the 1.2 branch also, then. I think fancy indexing and
 tolist, which Travis mentioned as having matrix workarounds, need to be
 checked to see if they still work correctly for matrices with the patch.

Ah. Good point. I did find a bug - x[:,0] doesn't do what you'd
expect. Best not to release without backing out my change. I'm
still trying to track down what's up.

Guess our test suite leaves something to be desired.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Status Report for NumPy 1.1.0

2008-05-07 Thread Anne Archibald
2008/5/7 Charles R Harris [EMAIL PROTECTED]:

 On Wed, May 7, 2008 at 5:31 PM, Anne Archibald [EMAIL PROTECTED]
 wrote:
  Ah. Good point. I did find a bug - x[:,0] doesn't do what you'd
   expect. Best not to release without backing out my change. I'm
  still trying to track down what's up.

 Returns a column matrix here, using  1.2.0.dev5143 after Travis's commits.
 What should it do?

Oh, uh, my bad. I messed up my test code, and panicked thinking Jarrod
was about to release a broken 1.1.0 and it was going to be All My
Fault.

That said, the only explicit test of .tolist() that I could find is
the one I just wrote...

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Status Report for NumPy 1.1.0

2008-05-07 Thread Anne Archibald
2008/5/7 Charles R Harris [EMAIL PROTECTED]:

 Heh, I just added some tests to 1.2 before closing ticket #707. They  should
 probably be merged with yours.

Seems a shame:
Ran 1000 tests in 5.329s

Such a nice round number!

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-06 Thread Anne Archibald
2008/5/5 David Cournapeau [EMAIL PROTECTED]:

  Basically, what I have in mind is, in a first step (for numpy 1.2):
 - define functions to allocate on a given alignment
 - make PyMemData_NEW 16 byte aligned by default (to be compatible
  with SSE and co).

  The problem was, and still is, realloc. It is not possible to implement
  realloc with malloc/free (in a portable way), and as such, it is not
  possible to have an aligned realloc.

How much does this matter? I mean, what if you simply left all the
reallocs as-is? The arrays that resulted from reallocs would not
typically be aligned, but we cannot in any case expect all arrays to
be aligned.

Or, actually, depending on how you manage the alignment, it may be
possible to write a portable aligned realloc. As I understand it, a
portable aligned malloc allocates extra memory, then adds something to
the pointer to make the memory arena aligned. free() must then recover
the original pointer and call free() on it - though this can
presumably be avoided if garbage collection is smart enough. realloc()
also needs to recover the pointer to call the underlying realloc().
One answer is to ensure that at least one byte of padding is always
done at the beginning, so that the number of pad bytes used on an
aligned pointer p is accessible as ((uint8*)p)[-1]. Other schemes are
available; I have a peculiar affection for the pure-python solution of
constructing a view. In any of them, it seems to me, if free() can
recover the original pointer, so can realloc()...

I don't think any numpy function can assume that its input arrays are
aligned, since users may like to pass, say, mmap()ed hunks of a file,
over which numpy has no control of the alignment. In this case, then,
how much of a problem is it if a few stray realloc()s produce
non-aligned arrays?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Tests for empty arrays

2008-05-06 Thread Anne Archibald
2008/5/6 Andy Cheesman [EMAIL PROTECTED]:

  I was wondering if anyone could shed some light on how to distinguish an
  empty array of a given shape and an zeros array of the same dimensions.

An empty array, that is, an array returned by the function empty(),
just means an uninitialized array. The only difference between it and
an array returned by zeros() is the values in the array: from zeros()
they are, obviously, all zero, while from empty() they can be
anything, including zero. In practice they tend to be crazy numbers
of order 1e300 or 1e-300, but one can't count on this.

As a general rule, I recommend against using empty() on your first
pass through the code. Only once your code is working and you have
tests running should you consider switching to empty() for speed
reasons. In fact, if you want to use empty() down the road, it may
make sense to initialize your array to zeros()/0., so that if you ever
use the values, the NaNs will propagate and become obvious.
Unfortunately, some CPUs implement NaNs in software, so there can be a
huge speed hit.
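
Spelled out, that recipe is just a way of filling the array with NaNs up
front (a sketch; the names here are mine, not from numpy):

import numpy as np

with np.errstate(invalid='ignore'):
    a = np.zeros((3, 4)) / 0.0      # every element is NaN
# ... fill in the entries you actually compute ...
# anything you forget stays NaN and poisons later results, instead of
# silently holding plausible-looking garbage as with empty().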

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Status Report for NumPy 1.1.0

2008-05-06 Thread Anne Archibald
2008/5/6 Charles R Harris [EMAIL PROTECTED]:

 On Tue, May 6, 2008 at 6:40 AM, Alan G Isaac [EMAIL PROTECTED] wrote:
 
  On Tue, 6 May 2008, Jarrod Millman apparently wrote:
   open tickets that I would like everyone to take a brief
   look at:
   http://projects.scipy.org/scipy/numpy/ticket/760
 
  My understanding is that my patch, which would give
  a deprecation warning, was rejected in favor of the patch
  specified at the end of Travis's message here:
 
 URL:http://projects.scipy.org/pipermail/numpy-discussion/2008-April/033315.html
 
  At least that's how I thought things were resolved.
  I expected Travis's patch would be part of the 1.1 release.
 

 I think someone needs to step up and make the change, but first it needs to
 be blessed by Travis and some of the folks on the Matrix indexing thread.
 The change might also entail removing some workarounds in the spots
 indicated by Travis.

It has my vote, as a peripheral participant in the matrix indexing
thread; in fact it's what some of us thought we were agreeing to
earlier.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] lexsort

2008-05-06 Thread Anne Archibald
2008/5/6 Eleanor [EMAIL PROTECTED]:
  >>> a = numpy.array([[1,2,6], [2,2,8], [2,1,7],[1,1,5]])
  >>> a
  array([[1, 2, 6],
         [2, 2, 8],
         [2, 1, 7],
         [1, 1, 5]])
  >>> indices = numpy.lexsort(a.T)
  >>> a.T.take(indices,axis=-1).T
  array([[1, 1, 5],
         [1, 2, 6],
         [2, 1, 7],
         [2, 2, 8]])


  The above does what I want, equivalent to sorting on column A then
  column B in Excel, but all the transposes are ungainly. I've stared at it a
  while but can't come up with a more elegant solution. Any ideas?

It appears that lexsort is broken in several ways, and its docstring
is misleading.

First of all, this code is not doing quite what you describe. The
primary key here is the [5,6,7,8] column, followed by the middle and
then by the first. This is almost exactly the opposite of what you
describe (and of what I expected). To get this to sort the way you
describe, the clearest way is to write a sequence:

In [34]: indices = np.lexsort( (a[:,1],a[:,0]) )

In [35]: a[indices,:]
Out[35]:
array([[1, 1, 5],
   [1, 2, 6],
   [2, 1, 7],
   [2, 2, 8]])

In other words, sort over a[:,1], then sort again over a[:,0], making
a[:,0] the primary key. I have used fancy indexing to pull the array
into the right order, but it can also be done with take():

In [40]: np.take(a,indices,axis=0)
Out[40]:
array([[1, 1, 5],
   [1, 2, 6],
   [2, 1, 7],
   [2, 2, 8]])


As for why I say lexsort() is broken, well, it simply returns 0 for
higher-rank arrays (rather than either sorting or raising an
exception), it raises an exception when handed axis=None rather than
flattening as the docstring claims, and whatever the axis argument is
supposed to do, it doesn't seem to do it:

In [44]: np.lexsort(a,axis=0)
Out[44]: array([1, 0, 2])

In [45]: np.lexsort(a,axis=-1)
Out[45]: array([1, 0, 2])

In [46]: np.lexsort(a,axis=1)
---------------------------------------------------------------------------
<type 'exceptions.ValueError'>            Traceback (most recent call last)

/home/peridot/<ipython console> in <module>()

<type 'exceptions.ValueError'>: axis(=1) out of bounds


Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread Anne Archibald
2008/5/5 Robert Kern [EMAIL PROTECTED]:
 On Mon, May 5, 2008 at 7:44 AM, David Cournapeau
  [EMAIL PROTECTED] wrote:

   In numpy, we can always replace realloc by malloc/free, because we know
the size of the old block: would deprecating PyMemData_RENEW and
replacing them by PyMemData_NEW/PyMemData_FREE be possible, such as to
make all numpy arrays follow a default alignment? There are only a few
of them in numpy (6 of them), 0 in scipy, and I guess extensions never
really used them ?

  I am in favor of at least trying this out. We will have to have a set
  of benchmarks to make sure we haven't hurt the current uses of
  PyMemData_RENEW which Tim points out.

realloc() may not be a performance win anyway. Allocating new memory
is quite fast, copying data is quite fast, and in-place realloc()
tends to cause memory fragmentation - take an arena full of 1024-byte
blocks and suddenly make one of them 256 bytes long and you haven't
gained anything at all in most malloc() implementations. So yes,
benchmarks.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Broadcasting question

2008-05-02 Thread Anne Archibald
2008/5/2 Chris.Barker [EMAIL PROTECTED]:
 Hi all,

  I have an n X m X 3 array, and I have an n X m array. I want to assign the
  values in the n X m array to all three of the slices in the bigger array:

  A1 = np.zeros((5,4,3))
  A2 = np.ones((5,4))
  A1[:,:,0] = A2
  A1[:,:,1] = A2
  A1[:,:,2] = A2

  However, it seems I should be able to broadcast that, so I don't have to
  repeat myself, but I can't figure out how.

All you need is a newaxis on A2:
In [2]: A = np.zeros((3,4))

In [3]: B = np.ones(3)

In [4]: A[:,:] = B[:,np.newaxis]

In [5]: A
Out[5]:
array([[ 1.,  1.,  1.,  1.],
   [ 1.,  1.,  1.,  1.],
   [ 1.,  1.,  1.,  1.]])
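
Applied to the original shapes, the same trick reads (a sketch using the
names from the question):

A1 = np.zeros((5, 4, 3))
A2 = np.ones((5, 4))
A1[:, :, :] = A2[:, :, np.newaxis]   # (5,4,1) broadcasts across the last axis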


Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combining Sigmoid Curves

2008-05-02 Thread Anne Archibald
2008/5/2 Rich Shepard [EMAIL PROTECTED]:

What will work (I call it a pi curve) is a matched pair of sigmoid curves,
  the ascending curve on the left and the descending curve on the right. Using
  the Boltzmann function for these I can calculate and plot each individually,
  but I'm having difficulty in appending the x and y values across the entire
  range. This is where I would appreciate your assistance.

It's better not to work point-by-point, appending things, when working
with numpy. Ideally you could find a formula which just produced the
right curve, and then you'd apply it to the input vector and get the
output vector all at once. For example for a Gaussian (which you're
not using, I know), you'd write a function something like

def f(x):
    return np.exp(-x**2)

and then call it like:

f(np.linspace(-1,1,1000)).

This is efficient and clear. It does not seem to me that the logistic
function, 1/(1+np.exp(x)) does quite what you want. How about using
the cosine?

def f(left, right, x):
    scaled_x = (x - (right+left)/2) / ((right-left)/2)
    return (1 + np.cos(np.pi * scaled_x)) / 2

exactly zero at both endpoints, exactly one at the midpoint,
inflection points midway between, where the value is 1/2. If you want
to normalize it so that the area underneath is one, that's easy to do.
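
For instance, a quick check of those claims (assuming x runs over [0, 100]):

x = np.linspace(0, 100, 1001)
y = f(0.0, 100.0, x)    # 0 at x=0, 1 at x=50, back to 0 at x=100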

More generally, the trick of producing a scaled_x as above lets you
move any function anywhere you like.

If you want the peak not to be at the midpoint of the interval, you
will need to do something a litle more clever, perhaps choosing a
different function, scaling the x values with a quadratic polynomial
so that (left, middle, right) becomes (-1,0,1), or using a piecewise
function.

Good luck,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combining Sigmoid Curves

2008-05-02 Thread Anne Archibald
2008/5/2 Rich Shepard [EMAIL PROTECTED]:
 On Fri, 2 May 2008, Anne Archibald wrote:

   It's better not to work point-by-point, appending things, when working
   with numpy. Ideally you could find a formula which just produced the right
   curve, and then you'd apply it to the input vector and get the output
   vector all at once.

  Anne,

That's been my goal. :-)


   How about using the cosine?
  
    def f(left, right, x):
        scaled_x = (x-(right+left)/2)/((right-left)/2)
        return (1+np.cos(np.pi * scaled_x))/2
  
   exactly zero at both endpoints, exactly one at the midpoint,
   inflection points midway between, where the value is 1/2. If you want
   to normalize it so that the area underneath is one, that's easy to do.

   More generally, the trick of producing a scaled_x as above lets you
   move any function anywhere you like.

This looks like a pragmatic solution. When I print scaled_x (using left =
  0.0 and right = 100.0), the values range from -1.0 to +0.998. So, I need to
  figure out the scale_x that sets the end points at 0 and 100.

No, no. You *want* scaled_x to range from -1 to 1. (The 0.998 is
because you didn't include the endpoint, 100.) The function I gave,
(1+np.cos(np.pi * scaled_x))/2, takes [-1, 1] to a nice
bump-shaped function.  If you feed in numbers from 0 to 100 as x, they
get transformed to scaled_x, and you feed them to the function,
getting a result that goes from 0 at x=0 to 1 at x=50 to 0 at x=100.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] should outer take an output argument?

2008-04-30 Thread Anne Archibald
2008/4/29 Alan G Isaac [EMAIL PROTECTED]:
 As I was looking at Bill's conjugate gradient posting,
  I found myself wondering if there would be a payoff
  to an output argument for ``numpy.outer``.  (It is fairly
  natural to repeatedly recreate the outer product of
  the adjusted residuals, which is only needed during a
  single iteration.)

I avoid np.outer, as it seems designed for foot shooting. (It forcibly
flattens both arguments into vectors no matter what shape they are and
then computes their outer product.) np.multiply.outer does (what I
consider to be) the right thing. Not that it takes an output argument
either, as far as I can tell.
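
A quick illustration of the difference (shapes chosen arbitrarily):

A = np.ones((2, 3))
B = np.ones(4)
np.outer(A, B).shape            # (6, 4): both arguments were flattened first
np.multiply.outer(A, B).shape   # (2, 3, 4): input shapes are preserved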

But trying to inspect it suggests a few questions:

* Why are ufunc docstrings totally useless?
* Why are output arguments not keyword arguments?

As for your original question, yes, I think it would be good if
.outer() and .reduce() methods of ufuncs took output arguments. This
would let one use add.reduce() as a poor man's sum() even when dealing
with uint8s, for example. (Coming up with a problem for which
subtract.reduce(), divide.reduce(), or arctan2.reduce() are the
solutions is left as an exercise for the reader.) I can definitely see
a use for divide.outer() : Int -> Int -> Float, though.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] very simple iteration question.

2008-04-30 Thread Anne Archibald
2008/4/30 a g [EMAIL PROTECTED]:
 Hi.  This is a very basic question, sorry if it's irritating.  If i
  didn't find the answer written already somewhere on the site, please
  point me to it.  That'd be great.

  OK: how do i iterate over an axis other than 0?

  I have a 3D array of data[year, week, location].  I want to iterate
  over each year at each location and run a series of stats on the
  columns (on the 52 weeks in a particular year at a particular location).
   'for years in data:' will get the first one, but then how do i not
  iterate over the 1 axis and iterate over the 2 axis instead?

Well, there's always

for i in xrange(A.shape[1]):
    do_something(A[:,i,...])

But that's kind of ugly in this day and age. I would be tempted by

for a in np.rollaxis(A,1):
    do_something(a)

It's worth mentioning that traversing an array along an axis not the
first usually results in subarrays that are not contiguous in memory.
While numpy makes working with these convenient, they may not be very
efficient for cache reasons (for example, modern processors load from
memory - an extremely slow operation - in blocks that may be as large
as 64 bytes; if you only use eight of these before moving on, your
code will use much more memory bandwidth than it needs to). If this is
a concern, you can usually transparently rearrange your array's memory
layout to avoid it.
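
For instance (a sketch, reusing do_something from above; the copy itself
has a one-time cost in time and memory):

B = np.ascontiguousarray(np.rollaxis(A, 1))   # loop axis becomes leading and contiguous
for b in B:
    do_something(b)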

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] very simple iteration question.

2008-04-30 Thread Anne Archibald
2008/4/30 Christopher Barker [EMAIL PROTECTED]:
 a g wrote:
   OK: how do i iterate over an axis other than 0?

  This ties in nicely with some of the discussion about interating over
  matrices. It ahs been suggested that it would be nice to have iterators
  for matrices, so you could do:

  for row in M.rows:
...

  and

  for column in M.cols:
...


  If so, then wouldn't it make sense to have built in iterators for
  nd-arrays as well? something like:

  for subarray in A.iter(axis=i):
 ...

  where axis would default to 0.

  This is certainly cleaner than:

  for j in range(A.shape[i]):
  subarray = A[:,:,j,:]

  Wait! I have no idea how to spell that generically -- i.e. the ith
  index, where i is a variable. There must be a way to build a slice
  object dynamically, but I don't know it.

Slices can be built without too much trouble using slice(), but it's
much easier to just write

for subarray in np.rollaxis(A,i):
    ...

rollaxis() just pulls the specified axis to the front. (It doesn't do
what I thought it would, which is a cyclic permutation of the axes).
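
For completeness, since building that slice dynamically was the question: a
small sketch (A, i and j here are stand-ins for the quoted example):

A = np.zeros((2, 3, 4, 5))
i, j = 2, 1
idx = [slice(None)] * A.ndim
idx[i] = j
subarray = A[tuple(idx)]        # equivalent to A[:, :, 1, :]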

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-30 Thread Anne Archibald
2008/4/30 Charles R Harris [EMAIL PROTECTED]:

 Some operations on stacks of small matrices are easy to get, for instance,
 +,-,*,/, and matrix multiply. The last is the interesting one. If A and B
 are stacks of matrices with the same number of dimensions with the matrices
 stored in the last two indices, then

 sum(A[...,:,:,newaxis]*B[...,newaxis,:,:], axis=-2)

 is the matrix-wise multiplication of the two stacks. If B is replaced by a
 stack of 1D vectors, x, it is even simpler:

 sum(A[...,:,:]*x[...,newaxis,:], axis=-1)

 This doesn't go through BLAS, but for large stacks of small matrices it
 might be even faster than BLAS because BLAS is kinda slow for small
 matrices.

Yes and no. For the first operation, you have to create a temporary
that is larger than either of the two input arrays. These invisible
(potentially) gigantic temporaries are the sort of thing that puzzle
users: as their problem size grows they suddenly hit a massive slowdown
because the machine starts swapping to disk, and then a failure because
the temporary can't be allocated. This is one reason
we have dot() and tensordot() even though they can be expressed like
this. (The other is of course that it lets us use optimized BLAS.)


This rather misses the point of Timothy Hochberg's suggestion (as I
understood it), though: yes, you can write the basic operations in
numpy, in a more or less efficient fashion. But it would be very
valuable for arrays to have some kind of metadata that let them keep
track of which dimensions represented simple array storage and which
represented components of a linear algebra object. Such metadata could
make it possible to use, say, dot() as if it were a binary ufunc
taking two matrices. That is, you could feed it two arrays of
matrices, which it would broadcast to the same shape if necessary, and
then it would compute the elementwise matrix product.

The question I have is, what is the right mathematical model for
describing these
arrays-some-of-whose-dimensions-represent-linear-algebra-objects?


One idea is for each dimension to be flagged as one of replication,
vector, or covector. A column vector might then be a rank-1 vector
array, a row vector might be a rank-1 covector array, a linear
operator might be a rank-2 object with one covector and one vector
dimension, a bilinear form might be a rank-2 object with two covector
dimensions. Dimensions designed for holding repetitions would be
flagged as such, so that (for example) an image might be an array of
shape (N,M,3) of types (replication,replication,vector); then to
apply a color-conversion matrix one would simply use dot() (or *, I
suppose) without too much concern for which index was which. The
problem is, while this formalism sounds good to me, with a background
in differential geometry, if you only ever work in spaces with a
canonical metric, the distinction between vector and covector may seem
peculiar and be unhelpful.

Implementing such a thing need not be too difficult: start with a new
subclass of ndarray which keeps a tuple of dimension types. Come up
with an adequate set of operations on them, and implement them in
terms of numpy's functions, taking advantage of the extra information
about each axis. A few operations that spring to mind:

* Addition: it doesn't make sense to add vectors and covectors; raise
an exception. Otherwise addition is always elementwise anyway. (How
hard should addition work to match up corresponding dimensions?)
* Multiplication: elementwise across repetition axes, it combines
vector axes with corresponding covector axes to get some kind of
generalized matrix product. (How is corresponding defined?)
* Division: mostly doesn't make sense unless you have an array of
scalars (I suppose it could compute matrix inverses?)
* Exponentiation: very limited (though I suppose matrix powers could
be implemented if the shapes are right)
* Change of basis: this one is tricky because not all dimensions need
come from the same vector space
* Broadcasting: the rules may become a bit byzantine...
* Dimension metadata fiddling

Is this a useful abstraction? It seems like one might run into trouble
when dealing with arrays whose dimensions represent vectors from
unrelated spaces.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] recarray fun

2008-04-30 Thread Anne Archibald
2008/5/1 Travis E. Oliphant [EMAIL PROTECTED]:
 Stéfan van der Walt wrote:
   2008/4/30 Christopher Barker [EMAIL PROTECTED]:
  
   Stéfan van der Walt wrote:
 That's the way, or just rgba_image.view(numpy.int32).
  
ah -- interestingly, I tried:
  
rgba_image.view(dtype=numpy.int32)
  
and got:
  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: view() takes no keyword arguments
  
Since it is optional, shouldn't it be keyword argument?
  
  
   Thanks, fixed in r5115.
  
  
  This was too hasty.   I had considered this before.

  The problem with this is that the object can be either a type object or
  a data-type object.  You can use view to both re-cast a numpy array as
  another subtype or as another data-type.

  So, please revert the change until a better solution is posted.

If we're going to support keyword arguments, shouldn't those two
options be *different* keywords (dtype and ndarray_subclass, say)?
Then a single non-keyword argument tells numpy to guess which one you
wanted...

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-29 Thread Anne Archibald
On 29/04/2008, Travis E. Oliphant [EMAIL PROTECTED] wrote:

  I'm quite persuaded now that a[i] should return a 1-d object for
  arrays.In addition to the places Chuck identified,  there are at
  least 2 other places where special code was written to work-around the
  expectation that item selection returns an object with reduced
  dimensionality (a special-cased .tolist for matrices and a special-cased
  getitem in the fancy indexing code).

  As the number of special-case work-arounds grows the more I'm convinced
  the conceptualization is wrong.   So, I now believe we should change the
  a[i] for matrices to return a 1-d array.

  The only down-side I see is that a[i] != a[i,:] for matrices.
  However,  matrix(a[i]) == a[i,:], and so I'm not sure there is really a
  problem, there.  I also don't think that whatever problem may actually
  be buried in the fact that type(a[i]) != type(a[i,:]) is worse than the
  problem that several pieces of NumPy code actually expect hierarchical
  container behavior of multi-dimensional sequences.

I am puzzled by this. What is the rationale for x[i,:] not being a 1-d
object? Does this not require many special-case bits of code as well?
What about a[i,...]? That is what I would use to make a hierarchical
bit of code, and I would be startled to find myself in an infinite
loop waiting for the dimension to become one.

  I don't think making the small change to have a[i] return 1-d arrays
  precludes us from that 1-d array being a bit more formalized in 1.2 as a
  RowVector should we choose that direction.   It does, however, fix
  several real bugs right now.

+1

What's more, I think we need to have a matrix-cleanliness test suite
that verifies that all basic numpy tools behave identically on
matrices and preserve matrixiness. But that's a big job, and for 1.2.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-29 Thread Anne Archibald
On 29/04/2008, Keith Goodman [EMAIL PROTECTED] wrote:
 On Tue, Apr 29, 2008 at 11:21 AM, Alan G Isaac [EMAIL PROTECTED] wrote:
   On Tue, 29 Apr 2008, Keith Goodman apparently wrote:
 I often use x[i,:] and x[:,i] where x is a matrix and i is
 a scalar. I hope this continues to return a matrix.
  
1. Could you give an example of the circumstances of this
use?


 In my use i is most commonly an array (i = M.where(y.A)[0] where y is
  a nx1 matrix), sometimes a list, and in ipython when debugging or
  first writing the code, a scalar. It would seem odd to me if x[i,:]
  returned different types of objects based on the type of i:

  array index
  idx = M.where(y.A)[0] where y is a nx1 matrix
  x[idx,:] -- matrix

  list index
  idx = [0]
  x[idx,:] -- matrix?

  scalar index
  idx = 0
  x[idx,:] -- not matrix

It is actually pretty unreasonable to hope that

A[0]

and

A[[1,2,3]]
or
A[[True,False,True]]

should return objects of the same rank.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] accuracy issues with numpy arrays?

2008-04-29 Thread Anne Archibald
On 30/04/2008, eli bressert [EMAIL PROTECTED] wrote:
  I'm writing a quick script to import a fits (astronomy) image that has
  very low values for each pixel. Mostly on the order of 10^-9. I have
  written a python script that attempts to take low values and put them
  in integer format. I basically do this by taking the mean of the 1000
  lowest pixel values, excluding zeros, and dividing the rest of the
  image by that mean. Unfortunately, when I try this in practice, *all*
  of the values in the image are being treated as zeros. But, if I use a
  scipy.ndimage function, I get proper values. For example, I take the
  pixel that I know has the highest value and do

I think the bug is something else:

  import pyfits as p
  import scipy as s
  import scipy.ndimage as nd
  import numpy as n

  def flux2int(name):
 d  = p.getdata(name)
 x,y = n.shape(d)
 l = x*y
 arr1= n.array(d.reshape(x*y,1))
 temp = n.unique(arr1[0]) # This is where the bug starts. All values
  are treated as zeros. Hence only one value remains, zero.

Actually, since arr1 has shape (x*y, 1), arr1[0] has shape (1,), and
so it has only one entry:

In [82]: A = np.eye(3)

In [83]: A.reshape(9,1)
Out[83]:
array([[ 1.],
   [ 0.],
   [ 0.],
   [ 0.],
   [ 1.],
   [ 0.],
   [ 0.],
   [ 0.],
   [ 1.]])

In [84]: A.reshape(9,1)[0]
Out[84]: array([ 1.])

The python debugger is a good way to check this sort of thing out; if
you're using ipython, typing %pdb will start the debugger when an
exception is raised, at which point you can poke around in all your
local variables and evaluate expressions.

 arr1= arr1.sort()
 arr1= n.array(arr1)
 arr1= n.array(arr1[s.where(arr1 = temp)])

 val = n.mean(arr1[0:1000])

 d   = d*(1.0/val)
 d   = d.round()
 p.writeto(name[0,]+'fixed.fits',d,h)

Good luck,
Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] dot/tensordot limitations

2008-04-29 Thread Anne Archibald
Timothy Hochberg has proposed a generalization of the matrix mechanism
to support manipulating arrays of linear algebra objects. For example,
one might have an array of matrices one wants to apply to an array of
vectors, to yield an array of vectors:

In [88]: A = np.repeat(np.eye(3)[np.newaxis,...],2,axis=0)

In [89]: A
Out[89]:
array([[[ 1.,  0.,  0.],
[ 0.,  1.,  0.],
[ 0.,  0.,  1.]],

   [[ 1.,  0.,  0.],
[ 0.,  1.,  0.],
[ 0.,  0.,  1.]]])

In [90]: V = np.array([[1,0,0],[0,1,0]])

Currently, it is very clumsy to handle this kind of situation even
with arrays, keeping track of dimensions by hand. For example if one
wants to multiply A by V elementwise, one cannot simply use dot:

In [92]: np.dot(A,V.T)
Out[92]:
array([[[ 1.,  0.],
[ 0.,  1.],
[ 0.,  0.]],

   [[ 1.,  0.],
[ 0.,  1.],
[ 0.,  0.]]])


tensordot() suffers from the same problem. It arises because when you
combine two multiindexed objects there are a number of ways you can do
it:

A: Treat all indices as different and form all pairwise products
(outer product):
In [93]: np.multiply.outer(A,V).shape
Out[93]: (2, 3, 3, 2, 3)

B: Contract the outer product on a pair of indices:
In [98]: np.tensordot(A,V,axes=(-1,-1)).shape
Out[98]: (2, 3, 2)

C: Take the diagonal along a pair of indices:
In [111]: np.diagonal(np.multiply.outer(A,V),axis1=0,axis2=3).shape
Out[111]: (3, 3, 3, 2)

What we want in this case is a combination of B and C:
In [106]: np.diagonal(np.tensordot(A,V,axes=(-1,-1)),axis1=0,axis2=2).T.shape
Out[106]: (2, 3)

 but it cannot be done without constructing a larger array and pulling
out its diagonal.

If this seems like an exotic thing to want to do, suppose instead we
have two arrays of vectors, and we want to evaluate the array of dot
products. None of np.dot, np.tensordot, or np.inner produce the
results you want. You have to multiply elementwise and sum. (This also
precludes automatic use of BLAS, if indeed tensordot uses BLAS.)
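
For the record, the multiply-and-sum workaround for the A and V above looks
like this (a sketch, not a proposed API):

matvec = (A * V[..., np.newaxis, :]).sum(axis=-1)   # shape (2, 3): A[k] applied to V[k]
dots = (V * V).sum(axis=-1)                         # shape (2,): elementwise dot products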

Does it make sense to implement a generalized tensordot that can do
this, or is it the sort of thing one should code by hand every time?

Is there any way we can make it easier to do simple common operations
like take the elementwise matrix-vector product of A and V?


The more general issue, of making linear algebra natural by keeping
track of which indices are for elementwise operations, and which
should be used for dots (and how), is a whole other kettle of fish. I
think for that someone should think hard about writing a full-featured
multilinear algebra package (which might as well keep track of coordinate
transformations too, while one is at it), if that is the intent.


Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-29 Thread Anne Archibald
On 30/04/2008, Keith Goodman [EMAIL PROTECTED] wrote:
 On Tue, Apr 29, 2008 at 2:18 PM, Anne Archibald
  [EMAIL PROTECTED] wrote:
It is actually pretty unreasonable to hope that
  
A[0]
  
and
  
A[[1,2,3]]
or
A[[True,False,True]]
  
should return objects of the same rank.

 Why is it unreasonable to hope that

  x[0,:]

  and

  x[0, [1,2,3]]

  or

  x[0, [True,False,True]]

  where x is a matrix, continue to return matrices?

Well, I will point out that your example is somewhat different from
mine; nobody is arguing that your three examples should return objects
of different ranks. There is some disagreement over whether

x[0,:]

should be a rank-1 or a rank-2 object, but x[0,[1,2,3]] and x[0,
[True, False, True]] should all have the same rank as x[0,:]. Nobody's
questioning that.

What I was pointing out is that x[0,0] should not have the same rank
as x[0,:] or x[0,[0]]. In this case it's obvious; x[0,0] should be a
scalar. But the same logic applies to
x[0,:]
versus
x[[0,1],:]
or even
x[[0],:].

For arrays, if x has rank 2, x[0,:] has rank 1. If L is a list of
indices, x[L,:] always has rank 2, no matter what is actually in L
(even if it's one element). It is perfectly reasonable that they
should yield objects of different ranks.
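
A quick illustration of those rank rules for plain arrays:

x = np.arange(12).reshape(3, 4)
x[0, :].shape          # (4,)   -- a scalar index drops a dimension
x[[0], :].shape        # (1, 4) -- a one-element list keeps it
x[[0, 1], :].shape     # (2, 4)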

With matrices, we have the surprising result that x[0,:] is still a
two-dimensional object;  to get at its third element I have to specify
 x[0,:][0,2]. It seems to me that this is a peculiar hack designed to
ensure that the result still has * defined in a matrix way.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy release

2008-04-25 Thread Anne Archibald
On 25/04/2008, Stéfan van der Walt [EMAIL PROTECTED] wrote:
 2008/4/25 Alan G Isaac [EMAIL PROTECTED]:

   I must have misunderstood:
I thought the agreement was to
provisionally return a 1d array for x[0],
while we hashed through the other proposals.

 The agreement was:

  a) That x[0][0] should be equal to x[0,0] and
  b) That x[0,:] should be equal to x[0] (as for ndarrays)

  This breaks as little functionality as possible, at the cost of one
  (instead of two) API changes.

Hold on. There has definitely been some confusion here. This is not
what I thought I was suggesting, or what Alan thought he was
suggesting. I do not think special-casing matrices for which one
dimension happens to be one is a good idea at all, even temporarily.
This is the kind of thing that drives users crazy.

My suggested stopgap fix was to make x[0] return a 1D *array*; I feel
that this will result in less special-casing. In fact I wasn't aware
that anyone had proposed the fix you implemented. Can we change the
stopgap solution?

  We should now discuss the proposals on the table, choose a good one,
  and implement all the API changes necessary for 1.2 or 2.  It's a pity
  we have to change the API again, but the current situation is not
  tenable.

Yes, well, it really looks unlikely we will be able to agree on what
the correct solution is before 1.1, so I would like to have something
non-broken for that release.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Generating Bell Curves (was: Using normal() )

2008-04-25 Thread Anne Archibald
On 24/04/2008, Rich Shepard [EMAIL PROTECTED] wrote:
Thanks to several of you I produced test code using the normal density
  function, and it does not do what we need. Neither does the Gaussian
  function using fwhm that I've tried. The latter comes closer, but the ends
  do not reach y=0 when the inflection point is y=0.5.

So, let me ask the collective expertise here how to generate the curves
  that we need.

We need to generate bell-shaped curves given a midpoint, width (where y=0)
  and inflection point (by default, y=0.5) where y is [0.0, 1.0], and x is
  usually [0, 100], but can vary. Using the NumPy arange() function to produce
  the x values (e.g, arange(0, 100, 0.1)), I need a function that will produce
  the associated y values for a bell-shaped curve. These curves represent the
  membership functions for fuzzy term sets, and generally adjacent curves
  overlap where y=0.5. It would be a bonus to be able to adjust the skew and
  kurtosis of the curves, but the controlling data would be the
  center/midpoint and width, with defaults for inflection point, and other
  parameters.

I've been searching for quite some time without finding a solution that
  works as we need it to work.

First I should say, please don't call these bell curves! It is
confusing people, since that usually means specifically a Gaussian. In
fact it seems that you want something more usually called a sigmoid,
or just a curve with a particular shape. I would look at
http://en.wikipedia.org/wiki/Sigmoid_function
In particular, they point out that the integral of any smooth,
positive, bump-shaped function will be a sigmoid. So dreaming up an
appropriate bump-shaped function is one way to go.

Alternatively, you can look at polynomial fitting - if you want, say,
a function that is 1 with derivative zero at 0, 0.5 with derivative -1
at x, and 0 with derivative 0 at 1, you can construct a unique
degree-5 polynomial that does exactly that; there's a new tool,
KroghInterpolator, in scipy svn that can do that for you.
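
For example, a sketch with the constraints above (repeating an abscissa
tells KroghInterpolator "value, then successive derivatives" at that point):

import numpy as np
from scipy.interpolate import KroghInterpolator

# f(0)=1, f'(0)=0;  f(0.5)=0.5, f'(0.5)=-1;  f(1)=0, f'(1)=0
p = KroghInterpolator([0, 0, 0.5, 0.5, 1, 1], [1, 0, 0.5, -1, 0, 0])
y = p(np.linspace(0, 1, 101))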

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] untenable matrix behavior in SVN

2008-04-25 Thread Anne Archibald
On 25/04/2008, Charles R Harris [EMAIL PROTECTED] wrote:


 On Fri, Apr 25, 2008 at 12:02 PM, Alan G Isaac [EMAIL PROTECTED] wrote:
  I think we have discovered that there is a basic conflict
  between two behaviors:
 
 x[0] == x[0,:]
 vs.
 
 x[0][0] == x[0,0]
 
  To my recollection, everyone has agree that the second
  behavior is desirable as a *basic expectation* about the
  behavior of 2d objects.  This implies that *eventually* we
  will have ``x[0][0] == x[0,0]``.  But then eventually
  we MUST eventually have x[0] != x[0,:].
 
 Choices:

 1) x[0] equiv x[0:1,:] -- current behavior
 2) x[0] equiv x[0,:] -- array behavior, gives x[0][0] == x[0,0]

 These are incompatible. I think 1) is more convenient in the matrix domain
 and should be kept, while 2) should be given up. What is the reason for
 wanting 2)? Maybe we could overload the () operator so that x(0) gives 2)?
 Or vice versa.

We are currently operating in a fog of abstraction, trying to decide
which operations are more natural or more in correspondence with our
imagined idea of a user. We're not getting anywhere. Let's start
writing up (perhaps on the matrix indexing wiki page) plausible use
cases for scalar indexing, and how they would look under each
proposal.

For example:

* Iterating over the rows of a matrix:

# array-return proposal; x is a scalar
for row in M:
    if np.all(row==0):
        raise ValueError
# exception-raising proposal
for i in xrange(M.shape[0]):
    if np.all(M[i,:]==0):
        raise ValueError

Other applications?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Movement of ma breaks matplotlib

2008-04-25 Thread Anne Archibald
Hi,

In the upcoming release of numpy, numpy.core.ma ceases to exist. One
must use numpy.ma (for the new interface) or numpy.oldnumeric.ma (for
the old interface). This has the unfortunate effect of breaking
matplotlib (which does "from numpy.core.ma import *") - I cannot even
import pylab with a stock matplotlib (0.90.1) and an SVN numpy. Is
this an intended effect?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy release

2008-04-23 Thread Anne Archibald
On 23/04/2008, Alan Isaac [EMAIL PROTECTED] wrote:
 On Wed, 23 Apr 2008, Sebastian Haase wrote:
   What used to be referred to a the 1.1 version, that can
   break more stuff, to allow for a cleaner design, will now
   be 1.2

 So ... fixing x[0][0] for matrices should wait
  until 1.2.  Is that correct?

It seems to me that everyone agrees that the current situation is
broken, but there is considerable disagreement on what the correct fix
is. My inclination is to apply the minimal fix you suggest, with tests
and documentation, as a stopgap, and let the different factions sort
it out by discussion on the way to 1.2.

Objections?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] aligned matrix / ctypes

2008-04-23 Thread Anne Archibald
On 23/04/2008, Zachary Pincus [EMAIL PROTECTED] wrote:
 Hi,

  Thanks a ton for the advice, Robert! Taking an array slice (instead of
  trying to set up the strides, etc. myself) is a slick way of getting
  this result indeed.

It's worth mentioning that there was some discussion of adding an
allocator for aligned arrays. It got sidetracked into a discussion of
SSE support for numpy, but there are all sorts of reasons to want
aligned arrays in numpy, as this post demonstrates. Seeing as it's
just a few lines worth of pure python, is anyone interested in writing
an aligned_empty() for numpy?
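
A rough sketch of what that could look like (aligned_empty is the
hypothetical name from above; over-allocate, then slice so the data pointer
lands on the requested boundary):

import numpy as np

def aligned_empty(shape, dtype=float, align=16):
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + align, dtype=np.uint8)
    offset = (-buf.ctypes.data) % align
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

# aligned_empty((100, 100)).ctypes.data % 16 == 0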

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Truth value of an array

2008-04-18 Thread Anne Archibald
On 18/04/2008, Robert Kern [EMAIL PROTECTED] wrote:
 On Fri, Apr 18, 2008 at 3:33 PM, Joe Harrington [EMAIL PROTECTED] wrote:
   For that matter, is there a reason logical operations don't work on
arrays other than booleans?  What about:

 The keywords and, or, and not only work on bool objects; Python
  tries to convert the operands using bool(). bool(some_array) raises
  the exception just as we've been discussing.

This is a bit misleading, as the actual value is returned:

In [167]:  or A
Out[167]: 'A'

In [168]: A and B
Out[168]: 'B'

In fact the problem is that and and or are short-circuiting, which
means that return X or Y is equivalent to something like:
if X:
    return X
else:
    return Y

These are handled specially by syntax so that, for example, you can do
False and 1/0 and not raise a ZeroDivisionError. So and and or
aren't even really operators, and it's not just impossible to change
them but unlikely to ever become possible. Just cast your arrays to
booleans if you want to do boolean operations on them.
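
(For elementwise boolean logic, the bitwise operators and the logical_*
ufuncs already work on arrays, e.g.:)

a = np.array([True, False, True])
b = np.array([True, True, False])
a & b                   # array([ True, False, False])
a | b                   # array([ True,  True,  True])
np.logical_and(a, b)    # same as a & b for boolean arrays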

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

