Re: [Numpy-discussion] PyData Barcelona this May

2017-03-21 Thread Marten van Kerkwijk
"Taking numpy in stride, and the essential role of 0" ;-)

-- Marten
NumPy-Discussion mailing list

Re: [Numpy-discussion] Move docs to Github?

2017-03-15 Thread Marten van Kerkwijk
Astropy uses readthedocs quite happily (auto-updates on merges to master too).
-- Marten

Re: [Numpy-discussion] __numpy_ufunc__

2017-02-22 Thread Marten van Kerkwijk
Hi Stephan,

Indeed, `__array_ufunc__` is None would be for classes that interact
with arrays only as if they were any other numeric type, and thus have
no use for ufuncs, but may need normal operations (astropy's `Unit`
class is a reasonably good example).

Your example also makes clear that, indeed, setting __array__ or
__array_ufunc__ to None implies different things, so concretely here
the proposal is that if `__array_ufunc__` is None, ndarray methods
will return `NotImplemented`.
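Concretely, with that rule an opt-out would look like this (a minimal sketch; the `Scale` class is illustrative, not astropy's actual `Unit`):

```python
import numpy as np

class Scale:
    # Opt out of ufunc handling entirely: ndarray binops see this,
    # return NotImplemented, and python falls back to our reflected ops.
    __array_ufunc__ = None

    def __init__(self, factor):
        self.factor = factor

    def __rmul__(self, other):
        # reached because ndarray.__mul__ returned NotImplemented
        return np.asarray(other) * self.factor

print(np.arange(3) * Scale(2.0))    # plain ndarray result
```

With the same setting, calling the ufunc directly (`np.multiply(arr, Scale(2.0))`) raises a TypeError rather than silently coercing.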

As an aside, I think that if we do not have something like that, we'll
be stuck with supporting `__array_priority__`. (Which is OK by me too,
but it might as well be a conscious choice.)

All the best,


Re: [Numpy-discussion] __numpy_ufunc__

2017-02-22 Thread Marten van Kerkwijk
Hi All,

I'd very much like to get `__array_ufunc__` in, and am willing to do
some work, but fear we need to get past the last sticking point. As I
noted in Chuck's PR [1], in python 3.6 there is now an explicit
language change [2], which I think is relevant:
It is now possible to set a special method to None to indicate that
the corresponding operation is not available. For example, if a class
sets __iter__() to None, the class is not iterable.
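That language change is easy to check directly:

```python
class NotIterable:
    # per the python 3.6 change: None marks the operation as unavailable
    __iter__ = None

try:
    iter(NotIterable())
except TypeError as exc:
    print("TypeError:", exc)
```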

It seems to me entirely logical (but then it would, as I suggested it
before...) that we allow opting out by setting `__array_ufunc__` to
None; in that case, binops return NotImplemented and ufuncs raise
errors. (In addition, or alternatively, one could allow setting
`__array__` to None, which would generally prevent an object from
being turned into an array.)

But I should note that I much prefer to get something in over waiting
yet another round! In astropy, there is now more and more clamouring
to offer pure-ndarray options in places where quantities would be more
logical, simply because quantities are twice as slow -- this would
instantly be solved with __array_ufunc__... If we can decide on this, then I'd
gladly help with remaining issues (e.g., the `ndarray.__array_ufunc__`
method, so super can be used).

All the best,



Re: [Numpy-discussion] Could we simplify backporting?

2017-02-22 Thread Marten van Kerkwijk
Hi Ralf,

Yes, good to think about other policies. For astropy, we do the
decision by labelling with the bug-fix branch (with a policy that it
really should fix a bug), and inserting text in that release's bug-fix
notes (we really should automate that part...). Then, backporting is
done shortly before the bug-fix release and, as far as I can tell (not
having done it myself), outside of github. In rare cases with
hard-to-resolve merge conflicts, the original PR author gets a note
asking for help.

As for a travis test: here I was mostly thinking of an allowed-to-fail
test that would at least alert one if backporting was going to be an
issue.  I think travis runs again once one merges, correct? If so, on
that merge it could, in principle, do the backport too (if given
enough permission, of course; I'm not sure at all I'd want that, just
pointing out the possibility! E.g., it might trigger on a message in
the merge commit.).

All the best,


[Numpy-discussion] Could we simplify backporting?

2017-02-21 Thread Marten van Kerkwijk
Hi All,

In gh-8594, a question came up how to mark things that should be
backported and Chuck commented [1]:

> Our backport policy is still somewhat ad hoc, especially as I am the only one 
> who has been doing releases. What I currently do is set the milestone to the 
> earlier version, so I will find the PR when looking for backports, then do a 
> backport, label it as such, set the milestone on the backported version, and 
> remove the milestone from the original. I'm not completely happy with the 
> process, so if you have better ideas I'd like to hear them. One option I've 
> considered is a `backported` label in addition to the `backport` label, then 
> use the latter for things to be backported.

Continuing to set the milestone to a bug-fix release version when
required seems a good idea; it is little effort for anyone and keeps
things clear. For the rest, might it be possible to make things more
automated? E.g., might it be possible to have some travis magic that
does a trial merge & test? Could one somehow merge to multiple
branches at the same time?

I have no idea myself really how these things work, but maybe some of you do!
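For what the "trial merge & test" could look like: a CI step might simply cherry-pick the merged commit onto the maintenance branch and see whether it applies cleanly. A self-contained sketch (repository, branch, and commit names are all made up):

```shell
set -e
# build a throwaway repo with a maintenance branch and one bug fix
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email ci@example.com && git config user.name ci
echo base > f.txt && git add f.txt && git commit -qm "base"
git branch maintenance                  # stands in for e.g. a 1.12.x branch
echo fix >> f.txt && git commit -qam "bug fix"
sha=$(git rev-parse HEAD)

# the "trial merge": cherry-pick the fix onto maintenance; a conflict
# here would flag the PR as needing a manual backport
git checkout -q maintenance
git cherry-pick -x "$sha" >/dev/null
grep -q fix f.txt && echo "backport clean"
```

A real CI job would follow the cherry-pick with the test suite, and only report the result rather than push anything.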

All the best,


Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-19 Thread Marten van Kerkwijk
Hi All,

Just a side note that at a smaller scale some of the benefits of
numexpr are coming to numpy: Julian Taylor has been working on
identifying temporary arrays. Julian also commented that with PEP 523
in python 3.6, this should indeed become a lot easier.

All the best,


Re: [Numpy-discussion] Proposal to support __format__

2017-02-15 Thread Marten van Kerkwijk
Hi Gustav,

This is great!  A few quick comments (mostly echo-ing Stephan's).

1. You basically have a NEP already! Making a PR from it allows one to
give line-by-line comments, so that would help!

2. Don't worry about supporting python2 specifics; just try to ensure
it doesn't break; I would not say more about it!

3. On `set_printoptions` -- ideally, it will become possible to use
this as a context manager (i.e., `with set_printoptions(...)`). It
might make sense to have an `override_format` keyword argument to it.

4. Otherwise, my main suggestion is to start small with the more
obvious ones, and not worry too much about format validation, but
rather about getting the simple ones to work well (e.g., for an object
array, just apply the format given; if it doesn't work, it will error
out on its own, which is OK).
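For illustration of point 4, "just apply the format given" amounts to mapping python's built-in `format` over the elements; this sketches the intended semantics, not the proposed API:

```python
import numpy as np

a = np.array([1.0, 2.5, 3.25])
# elementwise application of a standard format spec
formatted = [format(x, '6.2f') for x in a]
print(formatted)   # ['  1.00', '  2.50', '  3.25']
```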

5. One bit of detail: the "g" one does confuse me.

All the best,


Re: [Numpy-discussion] Deprecating matrices.

2017-01-07 Thread Marten van Kerkwijk
Hi All,

It seems there are two steps that can be taken now and are needed no
matter what:

1. Add numpy documentation describing the preferred way to handle
matrices, extolling the virtues of @, and move np.matrix documentation
to a deprecated section

2. Start on a new `sparse` class that is based on regular arrays (and
uses `__array_func__` instead of prepare/wrap?).
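For item 1, the heart of the "@ instead of np.matrix" documentation is that plain ndarrays now read just as naturally:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5],
              [6]])
print(a @ b)        # matrix product without np.matrix
print(a @ a @ b)    # chains read like the mathematics
```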

All the best,


Re: [Numpy-discussion] subclassing ndarray and keeping same ufunc behavior

2016-11-15 Thread Marten van Kerkwijk
Hi Stuart,

It certainly seems correct behaviour to return the subclass you
created: after all, you might want to keep the information on
`columns` (e.g., consider doing nanmin along a given axis). Indeed, we
certainly want to keep the unit in astropy's Quantity (which also is a
subclass of ndarray).

On the shape: shouldn't that be print(np.nanmin(r).shape)??

Overall, I think it is worth considering very carefully what exactly
you try to accomplish; if different elements along a given axis have
different meaning, I'm not sure it makes all that much sense to treat
them as a single array (e.g., np.sin might be useful for one column,
but not another). Even if pandas is slower, the advantage in clarity
of what is happening might well be more important in the long run.
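A minimal sketch of the behaviour described (class and attribute names are made up): an attribute carried via `__array_finalize__` survives slicing, and reductions like `np.nanmin` keep the subclass:

```python
import numpy as np

class Labeled(np.ndarray):
    # toy subclass: carry a `label` attribute through views and slices
    def __new__(cls, arr, label=None):
        obj = np.asarray(arr).view(cls)
        obj.label = label
        return obj

    def __array_finalize__(self, obj):
        self.label = getattr(obj, 'label', None)

r = Labeled([[1.0, 2.0], [3.0, np.nan]], label='columns')
m = np.nanmin(r, axis=0)         # still a Labeled instance
print(type(m).__name__, np.asarray(m))
```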

All the best,


p.s. nanmin is not a ufunc; you can find it in numpy/lib/

Re: [Numpy-discussion] __numpy_ufunc__

2016-10-31 Thread Marten van Kerkwijk
Hi Chuck,

> We were pretty close. IIRC, the outstanding issue was some sort of override.

Correct. With a general sentiment of those downstream that it would be
great to merge in any form, as it will be really helpful! (Generic
speedup of factor of 2 for computationally expensive ufuncs (sin, cos,
etc.) that needs scaling in Quantity...)

> At the developer meeting at scipy 2015 it was agreed that it would be easy
> to finish things up under the rubric "make Pauli happy".

That would certainly make me happy too!

Other items that were brought up (trying to summarize from issues
linked above, and links therein):

1. Remove index argument
2. Out always a tuple
3. Let ndarray have a __numpy_ufunc__ stub, so one can super it.

Here, the first item implied a possible name change (to
__array_ufunc__); if that's too troublesome, I don't think it really
hurts to have the argument, though it is somewhat "unclean" for the
case that only the output has __numpy_ufunc__.
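Items 1-3 in code form: a sketch (class name hypothetical) of an override under the revised protocol, with no index argument and `out` normalized to a tuple; this is essentially the shape the protocol eventually took as `__array_ufunc__`:

```python
import numpy as np

class Wrapped(np.ndarray):
    # no index argument; `out`, when present, is always a tuple
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        inputs = tuple(i.view(np.ndarray) if isinstance(i, Wrapped) else i
                       for i in inputs)
        out = kwargs.get('out', ())
        if out:
            kwargs['out'] = tuple(o.view(np.ndarray) if isinstance(o, Wrapped)
                                  else o for o in out)
        result = getattr(ufunc, method)(*inputs, **kwargs)
        return (result.view(Wrapped)
                if isinstance(result, np.ndarray) else result)

w = np.arange(3.0).view(Wrapped)
print(np.sin(w))    # dispatched through Wrapped.__array_ufunc__
```

Item 3 (a stub on ndarray itself) is what would let a subclass call `super().__array_ufunc__(...)` instead of unwrapping by hand as above.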

All the best,


Re: [Numpy-discussion] __numpy_ufunc__

2016-10-31 Thread Marten van Kerkwijk
Hi Chuck,

I've revived my Quantity PRs that use __numpy_ufunc__, but is it
correct that at present in *dev, one cannot use it?

All the best,


Re: [Numpy-discussion] __numpy_ufunc__

2016-10-30 Thread Marten van Kerkwijk
> The __numpy_ufunc__ functionality is the last bit I want for 1.12.0, the
> rest of the remaining changes I can kick forward to 1.13.0. I will start
> taking a look tomorrow, probably starting with Nathaniel's work.

Great! I'll revive the Quantity PRs that implement __numpy_ufunc__!

-- Marten

Re: [Numpy-discussion] padding options for diff

2016-10-28 Thread Marten van Kerkwijk
Matthew has made what looks like a very nice implementation of padding
in np.diff. I raised two general questions about desired behaviour
there that Matthew thought we should put out on the mailing list as
well. This indeed seemed a good opportunity to get feedback, so
herewith a copy of my comments:

-- Marten

1. I'm not sure that treating a 1-d array as something that will just
extend the result along `axis` is a good idea, as it breaks standard
broadcasting rules. E.g., consider
np.diff([[1, 2], [4, 8]], to_begin=[1, 4])
# with your PR:
array([[1, 4, 1],
       [1, 4, 4]])
# but from regular broadcasting I would expect
array([[1, 1],
       [4, 4]])
# i.e., the same as if I did to_begin=[[1, 4]]
I think it is slightly odd to break the broadcasting expectation here,
especially since the regular use case surely is just to add a single
element so that one keeps the original shape. The advantage of
assuming this is that you do not have to do *any* array shaping of
`to_begin` and `to_end` (which perhaps also suggests it is the right
thing to do).
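The two interpretations can be spelled out with `np.concatenate` (done by hand here, since the PR is not merged):

```python
import numpy as np

a = np.array([[1, 2], [4, 8]])
d = np.diff(a, axis=-1)          # [[1], [4]]

# PR behaviour: the 1-d [1, 4] extends the result along `axis`
pr = np.concatenate([np.broadcast_to([1, 4], (2, 2)), d], axis=-1)
print(pr)    # [[1 4 1], [1 4 4]]

# regular broadcasting: [1, 4] would have to be shaped explicitly
bc = np.concatenate([np.array([[1], [4]]), d], axis=-1)
print(bc)    # [[1 1], [4 4]]
```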

2. As I mentioned above, I think it may be worth thinking through a
little what to do with higher order differences, at least for
`to_begin='first'`. If the goal is to ensure that with that option, it
becomes the inverse of `cumsum`, then I think for higher order one
should add multiple elements in front, i.e., for that case, the
recursive call should be
return np.diff(np.diff(a, to_begin='first'), n-1, to_begin='first')
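The inverse-of-cumsum property for n=1, written out by hand:

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5])
# to_begin='first' by hand: prepend the first element to the differences
d = np.concatenate([a[:1], np.diff(a)])
print(d)               # [ 3 -2  3 -3  4]
print(np.cumsum(d))    # recovers a exactly
```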

Re: [Numpy-discussion] assert_allclose equal_nan default value.

2016-10-20 Thread Marten van Kerkwijk
Good, that means I can revert some changes to astropy, which made the
tests less readable.
-- Marten

Re: [Numpy-discussion] Integers to negative integer powers, time for a decision.

2016-10-12 Thread Marten van Kerkwijk
I still strongly favour ending up at int**int -> float, and like
Peter's suggestion of raising a general warning rather than an
exception for negative powers.  -- Marten

Re: [Numpy-discussion] automatically avoiding temporary arrays

2016-10-03 Thread Marten van Kerkwijk
Note that numpy does store some larger arrays already, in the fft
module. (In fact, this was a cache of unlimited size until #7686.) It
might not be bad if the same cache were used more generally.

That said, if newer versions of python are offering ways of doing this
better, maybe that is the best way forward.

-- Marten

Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-11 Thread Marten van Kerkwijk
There remains the option to just let subclasses deal with new ndarray
features...  Certainly, for `Quantity`, I'll quite happily do that.
And if it allows the ndarray code to remain simple and efficient, it
is probably the best solution.  -- Marten

Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Marten van Kerkwijk
In a separate message, since perhaps a little less looney: yet another
option would be work by analogy with np.ix_ and define pre-dispatch
index preparation routines np.ox_ and np.vx_ (say), which would be
used as in:
array[np.ox_[:, 10]]   -- or -- array[np.vx_[:, 10]]
This could work if those functions each return something appropriate
for the legacy indexer, or, if that is not possible, a specific
subclass of tuple as a marker that gets interpreted further up.

In the end, though, probably also too complicated. It may remain best
to simply implement the new methods instead and keep it at that!

-- Marten

Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Marten van Kerkwijk
I'd love to solve it with `__getitem__`... Since most subclasses will
have defined that with just a single argument, calling it from
`oindex` with an extra mode argument set will properly fail, so good
in that sense (although one better ensure a useful error message...).

Another option would be to store the mode as an additional part in the
index tuple (perhaps as a dict), i.e., in python form do something like:
def __getitem__(self, index):
    if isinstance(index, tuple) and isinstance(index[-1], dict):
        *index, kwargs = index
    else:
        kwargs = {}
    return self._internal_indexer(index, **kwargs)
This way, if all a subclass does is to call `super(SubClass,
self).__getitem__(index)` (as does `Quantity`; it does not look at the
index at all), it would work automagically.  Indeed, one could then
decide the type of index even by regular slicing in the following way
array[:, 10, {'mode': 'vector'}]
which one could of course have a special token for (like `np.newaxis`
for `None`), so that it would be something like
array[:, 10, np.vector_index]

However, looking at the above, I fear this is too baroque even by my standards!

-- Marten

Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-05 Thread Marten van Kerkwijk
p.s. Just to be clear: personally, I think we should have neither
`__numpy_getitem__` nor a mixin; we should just get the quite
wonderful new indexing methods!

Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-05 Thread Marten van Kerkwijk
Hi Nathan,

The question originally posed is whether `ndarray` should provide that
single method as a convenience already, even though it doesn't
actually use it itself. Do you think that is useful, i.e., a big
advantage over overwriting the new oindex, vindex, and another that I

My own feeling is that while it is good to provide some hooks for
subclasses (__array_prepare__, wrap, finalize, numpy_ufunc), this one
is too fine-grained and the benefits do not outweigh the cost,
especially since it could easily be done with a mixin (unlike those
other cases, which are not used to cover ndarray methods, but rather
numpy functions, i.e., they provide subclasses with hooks into those
functions, which no mixin could possibly do).

All the best,


Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-05 Thread Marten van Kerkwijk
Actually, on those names: an alternative to your proposal would be to
introduce only one new method which can do all types of indexing,
depending on a keyword argument, i.e., something like
def getitem(self, item, mode='outer'):

-- Marten

Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-05 Thread Marten van Kerkwijk
Hi Sebastian,

It would seem to me that any subclass has to keep up to date with new
features in ndarray, and while I think ndarray has a responsibility
not to break backward compatibility, I do not think it has to protect
against new features possibly not working as expected in subclasses.
In particular, I think it is overly complicated (and an unnecessary
maintenance burden) to error out if a subclass has `__getitem__`
overwritten, but not `oindex`.

For somewhat similar reasons, I'm not too keen on a new
`__numpy_getitem__` method; I realise it might reduce complexity for
some ndarray subclasses eventually, but it also is an additional
maintenance burden. If you really think it is useful, I think it might
be more helpful to define a new mixin class which provides a version
of all indexing methods that just call `__numpy_getitem__` if that is
provided by the class that uses the mixin. I would *not* put it in
`ndarray` proper.

Indeed, the above might even be handier for subclasses, since they can
choose, if they wish, to implement a similar mixin for older numpy
versions, so that all the numpy version stuff can be moved to a single
location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)

Overall, my sense would be to keep your PR to just implementing the
various new index methods (which are great -- I still don't really
like the names, but sadly don't have better suggestions...).

But it might be good if others pipe in here too, in particular those
maintaining ndarray subclasses!

All the best,


Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-05 Thread Marten van Kerkwijk
Hi Sebastian,

Indeed, having the scalar pass through `__array_wrap__` would have
been useful (`__array_finalize__` is too late, since one cannot change the
class any more, just set attributes).  But that is water under the
bridge, since we're stuck with people not expecting that.

I think the slightly larger question, but one somewhat orthogonal to
your suggestion of a new dundermethod, is whether one cannot avoid
more such methods by the new indexing routines returning array scalars
instead of regular ones.

Obviously, though, this has larger scope, as it might be part of the
merging of the now partially separate code paths for scalar and array
arithmetic, etc.

All the best,


Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-04 Thread Marten van Kerkwijk
Hi Sebastian,

I haven't given this as much thought as it deserves, but thought I
would comment from the astropy perspective, where we both have direct
subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and
classes that store their data internally as ndarray (subclass)
instances (`Time`, `SkyCoord`, ...).

One comment would be that if one were to introduce a special method,
one should perhaps think a bit more broadly, and capture more than the
indexing methods with it. I wonder about this because for the
array-holding classes mentioned above, we initially just had
`__getitem__`, which got the relevant items from the underlying
arrays, and then constructed a new instance with those. But recently
we realised that methods like `reshape`, `transpose`, etc., require
essentially the same steps, and so we constructed a new
`ShapedLikeNDArray` mixin, which provides all of those [1] as long as
one defines a single `_apply` method. (Indeed, it turns out that the
same technique works without any real change for some numpy functions
such as `np.broadcast_to`.)
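The `_apply` pattern can be sketched in a few lines (a toy version; astropy's real `ShapedLikeNDArray` is more elaborate):

```python
import numpy as np

class ShapedLikeDemo:
    # all shape-changing operations funnel through a single _apply
    def __init__(self, data):
        self.data = np.asarray(data)

    def _apply(self, method, *args, **kwargs):
        return type(self)(getattr(self.data, method)(*args, **kwargs))

    def __getitem__(self, item):
        return self._apply('__getitem__', item)

    def reshape(self, *shape):
        return self._apply('reshape', *shape)

    @property
    def shape(self):
        return self.data.shape

t = ShapedLikeDemo(np.arange(6))
print(t.reshape(2, 3).shape)    # (2, 3)
print(t[2:].shape)              # (4,)
```

A class holding several internal arrays would loop over them inside `_apply`, which is the point: one method covers getitem, reshape, transpose, and friends.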

That said, in the actual ndarray subclasses, we have not found a need
to overwrite any of the reshaping methods, since those methods are all
handled OK via `__array_finalize__`. We do overwrite `__getitem__`
(and `item`) as we need to take care of scalars. And we would
obviously have to overwrite `oindex`, etc., as well, for the same
reason, so in that respect a common method might be useful.

However, perhaps it is worth considering that the only reason we need
to overwrite them in the first place, unlike what is the case for all
the shape-changing methods, is that scalar output does not get put
through `__array_finalize__`. Might it be an idea to have the new
indexing methods return array scalars instead of normal ones so we can
get rid of this?

All the best,



Re: [Numpy-discussion] Views and Increments

2016-08-08 Thread Marten van Kerkwijk
Hi Anakim,

The difference is really in the code path that gets taken: in the first
case, you go through `a.__getitem__(np.array([1,6,5])`, in the second
through `a.__setitem__(...)`. The increments would not work if you added an
extra indexing to it, as in:
a[np.array([1,6,5])][:] += 1

(which would do `a.__getitem__(...).__setitem__(slice(None), ...)`)
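The two code paths side by side:

```python
import numpy as np

a = np.zeros(8, dtype=int)
a[np.array([1, 6, 5])] += 1      # __getitem__ then __setitem__ on `a`:
print(a)                         # the increments land in `a` itself

b = np.zeros(8, dtype=int)
b[np.array([1, 6, 5])][:] += 1   # fancy indexing returns a *copy*; the
print(b)                         # increment hits the temporary, not `b`
```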

Hope this clarifies it,


Re: [Numpy-discussion] Python 3 dict support (issue #5718)

2016-07-21 Thread Marten van Kerkwijk
I know it is slightly obnoxious to hold the "making a suggestion is
volunteering for it" line -- but usually a PR to the docs is best made by
someone who is trying to understand it rather than someone who already knows it.
-- Marten

Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread Marten van Kerkwijk
Hi All,

I'm with Nathaniel here, in that I don't really see the point of these
routines in the first place: broadcasting takes care of many of the initial
use cases one might think of, and others are generally not all that well
served by them: the examples from scipy to me do not really support
`at_least?d`, but rather suggest that little thought has been put into
higher-dimensional objects which should be treated as stacks of row or
column vectors. My sense is that we're better off developing the direction
started with `matmul`, perhaps adding `matvecmul` etc.

More to the point of the initial inquiry: what is the advantage of having a
general `np.atleast_nd` routine over doing
np.array(a, copy=False, ndmin=n)
or, for a list of inputs,
[np.array(a, copy=False, ndmin=n) for a in input_list]
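For reference, the keyword that does this is `ndmin`, and it avoids a copy when none is needed:

```python
import numpy as np

a = np.arange(3)
b = np.array(a, copy=False, ndmin=2)   # prepend length-1 axes as needed
print(b.shape)      # (1, 3)
print(b.base is a)  # the result is a view, not a copy
```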

All the best,


Re: [Numpy-discussion] Support of '@='?

2016-06-22 Thread Marten van Kerkwijk
> Just if you are curious why it is an error at the moment. We can't have
> it be filled in by python to be not in-place (`M = M @ P` meaning), but
> copying over the result is a bit annoying and nobody was quite sure
> about it, so it was delayed.

The problem with using out in-place is clear from trying `np.matmul(a, a,
out=a)`:

In [487]: a
Out[487]:
array([[ 1.       ,  0.       ,  0.       ],
       [ 0.       ,  0.8660254,  0.5      ],
       [ 0.       , -0.5      ,  0.8660254]])

In [488]: np.matmul(a, a)
Out[488]:
array([[ 1.       ,  0.       ,  0.       ],
       [ 0.       ,  0.5      ,  0.8660254],
       [ 0.       , -0.8660254,  0.5      ]])

In [489]: np.matmul(a, a, out=a)
Out[489]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

It would seem hard to avoid doing the copying (though obviously one
should iterate over higher dimensions, i.e., temp.shape =
M.shape[-2:]). Not dissimilar from cumsum etc., which are also not
true ufuncs (but where things can be made to work by ensuring
operations are done in the right order).
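The aliasing problem in miniature, as a pure-python sketch (not numpy's actual implementation): once an element of `out` is written, later products read the corrupted value.

```python
import numpy as np

def naive_matmul(a, b, out):
    # writes straight into `out`: wrong whenever `out` aliases an input
    n = a.shape[0]
    for i in range(n):
        for j in range(n):
            out[i, j] = sum(a[i, k] * b[k, j] for k in range(n))
    return out

a = np.array([[0.0, 1.0],
              [1.0, 0.0]])             # a @ a is the identity
bad = naive_matmul(a, a, out=a)        # reads already-overwritten entries
print(bad)                             # not the identity

c = np.array([[0.0, 1.0],
              [1.0, 0.0]])
good = naive_matmul(c, c, out=np.empty((2, 2)))   # temporary: correct
print(good)
```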

-- Marten

Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-13 Thread Marten van Kerkwijk
Hi All,

I think we're getting a little off the rails, perhaps because two
questions are being conflated:

1. What in principle is the best return type for int ** int (which Josef I
think most properly rephrased as whether `**` should be thought of as a
float operator, like `/` in python3 and `sqrt` etc.);

2. Whether one is willing to possibly break code by implementing this.

My sense is that most discussion is about (1), where a majority may well
agree the answer is float, instead of about (2), where it ends up boiling
down to a judgment call of "eternal small pain" or "possible short-time big
pain but consistency from now on".

Perhaps I can introduce an alternative (likely shot down immediately...).
For this, note that for division at least, numpy follows python closely, so
that one has the following in python2:
In [2]: np.arange(10) / 2
Out[2]: array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

In [3]: from __future__ import division

In [4]: np.arange(10) / 2
Out[4]: array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])
Since negative exponents are really just 1 over the positive one, could we
use the same logic for **? I.e., let the type returned by int1 ** int2
be the same as that returned by int1 / int2? If we then also ensure that,
for integer output type, int1 ** -int2 returns 1 // (int1 ** int2), we have
well-defined rules all around, so there would be no need to raise a
zero-division error.

All the best,


Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-10 Thread Marten van Kerkwijk
I do think one of the main arguments for returning float remains the
analogy with division. I don't know about the rest of you, but it has been
such a relief not to have to tell students any more "you should add a '.',
otherwise it does integer division". For most purposes, it simply shouldn't
matter whether one types an integer or a float; if it does, then one has to
think about it, and it seems fine for that relatively specialized case to
have to use a specialized function.  -- Marten

Re: [Numpy-discussion] ENH: compute many inner products quickly

2016-06-06 Thread Marten van Kerkwijk
There I was thinking vector-vector inner product was in fact covered by
`np.inner`. Yikes, half inner, half outer.

As for names, I think `matvecmul` and `vecmul` do seem quite OK (probably
need `vecmatmul` as well, which does the same as `matmul` would for a 1-D
first argument).

But as other suggestions, keeping the `dot` one could think of
`vec_dot_vec` and `mat_dot_vec`, etc. More obscure but shorter would be to
use the equivalent `einsum` notation: `i_i`, `ij_j`, `i_ij`, `ij_jk`.
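The `einsum` signatures those names abbreviate:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
m = np.arange(9.0).reshape(3, 3)

i_i  = np.einsum('i,i->', v, v)      # vector . vector (true inner)
ij_j = np.einsum('ij,j->i', m, v)    # matrix . vector
i_ij = np.einsum('i,ij->j', v, m)    # vector . matrix
print(i_i, ij_j, i_ij)
```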

-- Marten

Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-06 Thread Marten van Kerkwijk
Hi Chuck,

I consider either proposal an improvement, but among the two I favour
returning float for `**`, because, like for `/`, it ensures one gets
closest to the (mathematically) true answer in most cases, and makes
duck-typing that much easier -- I'd like to be able to do x ** y without
having to worry whether x and y are python scalars or numpy arrays of
certain type.

I do agree with Nathaniel that it would be good to check what actually
breaks. Certainly, if anybody is up to making a PR that implements either
suggestion, I'd gladly check whether it breaks anything in astropy.

I should add that I have no idea how to assuage the fear that new code
would break with old versions of numpy, but on the other hand, I don't know
its validity either, as it seems one either develops larger projects for
multiple versions and tests them, or writes more scripty things for whatever
the current versions are. Certainly, by this argument I had better not start
using the new `@` operator!

I do think the argument that for division it was easier because there was
`//` already available is a red herring: here one can use `np.power(a, b,
dtype=...)` if one really needs to.
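That workaround works today:

```python
import numpy as np

# an explicit dtype forces the float loop, so negative integer
# exponents are fine
x = np.power(2, -1, dtype=np.float64)
print(x)    # 0.5
```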

All the best,


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-31 Thread Marten van Kerkwijk
> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> some really nifty FFTs. And the license issue is hardly any worse than
> linking with them for BLAS and LAPACK, which we do anyway. We could extend
> numpy.fft to use MKL or Accelerate when they are available.

That would be wonderful! Especially if one can also remove the automatic
cast to double (as I'm analysing 2-bit VLBI data, getting to 64 bit is
rather overkill).
-- Marten

Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-29 Thread Marten van Kerkwijk

I did a few simple timing tests (see comment in PR), which suggest it is
hardly worth having the cache. Indeed, if one really worries about speed,
one should probably use pyFFTW (scipy.fft is a bit faster too, but at least
for me the way real FFT values are stored is just too inconvenient). So, my
suggestion would be to do away with the cache altogether.

If we do keep it, I think the approach in the PR is nice, but I would
advocate setting both a size and number limit (e.g., by default no more
than 8 entries or so, which should cover most repetitive use cases).

All the best,


p.s. I do like having a quick fft routine in numpy. My main gripe is that
it always casts to float64/complex128 rather than sticking with the input
dtype. Hope to get around to making a PR for that...

Re: [Numpy-discussion] Integers to integer powers

2016-05-24 Thread Marten van Kerkwijk
The error on int ** (-int) is OK with me too (even though I prefer just
returning floats).
Having a `floor_pow` function may still be good with this solution too.
-- Marten

Re: [Numpy-discussion] Behavior of .reduceat()

2016-05-22 Thread Marten van Kerkwijk
Hi Jaime,

Very belated reply, but only with the semester over do I seem to have
regained some time to think.

The behaviour of reduceat always has seemed a bit odd to me, logical for
dividing up an array into irregular but contiguous pieces, but illogical
for more random ones (where one effectively passes in pairs of points, only
to remove the unwanted calculations after the fact by slicing with [::2];
indeed, the very first example in the documentation does exactly this [1]).
I'm not sure any of your proposals helps all that much for the latter case,
while it risks breaking existing code in unexpected ways.

For me, for irregular pieces, it would be much nicer to simply pass in
pairs of points. I think this can be quite easily done in the current API,
by expanding it to recognize multidimensional index arrays (with last
dimension of 2; maybe 3 for step as well?). These numbers would just be the
equivalent of start, end (and step?) of `slice`, so I think one can allow
any integer with negative values having the usual meaning and clipping at 0
and length. So, specifically, the first example in the documentation would
change from:

np.add.reduceat(np.arange(8), [0,4, 1,5, 2,6, 3,7])[::2]

to:

np.add.reduceat(np.arange(8), [(0, 4), (1, 5), (2, 6), (3, 7)])

(Or an equivalent ndarray. Note how horrid the example is: really, you'd
want 4,8 as a pair too, but in the current API, you'd get that by adding a
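For concreteness, the current-API idiom being criticized:

```python
import numpy as np

a = np.arange(8)
# pairs (0,4), (1,5), (2,6), (3,7) flattened; every second sum is an
# unwanted between-pair segment, dropped by the final [::2]
sums = np.add.reduceat(a, [0, 4, 1, 5, 2, 6, 3, 7])[::2]
print(sums)    # [ 6 10 14 18]
```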

What do you think? Would this also be easy to implement?

All the best,



Re: [Numpy-discussion] Integers to integer powers

2016-05-20 Thread Marten van Kerkwijk
Hi All,

As a final desired state, always returning float seems the best idea. It
seems quite similar to division in this regard, where integer division
works for some values, but not for all. This means not being quite
consistent with python, but as Nathan pointed out, one cannot have
value-dependent dtypes for arrays (and scalars should indeed behave the
same way).

If so, then like for division, in-place power should raise a `TypeError`.
Obviously, one could have a specific function for integers (like `//` for
division) for cases where it is really needed.

Now, how to get there...  Maybe we can learn from division? At least, I
guess at some point `np.divide` became equivalent to `np.true_divide`?

All the best,


Re: [Numpy-discussion] Numpy 1.11.0b2 released

2016-01-31 Thread Marten van Kerkwijk
Hi Julian,

While the numpy 1.10 situation was bad, I do want to clarify that the
problems we had in astropy were a consequence of *good* changes in
`recarray`, which solved many problems, but also broke the work-arounds
that had been created in `` quite a long time ago (possibly
before astropy became as good as it tries to be now at moving issues
upstream and perhaps before numpy had become as responsive to what happens
downstream as it is now; I think it is fair to say many projects' attitudes
to testing have changed rather drastically in the last decade!).

I do agree, though, that it just goes to show one has to try to be careful,
and like Nathaniel's suggestion of automatic testing with pre-releases -- I
just asked on our astropy-dev list whether we can implement it.

All the best,


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Marten van Kerkwijk
For what it is worth, the current behaviour seems the most logical to me,
i.e., that the first limit is always the one that is included in the
interval, and the second is not. -- Marten
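A quick check of this as currently implemented: `uniform(low, high)` draws
`low + (high - low)*u` with `u` in `[0, 1)`, so the first limit can be
attained and the second cannot, even when the limits are reversed:

```python
import numpy as np

# With low=1.0 > high=0.0, samples are 1.0 - u, i.e. in (0.0, 1.0]:
# the first limit (1.0) is includable, the second (0.0) is not.
s = np.random.uniform(1.0, 0.0, size=100000)
print(s.min() > 0.0, s.max() <= 1.0)  # True True
```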

Re: [Numpy-discussion] Get rid of special scalar arithmetic

2016-01-13 Thread Marten van Kerkwijk
Just thought I would add here a general comment I made in the thread:
replacing scalars everywhere with array scalars (i.e., ndim=0) would be
great also from the perspective of ndarray subclasses; as is, it is quite
annoying to have to special-case, e.g., getting a single subclass element,
and rewrapping the scalar in the subclass.
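A minimal illustration with a toy subclass (`Tagged` is just for
demonstration; astropy's Quantity hits the same issue):

```python
import numpy as np

class Tagged(np.ndarray):
    # Toy ndarray subclass carrying one extra attribute.
    def __array_finalize__(self, obj):
        self.tag = getattr(obj, 'tag', None)

t = np.arange(3.).view(Tagged)
t.tag = 'metres'
print(type(t[0]).__name__)              # float64: scalar indexing drops the subclass
print(type(t[:1]).__name__, t[:1].tag)  # Tagged metres: slicing preserves it
```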
-- Marten

[Numpy-discussion] Subclass awareness of outer, inner, dot, ...

2016-01-06 Thread Marten van Kerkwijk
Hi All,

Having just had a look at functions where astropy's quantities are silently
converted to ndarray, I came across some that, in principle, are easy to
solve, yet for which, as always, there is a worry about backward
compatibility. So, the question arises what to do.

As a specific example, consider `np.outer` which is defined in ``
and boils down to

multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out)

Since multiply is fine with subclasses, all should be well, except that
before it is called, there are `a = asarray(a)` and `b = asarray(b)`
statements, which mean subclasses are lost.
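A quick demonstration of the current behaviour with a toy subclass:

```python
import numpy as np

class MyArray(np.ndarray):
    # Minimal subclass, purely for illustration.
    pass

a = np.arange(3).view(MyArray)
b = np.arange(4).view(MyArray)

print(type(np.outer(a, b)).__name__)  # ndarray: asarray() dropped MyArray
# The multiply call underneath would have preserved the subclass:
result = np.multiply(a.ravel()[:, np.newaxis], b.ravel()[np.newaxis, :])
print(type(result).__name__)          # MyArray
```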

Obviously, in this case, a simple fix would be to use `asanyarray` instead,
but ideally, similar routines behave similarly and hence one would also
want to change `np.inner` and `` (and perhaps more)... Hence, before
doing anything, I thought I would ask whether:

1. Such changes are or are not too risky for backward compatibility;
2. If so, whether, given that `` can be caught using
`__numpy_ufunc__`, one should perhaps allow functions such as `outer` also
to be passed through that?

All the best,


Re: [Numpy-discussion] 1.10.3 release tomorrow, 1.11.x branch this month.

2016-01-05 Thread Marten van Kerkwijk
Hi Chuck, others,

A propos __numpy_ufunc__, what is the current status? Is it still the
undetermined result of the monster-thread ( -- just found it again by
sorting by number of comments...)?

As noted by Stephan and myself when the decision was made to remove it from
1.10, for external libraries it would be really wonderful to have *any*
version of __numpy_ufunc__ in 1.11, as it provides great benefits (an
instant factor-of-2 speed improvement for astropy quantities...). In the end, the
proposals were not that different, and, really, what is in current master
is quite good.

All the best,


Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-12 Thread Marten van Kerkwijk
Hi All,

astropy's `Time` does indeed use two doubles internally, but it is very
limited in the operations it allows: essentially only addition/subtraction,
and multiplication with/division by a normal double.

It would be great to have better support within numpy; it is a pity to have
a float128 type that does not provide the full associated precision.
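For the curious, the core trick behind two-double arithmetic is Knuth's
error-free two-sum; the sketch below shows only the idea (astropy's `Time`
internals differ in detail):

```python
def two_sum(a, b):
    # Knuth's error-free transformation: s + err == a + b exactly
    # (barring overflow), with s the correctly-rounded float64 sum
    # and err the rounding error that a single double would lose.
    s = a + b
    v = s - a
    err = (a - (s - v)) + (b - v)
    return s, err

s, err = two_sum(1e16, 1.0)
print(s, err)  # 1e+16 1.0  -- the 1.0 lost to rounding is kept in err
```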

All the best,


On Sat, Dec 12, 2015 at 1:02 PM, Sturla Molden 

> "Thomas Baruchel"  wrote:
> > While this is obviously the most relevant answer for many users because
> > it will allow them to use Numpy arrays exactly
> > as they would have used them with native types, the wrong thing is that
> > from some point of view "true" vectorization
> > will be lost.
> What does "true vectorization" mean anyway?
> Sturla

Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-10-22 Thread Marten van Kerkwijk
I think it would be good to keep the usage to read binary data at least. Or
is there a good alternative to `np.fromstring(, dtype=...)`?  --

On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker  wrote:

> There was just a question about a bug/issue with scipy.fromstring (which
> is numpy.fromstring) when used to read integers from a text file.
> fromstring() is buggy and inflexible for reading text files -- and it is
> a very, very ugly mess of code. I dug into it a while back, and gave up --
> just too much of a mess!
> So we really should completely re-implement it, or deprecate it. I doubt
> anyone is going to do a big refactor, so that means deprecating it.
> Also -- if we do want a fast read numbers from text files function (which
> would be nice, actually), it really should get a new name anyway.
> (and the hopefully coming new dtype system would make it easier to write
> cleanly)
> I'm not sure what deprecating something means, though -- have it raise a
> deprecation warning in the next version?
> -CHB
> --
> Christopher Barker, Ph.D.
> Oceanographer
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception

Re: [Numpy-discussion] Making datetime64 timezone naive

2015-10-13 Thread Marten van Kerkwijk
> > However, numpy datetime is optimized for compact storage and fast
> computation of absolute deltas (actual hours, minutes, seconds... not
> calendar units like "the next day" ).
> Except that ironically it actually can't compute absolute deltas
> accurately with one second resolution, because it does the POSIX time thing
> of pretending that all UTC days have the same number of seconds, even
> though this is not true (leap seconds).
> This isn't really relevant to anything else in this thread, except as a
> reminder of how freaky date/time handling is.

Maybe not directly relevant, but it also shows very clearly why one should
ideally not use these at all! Perhaps even less relevant: if you do need
absolute times (and thus work with UTC or TAI or GPS), have a look at
astropy's `Time` class. It does use two doubles, but with that maintains
"sub-nanosecond precision over times spanning the age of the universe" [1].
And it even converts to strings nicely!
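The leap-second point above is easy to verify with datetime64 itself:

```python
import numpy as np

# 2015-06-30 ended with a leap second (23:59:60 UTC), but POSIX-style
# time pretends it never happened:
t1 = np.datetime64('2015-06-30T23:59:59')
t2 = np.datetime64('2015-07-01T00:00:00')
print(t2 - t1)  # 1 seconds -- although 2 SI seconds actually elapsed
```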

-- Marten


Re: [Numpy-discussion] method to calculate the magnitude squared

2015-10-11 Thread Marten van Kerkwijk
Hi Nils,

I think performance will actually be better than I indicated, especially
for larger arrays, since `r.real**2 + r.imag**2` makes quite unnecessary
intermediate arrays. With a `ufunc`, this can be done much faster. Indeed,
it should be no slower than `np.square` (which does more operations):

%timeit b = np.square(a)
10 loops, best of 3: 16.6 µs per loop

[This is on same laptop as the timings above.]

Chuck: agreed that a future np.abs2 could just reuse the internal ufunc
loops for np.square except for the complex case.
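In the meantime, a pure-Python stand-in shows the intended semantics (the
`abs2` name is hypothetical); it avoids the sqrt hidden in `np.abs(r)**2`,
though unlike a real ufunc it cannot avoid the temporaries:

```python
import numpy as np

def abs2(z):
    # Hypothetical |z|**2 helper: no sqrt, real arithmetic only.
    z = np.asarray(z)
    if np.iscomplexobj(z):
        return z.real**2 + z.imag**2
    return np.square(z)

z = np.array([1 + 1j, 3 + 4j])
print(abs2(z))  # |z|**2 elementwise: 2. and 25.
```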

Re: [Numpy-discussion] method to calculate the magnitude squared

2015-10-10 Thread Marten van Kerkwijk
> We tend to avoid adding methods. 2) would be a very easy enhancement,
just a slight modification of sqr.

Did you mean `np.square`? Sadly, that doesn't do the right thing:
`np.square(1+1j)` yields `2j`, while one wants `c*c.conj()` and thus `2`.
Or, for fastest speed, really just `c.real**2 + c.imag**2`.
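To spell out the difference (all current behaviour):

```python
import numpy as np

c = np.complex128(1 + 1j)
print(np.square(c))           # 2j: this is c*c, not the magnitude squared
print((c * c.conj()).real)    # 2.0: |c|**2
print(c.real**2 + c.imag**2)  # 2.0 as well, without complex multiplication
```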

My guess would be that a new ufunc, say `np.abs2` or `np.modulus2` or so,
would be more appropriate than defining a new method. I'd also be hesitant
to define a new private method -- I like how those usually are just used to
override python basics.

-- Marten

Re: [Numpy-discussion] composition of the steering council (was Re: Governance model request)

2015-09-25 Thread Marten van Kerkwijk
Hi Nathaniel,

Piping in again from the outside: being on the council would certainly seem
to be a job, not a privilege, and I suspect it will be hard enough to find
people willing to put in actual work without also worrying about restricting
membership overly.

Given this, my suggestion would be to have a general, no-name escape clause
to the rules, stating that anybody who does not formally meet the rules is
welcome to *volunteer* being on the council, with a suggestion on what s/he
might contribute, and the existing council can then decide whether or not
that contribution is welcome. My sense would be that any time this happens
one will be most happy to accept.

All the best,

p.s. Independently of rules, I don't see how Travis would not qualify even
from current work, given that he has just committed to actively try to
improve/generalize dtype.

Re: [Numpy-discussion] Governance model request

2015-09-22 Thread Marten van Kerkwijk
Hi All,

I've been reading this thread with amazement and a bit of worry. It seems
Nathaniel's proposal is clearly an improvement, even if it is not perfect.
But it is in the end for a project where, at least as seen from the
outside, the main challenge is not in governance, but rather in having only
a small group of people who understand the code well enough that they are
able to make and judge modifications. As such, this discussion hardly seems
worth the effort, still less the needless heat and irritation, of a kind that
seems unlikely to have arisen had this conversation been in person instead
of by e-mail.

Might it be an idea to accept the proposal provisionally, returning to it a
year from now with practical experience? This certainly has the benefit of
allowing us to switch focus to the more pressing and fortunately also more
interesting work to be done on interfacing numpy/ndarray nicely with other
classes (i.e., __numpy_ufunc__ and/or similar, and the dtype
generalisations, which have me quite intrigued -- either might be very
interesting for the Quantity class in astropy, as well as for a
work-in-progress Variable class [which propagates uncertainties including
All the best,


Prof. M. H. van Kerkwijk
Dept. of Astronomy & Astroph., 50 St George St., Toronto, ON, M5S 3H4,
McLennan Labs, room 1203B, tel: +1(416)9467288, fax: +1(416)9467287,

Re: [Numpy-discussion] method to calculate the magnitude squared

2015-09-20 Thread Marten van Kerkwijk
> Is that not the same as
> np.abs(z)**2 ?

It is, but since that involves taking sqrt, it is *much* slower. Even now,
In [32]: r = np.arange(1)*(1+1j)

In [33]: %timeit np.abs(r)**2
1000 loops, best of 3: 213 µs per loop

In [34]: %timeit r.real**2 + r.imag**2
1 loops, best of 3: 47.5 µs per loop

-- Marten

Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-09-03 Thread Marten van Kerkwijk
Hi Nathaniel,

Thanks for the detailed reply; it helped a lot to understand how one could,
indeed, have dtypes contain units. And if one had not just on-the-fly
conversion from int to float as part of an internal loop, but also
on-the-fly multiplication, then it would even be remarkably fast. Will be
interesting to think this through in more detail.

Still think subclassing ndarray is not all *that* bad (MaskedArray is a
different story...), and it may still be needed for my other examples, but
perhaps masked/uncertainties do work with the collections idea. Anyway, it
now makes sense to focus on dtype first.

Thanks again,


Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-30 Thread Marten van Kerkwijk
Hi Nathaniel, others,

I read the discussion of plans with interest. One item that struck me is
that while there are great plans to have a proper extensible and presumably
subclassable dtype, it is discouraged to subclass ndarray itself (rather,
it is encouraged to use a broader array interface). From my experience with
astropy in both Quantity (an ndarray subclass), Time (a separate class
containing high-precision times using two float64 ndarrays), and Table
(initially holding structured arrays, but now sets of Columns, which
themselves are ndarray subclasses), I'm not convinced the broader, new
containers approach is that much preferable. Rather, it leads to a lot of
boiler-plate code to reimplement things ndarray does already (since one is
effectively just calling the methods on the underlying arrays).

I also think the idea that a dtype becomes something that also contains a
unit is a bit odd. Shouldn't dtype just be about how data is stored? Why
include meta-data such as units?

Instead, I think a quantity is most logically seen as numbers with a unit,
just like masked arrays are numbers with masks, and variables numbers with
uncertainties. Each of these cases adds extra information in different
forms, and all are quite easily thought of as subclasses of ndarray where
all operations do the normal operation, plus some extra work to keep the
extra information up to date.

Anyway, my suggestion would be to *encourage* rather than discourage
ndarray subclassing, and help this by making ndarray (even) better.

All the best,


On Thu, Aug 27, 2015 at 11:03 AM, wrote:

 On Wed, Aug 26, 2015 at 10:06 AM, Travis Oliphant

 On Wed, Aug 26, 2015 at 1:41 AM, Nathaniel Smith wrote:

 Hi Travis,

 Thanks for taking the time to write up your thoughts!

 I have many thoughts in return, but I will try to restrict myself to two
 main ones :-).

 1) On the question of whether work should be directed towards improving
 NumPy-as-it-is or instead towards a compatibility-breaking replacement:
 There's plenty of room for debate about whether it's better engineering
 practice to try and evolve an existing system in place versus starting
 over, and I guess we have some fundamental disagreements there, but I
 actually think this debate is a distraction -- we can agree to disagree,
 because in fact we have to try both.

 Yes, on this we agree.   I think NumPy can improve *and* we can have new
 innovative array objects.   I don't disagree about that.

 At a practical level: NumPy *is* going to continue to evolve, because it
 has users and people interested in evolving it; similarly, dynd and other
 alternatives libraries will also continue to evolve, because they also have
 people interested in doing it. And at a normative level, this is a good
 thing! If NumPy and dynd both get better, than that's awesome: the worst
 case is that NumPy adds the new features that we talked about at the
 meeting, and dynd simultaneously becomes so awesome that everyone wants to
 switch to it, and the result of this would be... that those NumPy features
 are exactly the ones that will make the transition to dynd easier. Or if
 some part of that plan goes wrong, then well, NumPy will still be there as
 a fallback, and in the mean time we've actually fixed the major pain points
 our users are begging us to fix.

 You seem to be urging us all to make a double-or-nothing wager that your
 extremely ambitious plans will all work out, with the entire numerical
 Python ecosystem as the stakes. I think this ambition is awesome, but maybe
 it'd be wise to hedge our bets a bit?

 You are mis-characterizing my view.  I think NumPy can evolve (though I
 would personally rather see a bigger change to the underlying system like I
 outlined before).But, I don't believe it can even evolve easily in the
 direction needed without breaking ABI and that insisting on not breaking it
 or even putting too much effort into not breaking it will continue to
 create less-optimal solutions that are harder to maintain and do not take
 advantage of knowledge this community now has.

 I'm also very concerned that 'evolving' NumPy will create a situation
 where there are regular semantic and subtle API changes that will cause
 NumPy to be less stable for its user-base. I've watched this happen.
 This at a time that people are already looking around for new and different
 approaches anyway.

 2) You really emphasize this idea of an ABI-breaking (but not
 API-breaking) release, and I think this must indicate some basic gap in how
 we're looking at things. Where I'm getting stuck here is that... I actually
 can't think of anything important that we can't do now, but could if we
 were allowed to break ABI compatibility. The kinds of things that break ABI
 but keep API are like... rearranging what order the fields in a struct fall
 in, or changing the numeric value of opaque constants like

Re: [Numpy-discussion] Changes to np.digitize since NumPy 1.9?

2015-08-14 Thread Marten van Kerkwijk
For what it's worth, also from my astropy perspective I think that any index
array should be a base ndarray!
-- Marten

On Fri, Aug 14, 2015 at 7:11 AM, Jaime Fernández del Río wrote:

 On Thu, Aug 13, 2015 at 9:57 AM, Jaime Fernández del Río wrote:

 On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum

 On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris wrote:

 On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fernández del Río wrote:

 On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum wrote:

 Hi all,

 I've been testing the package I spend most of my time on, yt, under
 numpy 1.10b1 since the announcement went out.

 I think I've narrowed down and fixed all of the test failures that
 cropped up except for one last issue. It seems that the behavior of
 np.digitize with respect to ndarray subclasses has changed since the 
 1.9 series. Consider the following test script:

 import numpy as np

 class MyArray(np.ndarray):
 def __new__(cls, *args, **kwargs):
 return np.ndarray.__new__(cls, *args, **kwargs)

 data = np.arange(100)

 bins = np.arange(100) + 0.5

 data = data.view(MyArray)

 bins = bins.view(MyArray)

 digits = np.digitize(data, bins)

 print type(digits)

 Under NumPy 1.9.2, this prints <type 'numpy.ndarray'>, but under
 the 1.10 beta, it prints <class '__main__.MyArray'>.

 I'm curious why this change was made. Since digitize outputs index
 arrays, it doesn't make sense to me why it should return anything but a
 plain ndarray. I see in the release notes that digitize now uses
 searchsorted under the hood. Is this related?

 It is indeed searchsorted's fault, as it returns an object of the same
 type as the needle (the items to search for):

  import numpy as np
  class A(np.ndarray): pass
  class B(np.ndarray): pass
 B([0, 1, 2, 3, 4])

 I am all for making index-returning functions always return a base
 ndarray, and will be more than happy to send a PR fixing this if there is
 some agreement.

 I think that is the right thing to do.

 Awesome, I'd appreciate having a PR to fix this. Arguably the return
 type *could* be the same type as the inputs, but given that it's a behavior
 change I agree that it's best to add a patch so the output of searchsorted
 is sanitized to be an ndarray before it's returned by digitize.

 It is relatively simple to do, just replace Py_TYPE(ap2) with
 PyArray_Type in this line:

 Then fix all the tests that are expecting searchsorted to return
 something other than a base ndarray. We have already modified nonzero to
 return base ndarrays in this release, see the release notes, so it will go
 with the same theme.

 For 1.11 I think we should try to extend this ("if it returns an index, it
 will be a base ndarray") to all other functions that don't follow it right now. Then
 sit back and watch AstroPy come down in flames... ;-)))

 Seriously, I think this makes a lot of sense, and should be documented as
 the way NumPy handles index arrays.

 Anyway, I will try to find time tonight to put this PR together, unless
 someone beats me to it, which I would be totally fine with.

 PR #6206 it is:




Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-12 Thread Marten van Kerkwijk
Agreed that indexing functions should return bare `ndarray`. Note that in
Jaime's PR one can override it anyway by defining __nonzero__.  -- Marten

On Sat, May 9, 2015 at 9:53 PM, Stephan Hoyer wrote:

  With regards to np.where -- shouldn't where be a ufunc, so subclasses or
 other array-likes can control its behavior with __numpy_ufunc__?

 As for the other indexing functions, I don't have a strong opinion about
 how they should handle subclasses. But it is certainly tricky to attempt to
 handle arbitrary subclasses. I would agree that the least error
 prone thing to do is usually to return base ndarrays. Better to force
 subclasses to override methods explicitly.
