[Numpy-discussion] Re: Automatic Clipping of array to upper / lower bounds of dtype

2024-03-10 Thread Marten van Kerkwijk
> Also, can’t get __array_wrap__ to work. The arguments it receives after 
> __iadd__ are all
> post-operation. Decided not to do it this way this time so not to hardcode 
> such functionality
> into the class, but if there is a way to robustly achieve this it would be 
> good to know.

It is non-trivial to clip in the operation, since internally the code
just wraps like C does (actually, I think for some C implementations it
clips...). I think you'd be stuck with evaluating with more bits and
then clipping before casting back (probably easiest by defining your
own __array_ufunc__) -- or writing your own inner loops.
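
For concreteness, a rough sketch of that second option (a hypothetical
ClippedArray subclass, not from the original message; it only handles
plain ufunc calls on integer dtypes and assumes int64 is wide enough to
hold the intermediate result):

import numpy as np

class ClippedArray(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        out = kwargs.pop('out', None)
        info = np.iinfo(self.dtype)
        # evaluate with more bits, then clip and cast back to our own dtype
        wide = [np.asarray(x, dtype=np.int64) for x in inputs]
        result = getattr(ufunc, method)(*wide, **kwargs)
        clipped = np.clip(result, info.min, info.max).astype(self.dtype)
        if out is not None:          # covers the in-place (__iadd__) case
            out[0][...] = clipped
            return out[0]
        return clipped.view(type(self))

a = np.array([250, 10], dtype=np.uint8).view(ClippedArray)
a += 10    # -> ClippedArray([255, 20]) instead of wrapping to [4, 20]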

-- Marten



[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Marten van Kerkwijk
> One more thing to mention on this topic.
>
> From a certain size dot product becomes faster than sum (due to 
> parallelisation I guess?).
>
> E.g.
> def dotsum(arr):
>     a = arr.reshape(1000, 100)
>     return a.dot(np.ones(100)).sum()
>
> a = np.ones(100_000)
>
> In [45]: %timeit np.add.reduce(a, axis=None)
> 42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>
> In [43]: %timeit dotsum(a)
> 26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>
> But theoretically, sum, should be faster than dot product by a fair bit.
>
> Isn’t parallelisation implemented for it?

I cannot reproduce that:

In [3]: %timeit np.add.reduce(a, axis=None)
19.7 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit dotsum(a)
47.2 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But almost certainly it is indeed due to optimizations, since .dot uses
BLAS which is highly optimized (at least on some platforms, clearly
better on yours than on mine!).

I thought .sum() was optimized too, but perhaps less so?

It may be good to raise a quick issue about this!

Thanks, Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Marten van Kerkwijk
> From my experience, calling methods is generally faster than
> functions. I figure it is due to having less overhead figuring out the
> input. Maybe it is not significant for large data, but it does make a
> difference even when working for medium sized arrays - say float size
> 5000.
> 
> %timeit a.sum()
> 3.17 µs
> %timeit np.sum(a)
> 5.18 µs

It is more that np.sum checks if there is a .sum() method and if so
calls that.  And then `ndarray.sum()` calls `np.add.reduce(array)`.
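
Roughly, the hand-off looks like this (a sketch only; the real dispatch
has more checks, but this is the gist):

import numpy as np

class WithSum:
    # np.sum on a non-ndarray hands off to a .sum() method if there is one
    def sum(self, axis=None, dtype=None, out=None, **kwargs):
        return "WithSum.sum was called"

np.sum(WithSum())   # -> 'WithSum.sum was called'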

In [2]: a = np.arange(5000.)

In [3]: %timeit np.sum(a)
3.89 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit a.sum()
2.43 µs ± 42 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: %timeit np.add.reduce(a)
2.33 µs ± 31 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Though I must admit I'm a bit surprised the excess is *that* large for
using np.sum...  There may be a little micro-optimization to be found...

-- Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Marten van Kerkwijk
> What were your conclusions after experimenting with chained ufuncs?
> 
> If the speed is comparable to numexpr, wouldn’t it be `nicer` to have
> non-string input format?
> 
> It would feel a bit less like a black-box.

I haven't gotten further with it yet; it is just some toying around I've
been doing.  But I'd indeed prefer not to go via strings -- possibly
numexpr could use a similar mechanism to what I did to construct the
function that is being evaluated.

Aside: your suggestion of the pipe led to some further discussion at
https://github.com/numpy/numpy/issues/25826#issuecomment-1947342581
-- as a more general way of passing arrays to functions.

-- Marten


[Numpy-discussion] Re: ENH: Introducing a pipe Method for Numpy arrays

2024-02-15 Thread Marten van Kerkwijk
Hi Oyibo,

> I'm proposing the introduction of a `pipe` method for NumPy arrays to enhance 
> their usability and expressiveness.

I think it is an interesting idea, but agree with Robert that it is
unlikely to fly on its own.  Part of the logic of even frowning on
methods like .mean() and .sum() is that ndarray is really a data
container, and should have methods related to that, as much as possible
independent of the meaning of those data (which is given by the dtype).

A bit more generally, your example is nice, but a pipe can have just one
input, while of course many operations require two or more.

> - Optimization: While NumPy may not currently optimize chained
> expressions, the introduction of pipe lays the groundwork for
> potential future optimizations with lazy evaluation.

Optimization might indeed be made possible, though I would think that
for that one may be better off with something like dask.

That said, I've been playing with the ability to chain ufuncs to
optimize their execution, by applying the ufuncs in series on small
pieces of larger arrays, thus avoiding large temporaries (a bit like
numexpr but with the idea of defining a fast function rather than giving
an expression as a string); see https://github.com/mhvk/chain_ufunc
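
For the curious, a toy illustration of the blockwise idea (not the actual
chain_ufunc machinery), applied to a 1-d array:

import numpy as np

def chained(ufuncs, arr, block=4096):
    # apply a sequence of unary ufuncs block by block, so only one
    # block-sized temporary is live at a time instead of full-size ones
    out = np.empty_like(arr)
    for start in range(0, arr.size, block):
        tmp = arr[start:start + block]
        for uf in ufuncs:
            tmp = uf(tmp)
        out[start:start + block] = tmp
    return out

x = np.linspace(0., 1., 1_000_000)
y = chained([np.sin, np.exp, np.sqrt], x)   # == np.sqrt(np.exp(np.sin(x)))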

All the best,

Marten




[Numpy-discussion] Re: API: make numpy.lib._arraysetops.intersect1d work on multiple arrays #25688

2024-02-02 Thread Marten van Kerkwijk
> For my own work, I required the intersect1d function to work on multiple 
> arrays while returning the indices (using `return_indices=True`). 
> Consequently I changed the function in numpy and now I am seeking 
> feedback from the community.
>
> This is the corresponding PR: https://github.com/numpy/numpy/pull/25688



To me this looks like a very sensible generalization.  In terms of numpy
API, the only real change is that, effectively, the assume_unique and
return_indices arguments become keyword-only, i.e., in the unlikely case
that someone passed those as positional, a trivial backward-compatible
change will fix it.
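
For instance, with the current intersect1d signature, the positional call
that would stop working, and its trivial fix:

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 4, 6])

np.intersect1d(a, b, True)                  # assume_unique passed positionally
np.intersect1d(a, b, assume_unique=True)    # backward-compatible replacement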

-- Marten


[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-26 Thread Marten van Kerkwijk
> I tend to agree with not using the complex conjugate for vecmat, but would 
> prefer having
> separate functions for that that make it explicit in the name. I also note 
> that mathematicians
> use sesquilinear forms, which have the vector conjugate on the other side, so 
> there are
> different conventions. I prefer the Dirac convention myself, but many 
> mathematical methods
> texts use the opposite. It is tricky for the teacher in introductory courses, 
> right up there with
> vectors being called contravariant when they are actually covariant (the 
> coefficients are
> contravariant). Anyway, I think having the convention explicit in the name 
> will avoid confusion.

Hi Chuck,

Indeed, which argument to conjugate seems not generally agreed on --
definitely found wikipedia pages using both conventions!  But I think
numpy made the choice with vdot and vecdot, so that seems mostly a
matter of documentation.

I'm slightly confused whether or not you agree that the complex
conjugate is useful, but can see your point of a name that makes it
clear.  Or perhaps one could have a wrapper function that lets one
switch between conjugating the vector or not?

One thing perhaps worth mentioning is that if we decide that vecmat does
conjugate, the non-conjugating version can still trivially be done by a
transpose using matvec(matrix.mT, vector) [1], while if vecmat does not
conjugate by default, getting the conjugate version using the "obvious"
vecmat(vector.conj(), matrix) implies a potentially costly copy of
the whole vector array.  (This can be avoided using the much less
obvious np.vecdot(vec[..., np.newaxis, :], mat.mT), but I'm not sure
that one really wants to suggest that...)
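
A small sketch of those equivalences (assuming a NumPy in which the
proposed np.matvec/np.vecmat from the PR, as well as np.vecdot and the
.mT property, are available):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3, 3)) + 1j * rng.normal(size=(10, 3, 3))
v = rng.normal(size=(10, 3)) + 1j * rng.normal(size=(10, 3))

conj = np.vecmat(v, A)             # v† A, the conjugating convention
nonconj = np.matvec(A.mT, v)       # vᵀ A: no conjugate, no copy of v
# the less obvious vecdot spelling of the conjugating version:
also_conj = np.vecdot(v[..., np.newaxis, :], A.mT)
np.testing.assert_allclose(also_conj, conj)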

More generally, in my own experience working with complex matrices and
vectors, just about anytime one needs a transpose, a conjugate is
generally needed too.

All the best,

Marten

[1] Indeed, the implementation of vecmat uses the matvec loops with the
matrix transposed for all but the complex numbers.


[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-25 Thread Marten van Kerkwijk
Hi Alan,

The problem with .dot is not that it is not possible, but more that it
is not obvious exactly what will happen given the overloading of
multiple use cases; indeed, this is why `np.matmul` was created.  For
the stacks of vectors case in particular, it is surprising that the
vector dimension is the second-to-last rather than the last dimension (at
least if the total number of dimensions is more than 2).

The idea here is to have a function that does just one obvious thing
(sadly, for matmul we didn't quite stick to that...).

All the best,

Marten


[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-24 Thread Marten van Kerkwijk
> Could you please offer some code or math notation to help communicate this?
> I am forced to guess at the need.
>
> The words "matrix" and "vector" are ambiguous.
> After all, matrices (of given shape) are a type of vector (i.e., can be added 
> and scaled.)
> So if by "matrix" you mean "2d array" and by "stack of vectors" you 
> effectively mean "2d array",
> this sounds like a use for np.dot (possibly after a transpose).
> However I am going to guess that here by "vector" you actually mean a matrix
> (i.e., a 2d array) with only one row or only one column, so a "stack" of them
> is actually 3d. Perhaps the needless dimension is then the real problem
> and can either not be produced or can be squeezed away..

Stack of matrices in this context is an ndarray in which the last two
dimensions are interpreted as rows and columns of matrices (in that
order), and stack of vectors an ndarray in which the last dimension is
interpreted as column vectors.  (Well, axes in all these functions can
be chosen arbitrarily, but those are the defaults.)

So a simple example for matvec would be a rotation matrix that I'd like
to apply to a large number of vectors (say to points in space); this is
currently not easy.  Complex vectors might be Fourier components.  (Note
that I was rather sloppy in calling it multiplication rather than using
the term vector-matrix product, etc.; it is definitely not element-wise!).
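
As an illustration, rotating many points currently requires either einsum
or an explicit extra axis; with the proposed gufunc it would be a single
obvious call (the np.matvec line is the PR's proposed spelling):

import numpy as np

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.],
              [np.sin(theta),  np.cos(theta), 0.],
              [0.,             0.,            1.]])
points = np.random.default_rng(1).normal(size=(100_000, 3))

rotated = np.einsum('ij,nj->ni', R, points)
rotated2 = (R @ points[..., np.newaxis])[..., 0]
# with the proposed gufunc: rotated = np.matvec(R, points)
np.testing.assert_allclose(rotated, rotated2)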

The vector-matrix product comes up a bit less, but as mentioned by
Evgeni in physics there is, e.g., the bra-ket stuff with Hamiltonians
(<\psi | \hat H | \psi>), and in linear least squares one often gets
terms which for real data are written as Xᵀ Z, but for complex would be
Xᴴ Z [1]

Hope this clarifies things!

-- Marten

[1] https://en.wikipedia.org/wiki/Linear_least_squares


[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-24 Thread Marten van Kerkwijk
> FWIW, +1 for matvec & vecmat to complement matmat (erm, matmul). Having a 
> binop where one argument is a matrix and the other is a
> stack/batch of vectors is indeed awkward otherwise, and a dedicated function 
> to clearly distinguish "two matrices" from "a matrix and a
> batch of vectors" sounds great from a user perspective.
>
> As to vecmat doing hermitian conjugation of the vector argument --- I'd be in 
> favor (because <\psi | \hat H | \psi>) but this is a weak
> preference.

Indeed, what wikipedia calls the "physics convention" [1] of both the
vector dot product ⟨x, y⟩ = x† y and vector matrix product x† A († =
transpose-conjugate).

-- Marten

[1] https://en.wikipedia.org/wiki/Sesquilinear_form


[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-24 Thread Marten van Kerkwijk
> Why do these belong in NumPy? What is the broad field of application of these 
> functions? And,
> does a more general concept underpin them?

Multiplication of a matrix with a vector is about as common as matrix
with matrix or vector with vector, and not currently easy to do for
stacks of vectors, so I think the case for matvec is similarly strong as
that for matmul and vecdot.

Arguably, vecmat is slightly less common, though it completes the quad.

-- Marten




[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-23 Thread Marten van Kerkwijk
> I can understand the desire to generalise the idea of matrix
> multiplication for when the arrays are not both 2-D but taking the
> complex conjugate makes absolutely no sense in the context of matrix
> multiplication.
> 
> You note above that "vecmat is defined as x†A" but my interpretation
> of that is that vecmat(x, A) == matmul(conjugate(transpose(x)), A). If
> you want to define vecmat like that then maybe that makes sense in
> some contexts but including the conjugate as an implicit part of
> matmul is something that I would find very confusing: such a function
> should not be called matmul.

Ah, that's indeed fair.  So, I'll remove the idea to change what the
special-casing of 1d arrays does in matmul.  Options should just be to
keep things as they are, or to remove that ability altogether.  I'd
personally tend to the latter.

-- Marten


[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-23 Thread Marten van Kerkwijk
>> I also note that for complex numbers, `vecmat` is defined as `x†A`,
>> i.e., the complex conjugate of the vector is taken. This seems to be the
>> standard and is what we used for `vecdot` too (`x†x`). However, it is
>> *not* what `matmul` does for vector-matrix or indeed vector-vector
>> products (remember that those are possible only if the vector is
>> one-dimensional, i.e., not with a stack of vectors). I think this is a
>> bug in matmul, which I'm happy to fix. But I'm posting here in part to
>> get feedback on that.
> 
> Does matmul not mean "matrix multiplication"?
> 
> There is no conjugation in matrix multiplication.
> 
> Have I misunderstood what matmul is supposed to be? (If it does mean
> anything other than matrix multiplication then it is an extremely poor
> choice of name.)

Right now, there is this weird special-casing of 1-d arguments which
are treated as vectors that are promoted to matrices [1].  I agree that
it is illogical to have, which is why I had the postscript about
removing it from np.matmul, and instead letting `@` call vecdot, vecmat,
matvec, or matmul as appropriate.
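
The discrepancy is easy to see for 1-d complex arguments (this assumes a
NumPy that already has np.vecdot):

import numpy as np

x = np.array([1 + 1j, 2 - 2j])
np.matmul(x, x)    # -> -6j     : plain x·x, no conjugation
np.vecdot(x, x)    # -> (10+0j) : x†x, the first vector is conjugated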

-- Marten

[1] From the docstring of np.matmul
"""
...
The behavior depends on the arguments in the following way.

- If both arguments are 2-D they are multiplied like conventional
  matrices.
- If either argument is N-D, N > 2, it is treated as a stack of
  matrices residing in the last two indexes and broadcast accordingly.
- If the first argument is 1-D, it is promoted to a matrix by
  prepending a 1 to its dimensions. After matrix multiplication
  the prepended 1 is removed.
- If the second argument is 1-D, it is promoted to a matrix by
  appending a 1 to its dimensions. After matrix multiplication
  the appended 1 is removed.
"""

Note that the description is consistent with the current behaviour, but
I don't think it is correct (certainly for two 1-d arguments) and my
suggestion corresponds to changing the third item to,

- If the first argument is 1-D, it is promoted to a matrix by
  prepending a 1 to its dimensions and taking the conjugate if complex.
  After matrix multiplication the prepended 1 is removed.


[Numpy-discussion] Re: New matvec and vecmat functions

2024-01-23 Thread Marten van Kerkwijk
> For dot product I can convince myself this is a math definition thing and 
> accept the
> conjugation. But for "vecmat" why the complex conjugate of the vector? Are we 
> assuming that
> 1D things are always columns. I am also a bit lost on the difference of dot, 
> vdot and vecdot.
>
> Also if __matmul__ and np.matmul give different results, I think you will 
> enjoy many fun tickets.
> Personally I would agree with them no matter what the reasoning was at the 
> time of
> divergence.

For vecmat, the logic is indeed that the 1D vectors are always columns,
so one needs to transpose and for complex add a conjugate.

On the differences between dot, vdot, and vecdot: dot is basically
weird, enough for matmul to be added. vdot flattens the arrays before
calculating x†x, vecdot uses the last axis (or an explicitly given one).
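
A quick illustration of those differences (np.vecdot assumed available):

import numpy as np

x = np.array([[1 + 1j, 2j],
              [3 + 0j, 4 + 0j]])

np.vdot(x, x)     # flattens first: conj(x).ravel() @ x.ravel() -> (31+0j)
np.vecdot(x, x)   # contracts the last axis -> array([ 6.+0.j, 25.+0.j])
np.dot(x, x)      # 2x2 matrix product, with no conjugation anywhere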

On matmul, indeed we should ensure __matmul__ and np.matmul give
identical results; what I meant was that np.matmul doesn't necessarily
have to do the same special-casing for vectors, it could just deal with
matrices only, and let __matmul__ call vecdot, vecmat, matvec, or matmul
as appropriate.  Anyway, somewhat orthogonal to the discussion here.

-- Marten


[Numpy-discussion] New matvec and vecmat functions

2024-01-23 Thread Marten van Kerkwijk
Hi All,

I have a PR [1] that adds `np.matvec` and `np.vecmat` gufuncs for
matrix-vector and vector-matrix calculations, to add to plain
matrix-matrix multiplication with `np.matmul` and the inner vector
product with `np.vecdot`.  They call BLAS where possible for speed.
I'd like to hear whether these are good additions.

I also note that for complex numbers, `vecmat` is defined as `x†A`,
i.e., the complex conjugate of the vector is taken. This seems to be the
standard and is what we used for `vecdot` too (`x†x`). However, it is
*not* what `matmul` does for vector-matrix or indeed vector-vector
products (remember that those are possible only if the vector is
one-dimensional, i.e., not with a stack of vectors). I think this is a
bug in matmul, which I'm happy to fix. But I'm posting here in part to
get feedback on that.

Thanks!

Marten

[1] https://github.com/numpy/numpy/pull/25675

p.s. Separately, with these functions available, in principle these
could be used in `__matmul__` (and thus for `@`) and the specializations
in `np.matmul` removed. But that can be a separate PR (if it is wanted
at all).


[Numpy-discussion] Re: Change definition of complex sign (and use it in copysign)

2024-01-04 Thread Marten van Kerkwijk
Hi All,

Thanks for the comments on complex sign - it seems there is good support
for it.

On copysign, currently it is not supported for complex values at all.  I
think given the responses so far, it looks like we should just keep it
like that; although my extension was fairly logical, I cannot say I can
think of a clear use case!  Anyway, I'm fine with
removing it from the PR.

All the best,

Marten


[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-23 Thread Marten van Kerkwijk
Hi Sebastian,

> That looks nice, I don't have a clear feeling on the order of items, if
> we think of it in terms of `(start, stop)` there was also the idea
> voiced to simply add another name in which case you would allow start
> and stop to be separate arrays.

Yes, one could add another method.  Or perhaps even add a new argument
to `.reduce` instead (say `slices`).  But this seemed the simplest
route...

> Of course if go with your `slice(start, stop)` idea that also works,
> although passing as separate parameters seems nice too.
> 
> Adding another name (if we can think of one at least) seems pretty good
> to me, since I suspect we would add docs to suggest not using
> `reduceat`.

If we'd want to, even with the present PR it would be possible to (very
slowly) deprecate the use of a list of single integers.  But I'm trying
to go with just making the existing method more useful.

> One small thing about the PR: I would like to distinguish `default` and
> `initial`.  I.e. the default value is used only for empty reductions,
> while the initial value should be always used (unless you would pass
> both, which we don't for normal reductions though).
> I suppose the machinery isn't quite set up to do both side-by-side.

I just followed what is done for reduce, where a default could also have
made sense given that `where` can exclude all inputs along a given row.
I'm not convinced it would be necessary to have both, though it would
not be hard to add.

All the best,

Marten


[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-22 Thread Marten van Kerkwijk
Hi Martin,

I agree it is a long-standing issue, and I was reminded of it by your
comment.  I have a draft PR at https://github.com/numpy/numpy/pull/25476
that does not change the old behaviour, but allows you to pass in a
start-stop array which behaves more sensibly (exact API TBD).
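
For reference, the current behaviour that makes reduceat hard to use: each
entry reduces a[indices[i]:indices[i+1]], except that where indices[i] >=
indices[i+1] the single element a[indices[i]] is returned rather than the
identity of an empty reduction:

import numpy as np

a = np.arange(8)
np.add.reduceat(a, [0, 4, 4, 6])
# -> array([ 6,  4,  9, 13]); the second entry is a[4], not an empty sum (0)

With (start, stop) pairs, the second entry would presumably become the
empty-sum identity instead.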

Please have a look!

Marten

Martin Ling  writes:

> Hi folks,
> 
> I don't follow numpy development in much detail these days but I see
> that there is a 2.0 release planned soon.
> 
> Would this be an opportunity to change the behaviour of 'reduceat'?
> 
> This issue has been open in some form since 2006!
> https://github.com/numpy/numpy/issues/834
> 
> The current behaviour was originally inherited from Numeric, and makes
> reduceat often unusable in practice, even where it should be the
> perfect, concise, efficient solution. But it has been impossible to
> change it without breaking compatibility with existing code.
> 
> As a result, horrible hacks are needed instead, e.g. my answer here:
> https://stackoverflow.com/questions/57694003
> 
> Is this something that could finally be fixed in 2.0?
> 
> 
> Martin


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-26 Thread Marten van Kerkwijk
Hi Ralf,

I realize you feel strongly that this whole thread is rehashing history,
but I think it is worth pointing out that many seem to consider that the
criterion for allowing backward incompatible changes, i.e., that "existing
code is buggy or is consistently confusing many users", is actually
fulfilled here.

Indeed, this appears true to such an extent that even those among the
steering council do not agree: while the topic of this thread was about
introducing *new* properties (because in the relevant issue I had suggested
to Steward it was not possible to change .T), it was Eric who brought up
the question whether we shouldn't just change `.T` after all. And in the
relevant issue, Sebastian noted that "I am not quite convinced that we
cannot change .T (at least in the sense of deprecation) myself", with Chuck
chiming in that "I don't recall being in opposition, and I also think the
current transpose is not what we want."

That makes three of your fellow steering council members who are not sure,
despite all the previous discussions (of which Chuck surely has seen most -
sorry, Chuck!).

It seems to me the only sure way in which we can avoid future discussions
is to actually address the underlying problem. E.g., is the cost of
deprecating & changing .T truly that much more than even having this
discussion?

All the best,

Marten


On Wed, Jun 26, 2019 at 4:18 PM Ralf Gommers  wrote:

>
>
> On Wed, Jun 26, 2019 at 10:04 PM Kirill Balunov 
> wrote:
>
>> Only concerns #4 from Ilhan's list.
>>
>> Wed, 26 Jun 2019 at 00:01, Ralf Gommers :
>>
>>>
>>> []
>>>
>>> Perhaps not full consensus between the many people with different
>>> opinions and interests. But for the first one, arr.T change: it's clear
>>> that this won't happen.
>>>
>>
>> To begin with, I must admit that I am not familiar with the accepted
>> policy of introducing changes to NumPy. But I find it quite
>> nonconstructive just to say - it will not happen. What then is the point
>> in the discussion?
>>
>
> There has been a *very* long discussion already, and several others on the
> same topic before. There are also long-standing ways of dealing with
> backwards compatibility - e.g. what Matthew said is not new, it's an agreed
> upon way of working.
> http://www.numpy.org/neps/nep-0023-backwards-compatibility.html lists
> some principles. That NEP is not yet accepted (it needs rework), but it
> gives a good idea of what does and does not go.
>
>
>>
>>
>>> Between Juan's examples of valid use, and what Stephan and Matthew said,
>>> there's not much more to add. We're not going to change correct code for
>>> minor benefits.
>>>
>>
>> I fully agree that any feature can find its use, valid or not is another
>> question. Juan did not present these examples, but I will allow myself
>> to assume that it is more correct to describe what is being done there as a
>> permutation, and not a transpose. In addition, in the very next
>> sentence, Juan adds that "These could be easily changed to .transpose()
>> (honestly they probably should!)"
>>
>> We're not going to change correct code for minor benefits.
>>>
>>
>> It's fair, I personally have no preferences in both cases, the most
>> important thing for me is that in the 2d case it works correctly. To be
>> honest, until today, I thought that `.T` will raise for` ndim > 2`. At
>> least that's what my experience told me. For example in
>>
>> Matlab - Error using  .' Transpose on ND array is not defined. Use
>> PERMUTE instead.
>>
>> Julia - transpose not defined for Array(Float64, 3). Consider using
>> permutedims for higher-dimensional arrays.
>>
>> Sympy - raise ValueError("array rank not 2")
>>
>> Here, I agree with the authors that, to begin with, `transpose` is not
>> the best name, since in general it doesn’t fit as an any mathematical
>> definition (of course it will depend on what we take as an element) or a
>> definition from linear algebra. Thus the name `transpose` only leads to
>> confusion.
>>
>> For a note about another suggestion - `.T` to mean a transpose of the
>> last two dimensions, in Mathematica authors for some reason did the
>> opposite (personally, I could not understand why they made such a choice
>> :) ):
>>
>> Transpose[list]
>> transposes the first two levels in list.
>>
>> I feel strongly that we should have the following policy:
>>>
>>> * Under no circumstances should we make changes that mean that
>>> correct
>>> old code will give different results with new Numpy.
>>>
>>
>> I find this overly strict rules that do not allow to evolve. I
>> completely agree that a silent change in behavior is a disaster, that
>> changing behavior (if it is not an error) in the same minor version (1.X.Y)
>> is not acceptable, but I see no reason to extend this rule for a major
>> version bump (2.A.B.),  especially if it allows something to improve.
>>
>
> I'm sorry, you'll have to live with this rule. We've had lots of
> discussion about this rule in 

Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-26 Thread Marten van Kerkwijk
> The main motivation for the @ PEP was actually to be able to get rid of
> objects like np.matrix and scipy.sparse matrices that redefine the meaning
> of the * operator. Quote: "This PEP proposes the minimum effective change
> to Python syntax that will allow us to drain this swamp [meaning np.matrix
> & co]."
>
> Notably, the @ PEP was written by Nathaniel, who was opposed to a copying
> .H.
>

I should note that my comment invoking the history of @ was about the
regular transpose, .mT.  The executive summary of the PEP includes the
following relevant sentence:
"""
Currently, most numerical Python code uses * for elementwise
multiplication, and function/method syntax for matrix multiplication;
however, this leads to ugly and unreadable code in common circumstances.
"""

Exactly the same holds for matrix transpose, and indeed for many matrix
expressions the gain in readability is lost without a clean option to do
the transpose.
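
As a sketch of what that means in practice, e.g., for batched normal
equations (the .mT spelling being the proposal under discussion here):

import numpy as np

A = np.random.default_rng(2).normal(size=(5, 10, 3))
b = np.random.default_rng(3).normal(size=(5, 10))

# with only swapaxes/transpose:
At = np.swapaxes(A, -1, -2)
x = np.linalg.solve(At @ A, At @ b[..., np.newaxis])[..., 0]
# with a matrix-transpose property this would read:
# x = np.linalg.solve(A.mT @ A, A.mT @ b[..., np.newaxis])[..., 0]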

-- Marten


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-25 Thread Marten van Kerkwijk
Hi Ralf,

On Tue, Jun 25, 2019 at 6:31 PM Ralf Gommers  wrote:

>
>
> On Tue, Jun 25, 2019 at 11:02 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>>
>> For the names, my suggestion of lower-casing the M in the initial one,
>> i.e., `.mT` and `.mH`, so far seemed most supported (and I think we should
>> discuss *assuming* those would eventually involve not copying data; let's
>> not worry about implementation details).
>>
>
> For the record, this is not an implementation detail. It was the consensus
> before that `H` is a bad idea unless it returns a view just like `T`:
> https://github.com/numpy/numpy/issues/8882
>

Is there more than an issue in which Nathaniel, rejecting it, mentions some
previous consensus? I was part of the discussion of the complex conjugate
dtype, but do not recall any consensus beyond a "wish to keep properties
simple". Certainly the "property does not do any calculation" rule seems
arbitrary; the only strict rule I would apply myself is that the
computation should not be able to fail (computationally; out-of-memory does
not count, that's like large integer overflow). So, I'd definitely agree
with you if we were discussing a property `.I` for matrix inverse (and
indeed have said so in related issues). But for .H, not so much. Certainly
whoever wrote np.matrix didn't seem to feel bound by it.

Note that for *matrix* transpose (as opposed to general axis reordering
with .transpose()), I see far less use for what is returned being a writable
view. Indeed, for conjugate transpose, I would rather never allow writing
back even if we had the conjugate dtype, since one would surely get it
wrong (likely, `.conj()` would also return a read-only view, at least by
default; perhaps one should even go as far as only allowing
`a.view(conjugate-dtype)` as the way to get a writable view).


> So, specific items to confirm:
>
> 1) Is this a worthy addition? (certainly, their existence would reduce
> confusion about `.T`... so far, my sense is tentative yes)
>
> 2) Are `.mT` and `.mH` indeed the consensus? [1]
>

> I think `H` would be good to revisit *if* it can be made to return a
> view. I think a tweak on `T` for >2-D input does not meet the bar for
> inclusion.

Well, I guess it is obvious I disagree: I think this more than meets the
bar for inclusion. To me, this certainly is a much bigger deal than
something like oindex or vindex (which I do like).

Indeed, it would seem to me that if a visually more convenient way to do
(stacks of) matrix multiplication for numpy is good enough to warrant
changing the python syntax, then surely having a visually more convenient
standard way to do matrix transpose should not be considered off-limits for
ndarray; how often do you see a series of matrix manipulations that does not
involve both multiplication and transpose?

It certainly doesn't seem to me much of an argument that, because someone
previously decided to use .T as a shortcut for the computer-science idea of
transpose, we should not allow the mathematical/physical-scientist one - one
I would argue is guaranteed to be used much more.

The latter of course just repeats what many others have written above, but
given that you call it a "tweak", perhaps it is worth backing up. For
astropy, a quick grep gives:

- 28 uses of the matrix_transpose function I wrote because numpy doesn't
have even a simple function for that and the people who wrote the original
code used the Matrix class which had the proper .T (but doesn't extend to
multiple dimensions; we might still be using it otherwise).

- 11 uses of .T,  all of which seem to be on 2-D arrays and are certainly
used as if they were matrix transpose (most are for fitting). Certainly,
all of these are bugs lying in wait if the arrays ever get to be >2-D.

All the best,

Marten


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-25 Thread Marten van Kerkwijk
Hi Juan,

On Tue, Jun 25, 2019 at 9:35 AM Juan Nunez-Iglesias 
wrote:

> On Mon, 24 Jun 2019, at 11:25 PM, Marten van Kerkwijk wrote:
>
> Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape
> of `(n, 1)` the right behaviour? I.e., it should still change from what it
> is now - which is to leave the shape at `(n,)`.
>
>
> Just to chime in as a user: v.T should continue to be a silent no-op for
> 1D arrays. NumPy makes it arbitrary whether a 1D array is viewed as a row
> or column vector, but we often want to write .T to match the notation in a
> paper we're implementing.
>
> More deeply, I think .T should never change the number of dimensions of an
> array.
>

OK, that makes three of you, all agreeing on the same core argument, but
with you now adding another strong one, of not changing the number of
dimensions. Let's consider this aspect settled.


>
> I'm ambivalent about the whole discussion in this thread, but generally I
> think NumPy should err on the side of caution when deprecating behaviour.
> It's unclear to me whether the benefits of making .T transpose only the
> last two dimensions outweigh the costs of deprecation. Despite some
> people's assertion that those using .T to transpose >2D arrays are probably
> making a mistake, we have two perfectly correct uses in scikit-image. These
> could be easily changed to .transpose() (honestly they probably should!),
> but they illustrate that there is some amount of correct code out there
> that would be forced to keep up with an update here.
>

Fair enough, there are people who actually read the manual and use things
correctly! Though, being generally one of those, I still was very
disappointed to find `.T` didn't do the last two axes.

Any preference for alternative spellings?

All the best,

Marten


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-25 Thread Marten van Kerkwijk
Hi All,

The examples with different notation brought back the memory of another
solution: define `m.ᵀ` and `m.ᴴ`. This is possible, since python3 allows
unicode identifiers; it is nicely readable, but admittedly a bit annoying
to enter (in emacs, set-input-method to TeX and then ^T, ^H).

More seriously, still hoping to move to just being able to use .T and .H as
matrix (conjugate) transpose in newer numpy versions, is it really not
possible within a property to know whether the context where the operation
was defined has some specific "matrix_transpose" variable set? After all,
an error or warning generates a stack backtrace and from the ndarray C code
one would have to look only one stack level up (inside a warning, one can
even ask for the warning to be given from inside a different level, if I
recall correctly).

If that is truly impossible, then I think we need different names for both
.T and .H.  Some suggestions:

1. a.MT, a.MH (original suggestion at top of the thread)
2. a.mT, a.mH (m still for matrix, but not standing out as much, maybe
making it easier to guess what it means)
3. a.RT and .CT (regular and conjugate transpose - the C also reminds of
complex)

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Marten van Kerkwijk
On Mon, Jun 24, 2019 at 7:21 PM Stephan Hoyer  wrote:

> On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane 
> wrote:
>
>> I'm not at all set on that behavior and we can do something else. For
>> now, I chose this way since it seemed to best match the "IGNORE" mask
>> behavior.
>>
>> The behavior you described further above where the output row/col would
>> be masked corresponds better to "NA" (propagating) mask behavior, which
>> I am leaving for later implementation.
>
>
> This does seem like a clean way to *implement* things, but from a user
> perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
> masks.
>
> I tend to think of "IGNORE" vs "NA" as descriptions of particular
> operations rather than the data itself. There are a spectrum of ways to
> handle missing data, and the right way to propagating missing values is
> often highly context dependent. The right way to set this is in functions
> where operations are defined, not on classes that may be defined far away
> from where the computation happen. For example, pandas has a "min_count"
> parameter in functions for intermediate use-cases between "IGNORE" and "NA"
> semantics, e.g., "take an average, unless the number of data points is
> fewer than min_count."
>

Anything that specific is probably indeed outside the purview
of a MaskedArray class.

But your general point is well taken: we really need to ask clearly what
the mask means not in terms of operations but conceptually.

Personally, I guess like Benjamin I have mostly thought of it as "data here
is bad" (because corrupted, etc.) or "data here is irrelevant" (because of
sea instead of land in a map). And I would like to proceed nevertheless
with calculating things on the remainder. For an expectation value (or,
less obviously, a minimum or maximum), this is mostly OK: just ignore the
masked elements. But even for something as simple as a sum, what is correct
is not obvious: if I ignore the count, I'm effectively assuming the
expectation is symmetric around zero (this is why `vector.dot(vector)`
fails); a better estimate would be `np.add.reduce(data, where=~mask) *
N(total) / N(unmasked)`.
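
As a tiny illustration of that rescaled estimate:

import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=5., size=100)
mask = rng.random(100) < 0.3                 # ~30% of the points masked

naive = np.add.reduce(data, where=~mask)     # simply ignores masked points
rescaled = naive * data.size / np.count_nonzero(~mask)
# rescaled, unlike naive, is a reasonable estimate of the full data.sum()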

Of course, the logical conclusion would be that this is not possible to do
without guidance from the user, or, thinking more, that really a masked
array is not at all what I want for this problem; really I am just using
(1-mask) as a weight, and the sum of the weights matters, so I should have
a WeightArray class where that is returned along with the sum of the data
(or, a bit less extreme, a `CountArray` class, or, more extreme, a value
and its uncertainty - heck, sounds a lot like my Variable class from 4
years ago, https://github.com/astropy/astropy/pull/3715, which even takes
care of covariance [following the Uncertainty package]).

OK, it seems I've definitely worked myself in a corner tonight where I'm
not sure any more what a masked array is good for in the first place...
I'll stop for the day!

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Marten van Kerkwijk
Hi Allan,

> The alternative solution in my model would be to replace `np.dot` with a
> > masked-specific implementation of what `np.dot` is supposed to stand for
> > (in your simple example, `np.add.reduce(np.multiply(m, m))` - more
> > generally, add relevant `outer` and `axes`). This would be similar to
> > what I think all implementations do for `.mean()` - we cannot calculate
> > that from the data using any fill value or skipping, so rather use a
> > more easily cared-for `.sum()` and divide by a suitable number of
> > elements. But in both examples the disadvantage is that we took away the
> > option to use the underlying class's `.dot()` or `.mean()`
> implementations.
>
> Just to note, my current implementation uses the IGNORE style of mask,
> so does not propagate the mask in np.dot:
>
> >>> a = MaskedArray([[1,1,1], [1,X,1], [1,1,1]])
> >>> np.dot(a, a)
>
> MaskedArray([[3, 2, 3],
>  [2, 2, 2],
>  [3, 2, 3]])
>
> I'm not at all set on that behavior and we can do something else. For
> now, I chose this way since it seemed to best match the "IGNORE" mask
> behavior.
>

It is a nice example, I think. In terms of action on the data, one would
get this result as well in my pseudo-representation of
`np.add.reduce(np.multiply(m, m))` - as long as the multiply is taken as
outer product along the relevant axes (which does not ignore the mask,
i.e., if either element is masked, the product is too), and subsequently a
sum which works like other reductions and skips masked elements.

From the FFT array multiplication analogy, though, it is not clear that,
effectively, replacing masked elements by 0 is reasonable.

Equivalently, thinking of `np.dot` in its 1-D form as presenting the length
of the projection of one vector along another, it is not clear what a
single masked element is supposed to mean. In a way, masking just one
element of a vector or of a matrix makes vector or matrix operations
meaningless.

I thought fitting data with a mask might give a counterexample, but there
one usually calculates at some point r = y - A x, so there is no masking of
the matrix; the subtraction y - A x passes on the mask, and summing r while
ignoring masked elements does just the right thing.

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Marten van Kerkwijk
Hi Eric,

The easiest definitely is for the mask to just propagate, which means that
even if just one point is masked, all points in the fft will be masked.

On the direct point I made, I think it is correct that since one can think
of the Fourier transform of a sine/cosine fit, then there is a solution
even in the presence of some masked data, and this solution is distinct
from that for a specific choice of fill value. But of course it is also
true that the solution will be at least partially degenerate in its result
and possibly indeterminate (e.g., for the extreme example of a real
transform for which all but the first point are masked, all cosine term
amplitudes are equal to the value of the first term, and are completely
degenerate with each other, and all sine term amplitudes are indeterminate;
one has only one piece of information, after all). Yet the inverse of any
of those choices reproduces the input. That said, clearly there is a choice
to be made whether this solution is at all interesting, which means that
you are right that it needs an explicit user decision.

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Marten van Kerkwijk
Hi Allan,

Thanks for bringing up the noclobber explicitly (and Stephan for asking for
clarification; I was similarly confused).

It does clarify the difference in mental picture. In mine, the operation
would indeed be guaranteed to be done on the underlying data, without copy
and without `.filled(...)`. I should clarify further that I use `where`
only to skip reading elements (i.e., in reductions), not writing elements
(as you mention, the unwritten element will often be nonsense - e.g., wrong
units - which to me is worse than infinity or something similar; I've not
worried at all about runtime warnings). Note that my main reason here is
not that I'm against filling with numbers for numerical arrays, but rather
wanting to make minimal assumptions about the underlying data itself. This
may well be a mistake (but I want to find out where it breaks).

Anyway, it would seem in many ways all the better that our models are quite
different. I definitely see the advantages of your choice to decide one can
do with masked data elements whatever is logical ahead of an operation!

Thanks also for bringing up a useful example with `np.dot(m, m)` - clearly,
I didn't yet get beyond overriding ufuncs!

In my mental model, where I'd apply `np.dot` on the data and the mask
separately, the result will be wrong, so the mask has to be set (which it
would be). For your specific example, that might not be the best solution,
but when using `np.dot(matrix_shaped, matrix_shaped)`, I think it does give
the correct masking: any masked element in a matrix better propagate to all
parts that it influences, even if there is a reduction of sorts happening.
So, perhaps a price to pay for a function that tries to do multiple things.

The alternative solution in my model would be to replace `np.dot` with a
masked-specific implementation of what `np.dot` is supposed to stand for
(in your simple example, `np.add.reduce(np.multiply(m, m))` - more
generally, add relevant `outer` and `axes`). This would be similar to what
I think all implementations do for `.mean()` - we cannot calculate that
from the data using any fill value or skipping, so rather use a more easily
cared-for `.sum()` and divide by a suitable number of elements. But in both
examples the disadvantage is that we took away the option to use the
underlying class's `.dot()` or `.mean()` implementations.

(Aside: considerations such as these underlie my longed-for exposure of
standard implementations of functions in terms of basic ufunc calls.)

Another example of a function for which I think my model is not
particularly insightful (and for which it is difficult to know what to do
generally) is `np.fft.fft`. Since an fft is equivalent to a sine/cosine
fit to the data points, the answer for masked data is in principle quite
well-defined. But much less easy to implement!

All the best,

Marten


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Marten van Kerkwijk
Hi Stephan,

Yes, the complex conjugate dtype would make things a lot faster, but I
don't quite see why we would wait for that before introducing the `.H`
property.

I do agree that `.H` is the correct name, giving the most immediate clarity
(i.e., people who know what conjugate transpose is will recognize it, while
likely having to look up `.CT`; people who do not know will have to look it
up regardless). But at the same time I agree that the docstring and other
documentation should start with "Conjugate transpose" - good to try to
avoid using names of people where you have to be in the "in crowd" to know
what it means.

The above said, if we were going with the initial suggestion of `.MT` for
matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate version.

But it seems there is little interest in that suggestion, although sadly a
clear path forward has not yet emerged either.

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Marten van Kerkwijk
Hi Eric,

On your other points:

I remain unconvinced that Mask classes should behave differently on
> different ufuncs. I don’t think np.minimum(ignore_na, b) is any different
> to np.add(ignore_na, b) - either both should produce b, or both should
> produce ignore_na. I would lean towards producing ignore_na, and
> propagation behavior differing between “ignore” and “invalid” only for
> reduce / accumulate operations, where the concept of skipping an
> application is well-defined.
>
I think I mostly agree - this is really about reductions. And this fact
that there are apparently only two choices weakens the case for pushing the
logic into the mask class itself.

But the one case that still tempts me to break with the strict rule for
ufunc.__call__ is `fmin, fmax` vs `minimum, maximum`... What do you think?


> Some possible follow-up questions that having two distinct masked types
> raise:
>
>- what if I want my data to support both invalid and skip fields at
>the same time? sum([invalid, skip, 1]) == invalid
>
Have a triple-valued mask? Would be easy to implement if all the logic is
in the mask...

(Indeed, for all I care it could implement weighting! That would actually
care about the actual operation, so would be a real example. Though of
course it also does need access to the actual data, so perhaps best not to
go there...)

>
>- is there a use case for more that these two types of mask?
>invalid_due_to_reason_A, invalid_due_to_reason_B would be interesting
>things to track through a calculation, possibly a dictionary of named 
> masks.
>
For astropy's NDData, there has been quite a bit of discussion of a
`Flags` object, which works exactly as you describe, an OR together of
different reasons for why data is invalid (HST uses this, though the
discussion was for the Large Synoptic Survey Telescope data pipeline). Those
flags are propagated like masks.

I think in most cases, these examples would not require more than allowing
the mask to be a duck type. Though perhaps for some reductions, it might
matter what the reduction of the data is actually doing (e.g.,
`np.minimum.reduce` might need different rules than `np.add.reduce`). And,
of course, one can argue that for such a case it might be best to subclass
MaskedArray itself, and do the logic there.

All the best,

Marten

p.s. For accumulations, I'm still not sure I find them well-defined. I
could see that `np.add.accumulate([0, 1, 1, --, 3])` could lead to `[0, 1, 2,
5]`, i.e., a shorter sequence, but this doesn't work on arrays where
different rows can have different numbers of masked elements. It then
perhaps suggests `[0, 1, 2, --, 5]` is OK, but the annoyance I have is that
there is nothing that tells me what the underlying data should be, i.e.,
this is truly different from having a `where` keyword in `np.add.reduce`.
But perhaps it is that I do not see much use for accumulation beyond
changing a histogram into its cumulative version - for which masked
elements really make no sense - one somehow has to interpolate over, not
just discount the masked one.


Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Marten van Kerkwijk
Hi Stephan,

Eric perhaps explained my concept better than I could!

I do agree that, as written, your example would be clearer, but Allan's
code and the current MaskedArray code do not bear much resemblance to
it, and mine even less, as they deal with operators as whole groups.

For mine, it may be useful to mention that this quite possibly crazy idea
came from considering how one would quite logically implement all shape
operations, which affect the data and mask the same way, so one soon writes
down something where (as in my ShapedLikeNDArray mixin;
https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L858)
all methods pass on to `return self._apply(method, *args, **kwargs)`, with
`_apply` looking like:
```
def _apply(self, method, *args, **kwargs):
    # For use with NDArrayShapeMethods.
    data = getattr(self._data, method)(*args, **kwargs)
    mask = getattr(self._mask, method)(*args, **kwargs)

    return self.__class__(data, mask=mask)
```
(Or the same is auto-generated for all those methods.)

Now considering this, for `__array_ufunc__`, one might similarly do
(__call__ only, ignoring outputs)
```
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
    out = kwargs.pop('out', None)
    mask_kwargs = {}   # the subset of kwargs that applies to the masks
    data = []
    masks = []
    for input_ in inputs:
        d, m = self._data_mask(input_)
        data.append(d)
        masks.append(m)
    result_mask = getattr(ufunc, method)(*masks, **mask_kwargs)
    if result_mask is NotImplemented:
        return NotImplemented

    result_data = getattr(ufunc, method)(*data, **kwargs)
    return self._masked_result(result_data, result_mask, out)
```
The beauty here is that the Masked class itself needs to know very little
about how to do masking, or how to implement it; it is almost like a
collection of arrays: the only thing that separates `__array_ufunc__` from
`_apply` above is that, since other objects are involved, the data and mask
have to be separated out for those. (And perhaps for people trying to
change it, it actually helps that the special-casing of single-, double-,
and triple-input ufuncs can all be done inside the mask class, and
adjustments can be made there without having to understand the machinery
for getting masks out of data, etc.?)

But what brought me to this was not __array_ufunc__ itself, but rather the
next step, that I really would like to have masked version of classes that
are array-like in shape but do not generally work with numpy ufuncs or
functions, and where I prefer not to force the use of numpy ufuncs. So,
inside the masked class I have to override the operators. Obviously, there
again I could do it with functions like you describe, but I can also just
split off data and masks as above, and just call the same operator on both.
Then, the code could basically use that of `__array_ufunc__` above when
called as `self.__array_ufunc__(operator, '__add__', self, other)`.

I think (but am not sure, not actually having tried it yet), that this
would also make it easier to mostly auto-generate masked classes: those
supported by both the mask and the data are relatively easy; those separate
for the data need some hints from the data class (e.g., `.unit` on a
`Quantity` is independent of the mask).

Anyway, just to be clear, I am *not* sure that this is the right way
forward. I think the more important really was your suggestion to take mask
duck types into account a priori (for bit-packed ones, etc.). But it seems
worth thinking it through now, so we don't repeat the things that ended up
making `MaskedArray` quite annoying.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-23 Thread Marten van Kerkwijk
I had not looked at any implementation (only remembered the nice idea of
"importing from the future"), and looking at the links Eric shared, it
seems that the only way this would work is, effectively, pre-compilation
doing a `.replace('.T', '._T_from_the_future')`, where you'd be
hoping that there never is any other meaning for a `.T` attribute, for any
class, since it is impossible to be sure a given variable is an ndarray.
(Actually, a lot less implausible than for the case of numpy indexing
discussed in the link...)

Anyway, what I had in mind was something along the lines of there being a
check inside the `.T` code on whether a particular future item was present
in the environment. But thinking more, I can see that it is not trivial to
get to know something about the environment in which the code that called
you was written.

So, it seems there is no (simple) way to tell numpy that inside a given
module you want `.T` to have the new behaviour, but still to warn if
outside the module it is used in the old way (when risky)?

-- Marten

p.s. I'm somewhat loath to add new properties to ndarray, but `.T` and `.H`
have such obvious and clear meaning to anyone dealing with (complex)
matrices that I think it is worth it. See
https://mail.python.org/pipermail/numpy-discussion/2019-June/079584.html
for a list of options of attributes that we might deprecate "in exchange"...

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Marten van Kerkwijk
Hi Stephan,

In slightly changed order:

> Let me try to make the API issue more concrete. Suppose we have a
> MaskedArray with values [1, 2, NA]. How do I get:
> 1. The sum ignoring masked values, i.e., 3.
> 2. The sum that is tainted by masked values, i.e., NA.
>
> Here's how this works with existing array libraries:
> - With base NumPy using NaN as a sentinel value for NA, you can get (1)
> with np.sum and (2) with np.nansum.
> - With pandas and xarray, the default behavior is (1) and to get (2) you
> need to write array.sum(skipna=False).
> - With NumPy's current MaskedArray, it appears that you can only get (1).
> Maybe there isn't as strong a need for (2) as I thought?
>

I think this is all correct.

>
> Your proposal would be something like np.sum(array,
> where=np.ones_like(array))? This seems rather verbose for a common
> operation. Perhaps np.sum(array, where=True) would work, making use of
> broadcasting? (I haven't actually checked whether this is well-defined yet.)
>
I think we'd need to consider separately the operation on the mask and on
the data. In my proposal, the data would always do `np.sum(array,
where=~mask)`, while how the mask would propagate might depend on the mask
itself, i.e., we'd have different mask types for `skipna=True` (default)
and `False` ("contagious") reductions, which differed in doing
`logical_and.reduce` or `logical_or.reduce` on the mask.
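
A minimal sketch of what such per-mask-type behaviour could look like (names
are made up; the `where` argument requires numpy >= 1.17):
```
import numpy as np

class SkipMask(np.ndarray):
    # Illustrative mask type: the reduction is masked only if *all*
    # contributing elements were masked.
    _reduce_op = np.logical_and

class ContagiousMask(np.ndarray):
    # Illustrative mask type: any masked input taints the reduction.
    _reduce_op = np.logical_or

def masked_sum(data, mask, axis=None):
    # The data reduction always skips masked elements via `where`;
    # how the mask propagates is decided by the mask class itself.
    result = np.add.reduce(data, axis=axis, where=~np.asarray(mask))
    result_mask = type(mask)._reduce_op.reduce(np.asarray(mask), axis=axis)
    return result, result_mask

# mask = np.array([False, True, False]).view(SkipMask)
# masked_sum(np.array([1., 2., 4.]), mask)  -> (5.0, False)
```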

>> I have been playing with using a new `Mask(np.ndarray)` class for the mask,
>> which does the actual mask propagation (i.e., all single-operand ufuncs
>> just copy the mask, binary operations do `logical_or` and reductions do
>> `logical_and.reduce`). This way the `Masked` class itself can generally
>> apply a given operation on the data and the masks separately and then
>> combine the two results (reductions are the exception in that `where` has
>> to be set). Your particular example here could be solved with a different
>> `Mask` class, for which reductions do `logical_or.reduce`.
>>
>
> I think it would be much better to use duck-typing for the mask as well,
> if possible, rather than a NumPy array subclass. This would facilitate
> using alternative mask implementations, e.g., distributed masks, sparse
> masks, bit-array masks, etc.
>

Implicitly in the above, I agree with having the mask not necessarily be a
plain ndarray, but something that can determine part of the action. Makes
sense to generalize that to duck arrays for the reasons you give. Indeed,
if we let the mask do the mask propagation as well, it might help make the
implementation substantially easier (e.g., `logical_and.reduce` and
`logical_or.reduce` can be super-fast on a bitmask!).


> Are there use-cases for propagating masks separately from data? If not, it
> might make sense to only define mask operations along with data, which
> could be much simpler.
>

I had only thought about separating out the concern of mask propagation
from the "MaskedArray" class to the mask proper, but it might indeed make
things easier if the mask also did any required preparation for passing
things on to the data (such as adjusting the "where" argument in a
reduction). I also like that this way the mask can determine even before
the data what functionality is available (i.e., it could be the place from
which to return `NotImplemented` for a ufunc.at call with a masked index
argument).

It may be good to collect a few more test cases... E.g., I'd like to mask
some of the astropy classes that are only very partial duck arrays, in that
they cover only the shape aspect, and which do have some operators and for
which it would be nice not to feel forced to use __array_ufunc__.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Marten van Kerkwijk
Hi Tom,


>> I think a sensible alternative mental model for the MaskedArray class is
>> that all it does is forward any operations to the data it holds and
>> separately propagate a mask,
>>
>
> I'm generally on-board with that mental picture, and agree that the
> use-case described by Ben (different layers of satellite imagery) is
> important.  Same thing happens in astronomy data, e.g. you have a CCD image
> of the sky and there are cosmic rays that contaminate the image.  Those are
> not garbage data, just pixels that one wants to ignore in some, but not
> all, contexts.
>
> However, it's worth noting that one cannot blindly forward any operations
> to the data it holds since the operation may be illegal on that data.  The
> simplest example is dividing `a / b` where  `b` has data values of 0 but
> they are masked.  That operation should succeed with no exception, and here
> the resultant value under the mask is genuinely garbage.
>

Even in the present implementation, the operation is just forwarded, with
numpy errstate set to ignore all errors. And then after the fact some
half-hearted remediation is done.


> The current MaskedArray seems a bit inconsistent in dealing with invalid
> calculations.  Dividing by 0 (if masked) is no problem and returns the
> numerator.  Taking the log of a masked 0 gives the usual divide by zero
> RuntimeWarning and puts a 1.0 under the mask of the output.
>
> Perhaps the expression should not even be evaluated on elements where the
> output mask is True, and all the masked output data values should be set to
> a predictable value (e.g. zero for numerical, zero-length string for
> string, or maybe a default fill value).  That at least provides consistent
> and predictable behavior that is simple to explain.  Otherwise the story is
> that the data under the mask *might* be OK, unless for a particular element
> the computation was invalid in which case it is filled with some arbitrary
> value.  I think that is actually an error-prone behavior that should be
> avoided.
>

I think I agree with Allan here, that after a computation, one generally
simply cannot safely assume anything for masked elements.

But it is reasonable for subclasses to define what they want to do
"post-operation"; e.g., for numerical arrays, it might make generally make
sense to do
```
notok = ~np.isfinite(result)
mask |= notok
```
and one could then also do
```
result[notok] = fill_value
```

But I think one might want to leave that to the user.
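
To make that concrete, a possible user-level helper combining the two
snippets above (names made up):
```
import numpy as np

def fix_invalid(result, mask, fill_value=0.0):
    # Mask anything non-finite and, optionally, overwrite it with a
    # predictable fill value.
    notok = ~np.isfinite(result)
    mask = mask | notok
    result = result.copy()
    result[notok] = fill_value
    return result, mask

# fix_invalid(np.array([1.0, np.inf]), np.array([False, False]))
# -> (array([1., 0.]), array([False,  True]))
```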

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Marten van Kerkwijk
>> I think a sensible alternative mental model for the MaskedArray class is
>> that all it does is forward any operations to the data it holds and
>> separately propagate a mask, ORing elements together for binary operations,
>> etc., and explicitly skipping masked elements in reductions (ideally using
>> `where` to be as agnostic as possible about the underlying data, for which,
>> e.g., setting masked values to `0` for `np.add.reduce` may or may not be
>> the right thing to do - what if they are string?).
>>
>
> +1, this sounds like the right model to me.
>

One small worry is about name clashes - ideally one wants the masked array
to be somewhat of a drop-in for whatever class it is masking (independently
of whether it is an actual subclass of it). In this respect, `.data` is a
pretty terrible name (and, yes, the cause of lots of problems for astropy's
MaskedColumn - not my fault, that one!). In my own trials, thinking that
names that include "mask" are fair game, I've been considering a function
`.unmask(fill_value=None)` which would replace both `.filled(fill_value)`
and `.data` by having the default be not to fill anything (I don't see why
a masked array should carry a fill value along; one might use specific
strings such as 'minmax' for auto-generated cases). If wanted, one could
then add `unmasked = property(unmask)`.
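
A minimal sketch of the proposed `unmask` behaviour (hypothetical; written as
a free function on data and mask for clarity):
```
import numpy as np

def unmask(data, mask, fill_value=None):
    # Default: just hand back the underlying data; only copy and fill
    # when a fill value is explicitly asked for.
    if fill_value is None:
        return data
    out = np.array(data, copy=True)
    out[np.asarray(mask, dtype=bool)] = fill_value
    return out

# unmask(np.array([1., 2., 3.]), [False, True, False])      -> array([1., 2., 3.])
# unmask(np.array([1., 2., 3.]), [False, True, False], 0.)  -> array([1., 0., 3.])
```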

Aside: my sense is to, at first at least, feel as unbound as possible from
the current MaskedArray - one could then use whatever it is to try to
create something that comes close to reproducing it, but only for ease of
transition.


> That said, I would still not guarantee values under the mask as part of
> NumPy's API. The result of computations under the mask should be considered
> an undefined implementation detail, sort of like integer overflow or dict
> iteration order pre-Python 3.7. The values may even be entirely arbitrary,
> e.g., in cases where the result is preallocated with empty().
>

I think that is reasonable. The use cases Ben and I described both are ones
where the array is being used as input for a set of computations which
differ only in their mask. (Admittedly, in both our cases one could just
reinitialize a masked array with the new mask; but I think we share the
mental model of that if I don't operate on the masked array, the data
doesn't change, so I should just be able to change the mask.)


> I'm less confident about the right way to handle missing elements in
> reductions. For example:
> - Should median() also skip missing elements, even though there is no
> identity element?
>

I think so. If for mean(), std(), etc., the number of unmasked elements
comes into play, I don't see why it wouldn't for median().

> - If reductions/aggregations default to skipping missing elements, how is
> it possible to express "NA propagating" versions, which are also useful,
> if slightly less common?
>

I have been playing with using a new `Mask(np.ndarray)` class for the mask,
which does the actual mask propagation (i.e., all single-operand ufuncs
just copy the mask, binary operations do `logical_or` and reductions do
`logical_and.reduce`). This way the `Masked` class itself can generally
apply a given operation on the data and the masks separately and then
combine the two results (reductions are the exception in that `where` has
to be set). Your particular example here could be solved with a different
`Mask` class, for which reductions do `logical_or.reduce`.

A larger issue is the accumulations. Personally, I think those are
basically meaningless for masked arrays, as to me logically the result on
the position of any masked item should be masked. But, if so, I do not see
how the ones "beyond" it could not be masked as well. Since here the right
answers seems at least unclear, my sense would be to refuse the temptation
to guess (i.e., the user should just explicitly fill with ufunc.identity if
this is the right thing to do in their case).

I should add that I'm slightly torn about a similar, somewhat related
issue: what should `np.minimum(a, b)` do for the case where either a or b
is masked? Currently, one just treats this as a bin-op, so the result is
masked, but one could argue that this ufunc is a bit like a 2-element
reduction, and thus that the unmasked item should "win by default".
Possibly, the answer should be different between `np.minimum` and `np.fmin`
(since the two differ in how they propagate `NaN` as well - note that you
don't include `fmin` and `fmax` in your coverage).
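
For reference, the existing NaN behaviour the analogy draws on:
```
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 2.0, np.nan])

np.minimum(a, b)   # array([ 1., nan, nan])  -- the NaN propagates
np.fmin(a, b)      # array([1., 2., 3.])     -- the non-NaN element "wins"
```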

We may want to add a standard "skipna" argument on NumPy aggregations,
> solely for the benefit of duck arrays (and dtypes with missing values). But
> that could also be a source of confusion, especially if skipna=True refers
> only "true NA" values, not including NaN, which is used as an alias for NA
> in pandas and elsewhere.
>

It does seem `where` should suffice, no? If one wants to be super-fancy, we
could allow it to be a callable, which, if a ufunc, gets used inside the
loop 

Re: [Numpy-discussion] new MaskedArray class

2019-06-22 Thread Marten van Kerkwijk
Hi Allan,

I'm not sure I would go too much by what the old MaskedArray class did. It
indeed made an effort not to overwrite masked values with a new result,
even to the extent of copying back masked input data elements to the output
data array after an operation. But the fact that this is nonsensical if
the dtype changes (or the units in an operation on quantities) suggests
that this mental model simply does not work.

I think a sensible alternative mental model for the MaskedArray class is
that all it does is forward any operations to the data it holds and
separately propagate a mask, ORing elements together for binary operations,
etc., and explicitly skipping masked elements in reductions (ideally using
`where` to be as agnostic as possible about the underlying data, for which,
e.g., setting masked values to `0` for `np.add.reduce` may or may not be
the right thing to do - what if they are string?).

With this mental picture, the underlying data always have a well-defined
meaning: they have been operated on as if the mask did not exist. There
then is also less reason to try to avoid getting it back to the user.

As a concrete example (maybe Ben has others): in astropy we have a
sigma-clipping average routine, which uses a `MaskedArray` to iteratively
mask items that are too far off from the mean; here, the mask varies each
iteration (an initially masked element can come back into play), but the
data do not.
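
A rough sketch of that kind of iteration (not astropy's actual routine), just
to illustrate that only the mask changes between passes:
```
import numpy as np

def sigma_clipped_mean(data, nsigma=3.0, niter=5):
    data = np.asarray(data, dtype=float)
    clipped = np.ma.MaskedArray(data, mask=np.zeros(data.shape, bool))
    for _ in range(niter):
        # The data never change; the mask is recomputed each pass, so a
        # previously clipped element can come back in.
        mean, std = clipped.mean(), clipped.std()
        clipped.mask = np.abs(data - mean) > nsigma * std
    return clipped.mean()

# sigma_clipped_mean([1., 2., 1., 2., 100.], nsigma=1.5)  -> 1.5
```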

All the best,

Marten

On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane 
wrote:

> On 6/21/19 2:37 PM, Benjamin Root wrote:
> > Just to note, data that is masked isn't always garbage. There are plenty
> > of use-cases where one may want to temporarily apply a mask for a set of
> > computation, or possibly want to apply a series of different masks to
> > the data. I haven't read through this discussion deeply enough, but is
> > this new class going to destroy underlying masked data? and will it be
> > possible to swap out masks?
> >
> > Cheers!
> > Ben Root
>
> Indeed my implementation currently feels free to clobber the data at
> masked positions and makes no guarantees not to.
>
> I'd like to try to support reasonable use-cases like yours though. A few
> thoughts:
>
> First, the old np.ma.MaskedArray explicitly does not promise to preserve
> masked values, with a big warning in the docs. I can't recall the
> examples, but I remember coming across cases where clobbering happens.
> So arguably your behavior was never supported, and perhaps this means
> that no-clobber behavior is difficult to reasonably support.
>
> Second, the old np.ma.MaskedArray avoids frequent clobbering by making
> lots of copies. Therefore, in most cases you will not lose any
> performance in my new MaskedArray relative to the old one by making an
> explicit copy yourself. I.e, is it problematic to have to do
>
>  >>> result = MaskedArray(data.copy(), trial_mask).sum()
>
> instead of
>
>  >>> marr.mask = trial_mask
>  >>> result = marr.sum()
>
> since they have similar performance?
>
> Third, in the old np.ma.MaskedArray masked positions are very often
> "effectively" clobbered, in the sense that they are not computed. For
> example, if you do "c = a+b", and then change the mask of c, the values
> at masked position of the result of (a+b) do not correspond to the sum
> of the masked values in a and b. Thus, by "unmasking" c you are exposing
> nonsense values, which to me seems likely to cause heisenbugs.
>
>
> In summary, by not making no-clobber guarantees and by strictly
> preventing exposure of nonsense values, I suspect that: 1. my new code
> is simpler and faster by avoiding lots of copies, and forces copies to
> be explicit in user code. 2. disallowing direct modification of the mask
> lowers the "API surface area" making people's MaskedArray code less
> buggy and easier to read: Exposure of nonsense values by "unmasking" is
> one less possibility to keep in mind.
>
> Best,
> Allan
>
>
> > On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane <allanhald...@gmail.com> wrote:
> >
> > On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
> > > Hi Allan,
> > >
> > > This is very impressive! I could get the tests that I wrote for my
> > class
> > > pass with yours using Quantity with what I would consider very
> minimal
> > > changes. I only could not find a good way to unmask data (I like
> the
> > > idea of setting the mask on some elements via `ma[item] = X`); is
> this
> > > on purpose?
> >
> > Yes, I want to make it difficult for the user to access the garbage
> > val

Re: [Numpy-discussion] new MaskedArray class

2019-06-19 Thread Marten van Kerkwijk
Hi Allan,

This is very impressive! I could get the tests that I wrote for my class
pass with yours using Quantity with what I would consider very minimal
changes. I only could not find a good way to unmask data (I like the idea
of setting the mask on some elements via `ma[item] = X`); is this on
purpose?

Anyway, it would seem we are easily at the point where I should comment on
your repository rather than in the mailing list!

All the best,

Marten


On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane 
wrote:

> On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> >
> >
> > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane <allanhald...@gmail.com> wrote:
> > 
> >
> > > This may be too much to ask from the initializer, but, if so, it
> still
> > > seems most useful if it is made as easy as possible to do, say,
> `class
> > > MaskedQuantity(Masked, Quantity): `.
> >
> > Currently MaskedArray does not accept ducktypes as underlying arrays,
> > but I think it shouldn't be too hard to modify it to do so. Good
> idea!
> >
> >
> > Looking back at my trial, I see that I also never got to duck arrays -
> > only ndarray subclasses - though I tried to make the code as agnostic as
> > possible.
> >
> > (Trial at
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> >
> > I already partly navigated this mixin-issue in the
> > "MaskedArrayCollection" class, which essentially does
> > ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> > boilerplate. That's the backwards encapsulation order from what you
> want
> > though.
> >
> >
> > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> > mask=[True, False, False])` does indeed not have a `.unit` attribute
> > (and cannot represent itself...); I'm not at all sure that my method of
> > just creating a mixed class is anything but a recipe for disaster,
> though!
>
> Based on your suggestion I worked on this a little today, and now my
> MaskedArray more easily encapsulates both ducktypes and ndarray
> subclasses (pushed to repo). Here's an example I got working with masked
> units using unyt:
>
> [1]: from MaskedArray import X, MaskedArray, MaskedScalar
>
> [2]: from unyt import m, km
>
> [3]: import numpy as np
>
> [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>
> [5]: uarr
>
> MaskedArray([1., X , 3.])
> [6]: uarr + 1*m
>
> MaskedArray([1.001, X, 3.001])
> [7]: uarr.filled()
>
> unyt_array([1., 0., 3.], 'km')
> [8]: np.concatenate([uarr, 2*uarr]).filled()
> unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>
> The catch is the ducktype/subclass has to rigorously follow numpy's
> indexing rules, including distinguishing 0d arrays from scalars. For now
> only I used unyt in the example above since it happens to be less strict
>  about dimensionless operations than astropy.units which trips up my
> repr code. (see below for example with astropy.units). Note in the last
> line I lost the dimensions, but that is because unyt does not handle
> np.concatenate. To get that to work we need a true ducktype for units.
>
> The example above doesn't expose the ".units" attribute outside the
> MaskedArray, and it doesn't print the units in the repr. But you can
> access them using "filled".
>
> While I could make MaskedArray forward unknown attribute accesses to the
> encapsulated array, that seems a bit dangerous/bug-prone at first
> glance, so probably I want to require the user to make a MaskedArray
> subclass to do so. I've just started playing with that (probably buggy),
> and Ive attached subclass examples for astropy.unit and unyt, with some
> example output below.
>
> Cheers,
> Allan
>
>
>
> Example using the attached astropy unit subclass:
>
> >>> from astropy.units import m, km, s
> >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
> >>> uarr
> MaskedQ([1., X , 1.], units=km)
> >>> uarr.units
> km
> >>> uarr + (1*m)
> MaskedQ([1.001, X, 1.001], units=km)
> >>> uarr/(1*s)
> MaskedQ([1., X , 1.], units=km / s)
> >>> (uarr*(1*m))[1:]
> MaskedQ([X , 1.], units=km m)
> >>> np.add.outer(uarr, uarr)
> MaskedQ([[2., X , 2.],
>  [X , X , X ],
>  [2., X , 2.]], units=km)
> >>> print(uarr)
> [1. X  1.] km m
>
> Cheers,
> Allan
>
>
> > > Even if this impossible, I think it is conceptually use

Re: [Numpy-discussion] new MaskedArray class

2019-06-18 Thread Marten van Kerkwijk
On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane 
wrote:


> > This may be too much to ask from the initializer, but, if so, it still
> > seems most useful if it is made as easy as possible to do, say, `class
> > MaskedQuantity(Masked, Quantity): `.
>
> Currently MaskedArray does not accept ducktypes as underlying arrays,
> but I think it shouldn't be too hard to modify it to do so. Good idea!
>

Looking back at my trial, I see that I also never got to duck arrays - only
ndarray subclasses - though I tried to make the code as agnostic as
possible.

(Trial at
https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
)

I already partly navigated this mixin-issue in the
> "MaskedArrayCollection" class, which essentially does
> ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> boilerplate. That's the backwards encapsulation order from what you want
> though.
>

Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
mask=[True, False, False])` does indeed not have a `.unit` attribute (and
cannot represent itself...); I'm not at all sure that my method of just
creating a mixed class is anything but a recipe for disaster, though!


> > Even if this is impossible, I think it is conceptually useful to think
> > about what the masking class should do. My sense is that, e.g., it
> > should not attempt to decide when an operation succeeds or not, but just
> > "or together" input masks for regular, multiple-input functions, and let
> > the underlying arrays skip elements for reductions by using `where`
> > (hey, I did implement that for a reason... ;-). In particular, it
> > suggests one should not have things like domains and all that (I never
> > understood why `MaskedArray` did that). If one wants more, the class
> > should provide a method that updates the mask (a sensible default might
> > be `mask |= ~np.isfinite(result)` - here, the class being masked should
> > logically support ufuncs and functions, so it can decide what "isfinite"
> > means).
>
> I agree it would be nice to remove domains. It would make life easier,
> and I could remove a lot of twiddly code! I kept it in for now to
> minimize the behavior changes from the old MaskedArray.
>

That makes sense. Could be separated out to a backwards-compatibility class
later.


> > In any case, I would think that a basic truth should be that everything
> > has a mask with a shape consistent with the data, so
> > 1. Each complex number has just one mask, and setting `a.imag` with a
> > masked array should definitely propagate the mask.
> > 2. For a masked array with structured dtype, I'd similarly say that the
> > default is for a mask to have the same shape as the array. But that
> > something like your collection makes sense for the case where one wants
> > to mask items in a structure.
>
> Agreed that we should have a single bool per complex or structured
> element, and the mask shape is the same as the array shape. That's how I
> implemented it. But there is still a problem with complex.imag assignment:
>
> >>> a = MaskedArray([1j, 2, X])
> >>> i = a.imag
> >>> i[:] = MaskedArray([1, X, 1])
>
> If we make the last line copy the mask to the original array, what
> should the real part of a[2] be? Conversely, if we don't copy the mask,
> what should the imag part of a[1] be? It seems like we might "want" the
> masks to be OR'd instead, but then should i[2] be masked after we just
> set it to 1?
>
Ah, I see the issue now... Easiest to implement and closest in analogy to
a regular view would be to just let it unmask a[2] (with whatever is in
real; user beware!).

Perhaps better would be to special-case such that `imag` returns a
read-only view of the mask. Making `imag` itself read-only would prevent
possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but there is no
reason this should update the mask.

Still, neither is really satisfactory...
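
For reference, the plain-ndarray behaviour that creates the dilemma: `.imag`
of a complex array is a writable view, so a masked analogue has to decide
what writing through such a view should mean for the mask:
```
import numpy as np

a = np.array([1 + 1j, 2 + 0j, 3 + 0j])
i = a.imag          # a writable view for plain complex ndarrays
i[:] = [1, 0, 1]
a                   # array([1.+1.j, 2.+0.j, 3.+1.j])
```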


>
> > p.s. I started trying to implement the above "Mixin" class; will try to
> > clean that up a bit so that at least it uses `where` and push it up.
>
> I played with "where", but didn't include it since 1.17 is not released.
> To avoid duplication of effort, I've attached a diff of what I tried. I
> actually get a slight slowdown of about 10% by using where...
>

Your implementation is indeed quite similar to what I got in
__array_ufunc__ (though one should "&" the where with ~mask).

I think the main benefit is not to presume that whatever is underneath
understands 0 or 1, i.e., avoid filling.
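
A minimal illustration of that difference (the `where` argument requires
numpy >= 1.17):
```
import numpy as np

data = np.array([1.0, 2.0, 4.0])
mask = np.array([False, True, False])

# Fill-based: has to presume 0 is a sensible placeholder for this dtype.
np.add.reduce(np.where(mask, 0, data))     # 5.0

# where-based: masked elements are never touched at all.
np.add.reduce(data, where=~mask)           # 5.0
```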


> If you make progress with the mixin, a push is welcome. I imagine a
> problem is going to be that np.isscalar doesn't work to detect duck
> scalars.
>
I fear that in my attempts I've simply decided that only array scalars
exist...

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new MaskedArray class

2019-06-18 Thread Marten van Kerkwijk
Hi Allen,

Thanks for the message and link! In astropy, we've been struggling with
masking a lot, and one of the main conclusions I have reached is that
ideally one has a more abstract `Masked` class that can take any type of
data (including `ndarray`, of course), and behaves like that data as much
as possible, to the extent that if, e.g., I create a `Masked(Quantity(...,
unit), mask=...)`, the instance will have a `.unit` attribute and perhaps
even `isinstance(..., Quantity)` will hold. And similarly for
`Masked(Time(...), mask=...)`, `Masked(SkyCoord(...), mask=...)`, etc. In a
way, `Masked` would be a kind of Mixin-class that just tracks a mask
attribute.

This may be too much to ask from the initializer, but, if so, it still
seems most useful if it is made as easy as possible to do, say, `class
MaskedQuantity(Masked, Quantity): `.

Even if this is impossible, I think it is conceptually useful to think about
what the masking class should do. My sense is that, e.g., it should not
attempt to decide when an operation succeeds or not, but just "or together"
input masks for regular, multiple-input functions, and let the underlying
arrays skip elements for reductions by using `where` (hey, I did implement
that for a reason... ;-). In particular, it suggests one should not have
things like domains and all that (I never understood why `MaskedArray` did
that). If one wants more, the class should provide a method that updates
the mask (a sensible default might be `mask |= ~np.isfinite(result)` -
here, the class being masked should logically support ufuncs and functions,
so it can decide what "isfinite" means).

In any case, I would think that a basic truth should be that everything has
a mask with a shape consistent with the data, so
1. Each complex number has just one mask, and setting `a.imag` with a
masked array should definitely propagate the mask.
2. For a masked array with structured dtype, I'd similarly say that the
default is for a mask to have the same shape as the array. But that
something like your collection makes sense for the case where one wants to
mask items in a structure.

All the best,

Marten

p.s. I started trying to implement the above "Mixin" class; will try to
clean that up a bit so that at least it uses `where` and push it up.



On Mon, Jun 17, 2019 at 6:43 PM Allan Haldane 
wrote:

> Hi all,
>
> Chuck suggested we think about a MaskedArray replacement for 1.18.
>
> A few months ago I did some work on a MaskedArray replacement using
> `__array_function__`, which I got mostly working. It seems like a good
> time to bring it up for discussion now. See it at:
>
> https://github.com/ahaldane/ndarray_ducktypes
>
> It should be very usable, it has docs you can read, and it passes a
> pytest-suite with 1024 tests adapted from numpy's MaskedArray tests.
> What is missing? It needs even more tests for new functionality, and a
> couple numpy-API functions are missing, in particular `np.median`,
> `np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could
> also find many things to improve too.
>
> Besides fixing many little annoyances from MaskedArray, and simplifying
> the logic by always storing the mask in full, it also has new features.
> For instance it allows the use of a "X" variable to mark masked
> locations during array construction, and I solve the issue of how to
> mask individual fields of a structured array differently.
>
> At this point I would be happy to get some feedback on the design and
> what seems good or bad. If it seems like a good start, I'd be happy to
> move it into a numpy repo of some sort for further collaboration &
> discussion, and maybe into 1.18. At the least I hope it can serve as a
> design study of what we could do.
>
>
>
>
>
> Let me also drop here two more interesting detailed issues:
>
> First, the issue of what to do about .real and .imag of complex arrays,
> and similarly about field-assignment of structured arrays. The problem
> is that we have a single mask bool per element of a complex array, but
> what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the
> mask of the original array change? Should we make .real and .imag readonly?
>
> Second, a more general issue of how to ducktype scalars when using
> `__array_function__` which I think many ducktype implementors will have
> to face. For MaskedArray, I created an associated "MaskedScalar" type.
> However, MaskedScalar has to behave differently from normal numpy
> scalars in a number of ways: It is not part of the numpy scalar
> hierarchy, it fails checks `isinstance(var, np.floating)`, and
> np.isscalar returns false. Numpy scalar types cannot be subclassed. We
> have discussed before the need to have distinction between 0d-arrays and
> scalars, so we shouldn't just use a 0d (in fact, this makes printing
> very difficult). This leads me to think that in future dtype-overhaul
> plans, we should consider creating a subclassable `np.scalar` base type
> to wrap all numpy scalar 

Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-14 Thread Marten van Kerkwijk
Hi Ralf,

Thanks for the clarification. I think in your terms the bottom line was
that I thought we had a design B for the case where a function was really
"just a ufunc". But the nanfunctions show that even if logically they are a
ufunc (which admittedly uses another ufunc or two for `where`), it is
tricky, certainly trickier than I thought. And this discussion has served
to clarify that even for other "simpler" functions it can get similarly
tricky.

Anyway, bottom line is that I think you are right that we actually need a
more proper discussion/design about how to move forward.

All the best,

Marten

p.s. And, yes, `__array_function__` is quite wonderful!

On Fri, Jun 14, 2019 at 3:46 AM Ralf Gommers  wrote:

>
>
> On Fri, Jun 14, 2019 at 2:21 AM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Hi Ralf,
>>
>> Thanks both for the reply and sharing the link. I recognize much (from
>> both sides!).
>>
>> 
>>
>>>
>>> More importantly, I think we should not even consider *discussing*
>>> removing` __array_function__` from np.isposinf (or any similar one off
>>> situation) before there's a new bigger picture design. This is not about
>>> "deprecation is hard", this is about doing things with purpose rather than
>>> ad-hoc, as well as recognizing that lots of small changes are a major drain
>>> on our limited maintainer resources. About the latter issue I wrote a blog
>>> post recently, perhaps that clarifies the previous sentence a bit and gives
>>> some more insight in my perspective:
>>> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>>>
>> Yes, I definitely did not mean to imply that a goal was to change just
>> `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
>> was to clean up the whole `nanfunctions` module). Rather, to use them as
>> examples to see what policy there actually is or should be; and I do worry
>> that with __array_function__, however happy I am that it now exists
>> (finally, Quantity can be concatenated!), we're going the Microsoft route
>> of just layering on top of old code if even for the simplest cases there is
>> no obvious path for how to remove it.
>>
>
> I share that worry to some extent (an ever-growing collection of
> protocols). To be fair, we knew that __array_function__ wasn't perfect, but
> I think Stephan did a really good job making the case for why it was
> necessary, and that we needed to get something in place in 6-12 months
> rather than spending years on a more polished/comprehensive design. Given
> those constraints, we seem to have made a good choice that has largely
> achieved its goals.
>
> I'm still not sure you got my main objection. So I'll try once more.
> We now have a design, imperfect but it exists and is documented (to some
> extent). Let's call it design A. Now you're wanting clarity on a policy on
> how to move from design A to design B. However, what B even is isn't
> spelled out, although we can derive rough outlines from mailing list
> threads like these (it involves better handling of subclasses, allowing
> reuse of implementation of numpy functions in terms of other numpy
> functions, etc.). The right way forward is:
>
> 1. describe what design B is
> 2. decide if that design is a good idea
> 3. then worry about implementation and a migration policy
>
> Something that's specific to all nan-functions is still way too specific,
> and doesn't justify skipping 1-2 and jumping straight to 3.
>
> I don't know how to express it any clearer than the above in email format.
> If it doesn't make sense to you, it'd be great to have a call to discuss in
> person.
>
> Cheers,
> Ralf
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
Hi Ralf,

Thanks both for the reply and sharing the link. I recognize much (from both
sides!).



>
> More importantly, I think we should not even consider *discussing*
> removing` __array_function__` from np.isposinf (or any similar one off
> situation) before there's a new bigger picture design. This is not about
> "deprecation is hard", this is about doing things with purpose rather than
> ad-hoc, as well as recognizing that lots of small changes are a major drain
> on our limited maintainer resources. About the latter issue I wrote a blog
> post recently, perhaps that clarifies the previous sentence a bit and gives
> some more insight in my perspective:
> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>
Yes, I definitely did not mean to imply that a goal was to change just
`isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
was to clean up the whole `nanfunctions` module). Rather, to use them as
examples to see what policy there actually is or should be; and I do worry
that with __array_function__, however happy I am that it now exists
(finally, Quantity can be concatenated!), we're going the Microsoft route
of just layering on top of old code if even for the simplest cases there is
no obvious path for how to remove it.

Anyway, this discussion is probably gotten beyond the point where much is
added. I'll close the `nansum` PR.

All the best,

Marten


p.s. I would say that deprecations within numpy currently *are* hard. E.g.,
the rate at which we add `DEPRECATED` to the C code is substantially larger
than that with which we actually remove any long-deprecated behaviour.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
Hi Ralf,


>> I guess the one immediate question is whether `np.sum` and the like
>> should be overridden by `__array_function__` at all, given that what should
>> be the future recommended override already works.
>>
>
> I'm not sure I understand the rationale for this. Design consistency
> matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and
> other functions __array_function__. Why keep on tweaking things for little
> benefit?
>

I'm mostly trying to understand how we would actually change things. I
guess your quite logical argument for consistency is that it requires
`np.sum is np.add.reduce`. But to do that, one would have to get rid of the
`.sum()` method override, and then deprecate using `__array_function__` on
`np.sum`.

A perhaps clearer example is `np.isposinf`, where the implementation truly
consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess
following your consistency logic, one would not remove the
`__array_function__` override until it actually became a ufunc itself.

Anyway, I think this discussion has been useful, if only in making it yet
more clear how difficult it is to deprecate anything!

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer  wrote:


>
>
>> But how about `np.sum` itself? Right now, it is overridden by
>> __array_function__ but classes without __array_function__ support can also
>> override it through the method lookup and through __array_ufunc__.
>>
>> Would/should there be a point where we just have `sum = np.add.reduce`
>> and drop other overrides? If so, how do we get there?
>>
>> One option might be to start reversing the order in `_wrapreduction` - try
>> `__array_ufunc__` if it is defined and only if that fails try the `.sum`
>> method.
>>
>
> Yes, I think we would need to do this sort of thing. It's a bit of
> trouble, but probably doable with some decorator magic. It would indeed be
> nice for sum() to eventually just be np.add.reduce, though to be honest I'm
> not entirely sure it's worth the trouble of a long deprecation cycle --
> people have been relying on the fall-back calling of methods for a long
> time.
>
>
I guess the one immediate question is whether `np.sum` and the like should
be overridden by `__array_function__` at all, given that what should be the
future recommended override already works.

(And, yes, arguably too late to change it now!)

-- Marten



>
>
>> All the best,
>>
>> Marten
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
Hi Ralf, others,


>>  Anyway, I guess this is still a good example to consider for how we
>> should go about getting to a new implementation, ideally with just a
>> single-way to override?
>>
>> Indeed, how do we actually envisage deprecating the use of
>> `__array_function__` for a given part of the numpy API? Are we allowed to
>> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>>
>
> I think __array_function__ is still the best way to do this (that's the
> only actual override, so most robust and performant likely), so I don't see
> any reason for a deprecation.
>
Yes, I fear I have to agree for the nan-functions, at least for now...

But how about `np.sum` itself? Right now, it is overridden by
__array_function__ but classes without __array_function__ support can also
override it through the method lookup and through __array_ufunc__.

Would/should there be a point where we just have `sum = np.add.reduce` and
drop other overrides? If so, how do we get there?

One option might be to start reversing the order in `_wrapreduction` - try
`__array_ufunc__` if it is defined and only if that fails try the `.sum`
method.
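
As a toy illustration of that reversed order (not the actual `_wrapreduction`
code):
```
import numpy as np

def sum_like(a, **kwargs):
    # Try the ufunc machinery first; if the object's __array_ufunc__
    # refuses (which surfaces as a TypeError), fall back to its .sum().
    try:
        return np.add.reduce(a, **kwargs)
    except TypeError:
        return a.sum(**kwargs)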

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
Hi All,

`nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the
plan to use `asarray` - sorry to have been sloppy with stating they
"convert to array".

And, yes, I'll admit to forgetting that we are introducing `where` only in
1.17 - clearly we cannot rely on other classes to have already implemented
it!! (This is the problem with having done it myself - it seems ages ago!)

For the old implementation, there was indeed only one way to override
(`__array_function__`) for non-subclasses, but for the new implementation
there are three ways, though the first two overlap:
- support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
- support `__array_ufunc__` including reductions (with all their arguments)
- support `__array_function__`

I think that eventually one would like to move to the case where there is
only one (`__array_ufunc__` in this case, since really it is just a
combination of two ufuncs, isnan and add.reduce in this case).

Anyway, I guess this is still a good example to consider for how we should
go about getting to a new implementation, ideally with just a single way to
override?

Indeed, how do we actually envisage deprecating the use of
`__array_function__` for a given part of the numpy API? Are we allowed to
go cold-turkey if the new implementation is covered by `__array_ufunc__`?

All the best,

Marten



On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers  wrote:

>
>
> On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer  wrote:
>
>> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
>> m.h.vankerkw...@gmail.com> wrote:
>>
>>> Hi Ralf,
>>>
>>> You're right, the problem is with the added keyword argument (which
>>> would appear also if we did not still have to support the old .sum method
>>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>>> maybe worse given that it seems likely the reduce method has seen much less
>>> testing in  __array_ufunc__ implementations).
>>>
>>> Still, I do think the question stands: we implement a `nansum` for our
>>> ndarray class a certain way, and provide ways to override it (three now, in
>>> fact). Is it really reasonable to expect that we wait 4 versions for other
>>> packages to keep up with this, and thus get stuck with given internal
>>> implementations?
>>>
>>
> In general, I'd say yes. This is a problem of our own making. If an
> implementation uses np.sum on an argument that can be a subclass or duck
> array (i.e. we didn't coerce with asarray), then the code is calling out to
> outside of numpy. At that point we have effectively made something public.
> We can still change it, but only if we're sure that we don't break things
> in the process (i.e. we then insert a new asarray, something you're not
> happy about in general).
>
> I don't get the "three ways to override nansum" by the way. There's only
> one I think: __array_function__. There may be more for the sum() method.
>
> Another way to say the same thing: if a subclass does everything right,
> and we decide to add a new keyword to some function in 1.17 without
> considering those subclasses, then turning around straight after and saying
> "oh those subclasses rely on our old API, it may be okay to either break
> them or send them a deprecation warning" (which is I think what you were
> advocating for - your 4 and 3 options) is not reasonable.
>
>
>>> Aside: note that the present version of the nanfunctions relies on
>>> turning the arguments into arrays and copying 0s into it - that suggests
>>> that currently they do not work for duck arrays like Dask.
>>>
>>
> There being an issue with matrix in the PR suggests there is an issue for
> subclasses at least?
>
>
>> Agreed. We could safely rewrite things to use np.asarray(), without any
>> need to worry about backends compatibility. From an API perspective,
>> nothing would change -- we already cast inputs into base numpy arrays
>> inside the _replace_nan() routine.
>>
>
> In that case yes, nansum can be rewritten. However, it's only because of
> the use of (as)array - if it were asanyarray that would already scupper the
> plan.
>
> Cheers,
> Ralf
>
>
>>
>>>
>>> All the best,
>>>
>>> Marten
>>>
>>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>>>> stef...@berkeley.edu> wrote:
>>>>
>>>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>>>> > In a way, I

Re: [Numpy-discussion] New release note strategy after branching 1.17.

2019-06-12 Thread Marten van Kerkwijk
The attrs link you sent definitely sounded like it would translate to numpy
nearly trivially. I'm very much in favour!
-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-12 Thread Marten van Kerkwijk
Hi Ralf,

You're right, the problem is with the added keyword argument (which would
appear also if we did not still have to support the old .sum method
override but just dispatched to __array_ufunc__ with `np.add.reduce` -
maybe worse given that it seems likely the reduce method has seen much less
testing in  __array_ufunc__ implementations).

Still, I do think the question stands: we implement a `nansum` for our
ndarray class a certain way, and provide ways to override it (three now, in
fact). Is it really reasonable to expect that we wait 4 versions for other
packages to keep up with this, and thus get stuck with given internal
implementations?

Aside: note that the present version of the nanfunctions relies on turning
the arguments into arrays and copying 0s into it - that suggests that
currently they do not work for duck arrays like Dask.

All the best,

Marten

On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers  wrote:

>
>
> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt 
> wrote:
>
>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>> > In a way, I brought it up mostly as a concrete example of an internal
>> > implementation which we cannot change to an objectively cleaner one
>> because
>> > other packages rely on an out-of-date numpy API.
>>
>
> I think this is not the right way to describe the problem (see below).
>
>
>> This, and the comments Nathaniel made on the array function thread, are
>> important to take note of.  Would it be worth updating NEP 18 with a
>> list of pitfalls?  Or should this be a new informational NEP that
>> discusses—on a higher level—the benefits, risks, and design
>> considerations of providing protocols?
>>
>
> That would be a nice thing to do (the higher level one), but in this case
> I think the issue has little to do with NEP 18. The summary of the issue in
> this thread is a little brief, so let me try to clarify.
>
> 1. np.sum gained a new `where=` keyword in 1.17.0
> 2. using np.sum(x) will detect a `x.sum` method if it's present and try to
> use that
> 3. the `_wrapreduction` utility that forwards the function to the method
> will compare signatures of np.sum and x.sum, and throw an error if there's
> a mismatch for any keywords that have a value other than the default
> np._NoValue
>
> Code to check this:
> >>> x1 = np.arange(5)
> >>> x2 = np.asmatrix(x1)
> >>> np.sum(x1)  # works
> >>> np.sum(x2)  # works
> >>> np.sum(x1, where=x1>3)  # works
> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> Note that this is not specific to np.matrix. Using pandas.Series you also
> get a TypeError:
> >>> y = pd.Series(x1)
> >>> np.sum(y)  # works
> >>> np.sum(y, where=y>3)  # pandas throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> The issue is that when we have this kind of forwarding logic, irrespective
> of how it's implemented, new keywords cannot be used until the array-like
> objects with the methods that get forwarded to gain the same keyword.
>
> tl;dr this is simply a cost we have to be aware of when either proposing
> to add new keywords, or when proposing any kind of dispatching logic (in
> this case `_wrapreduction`).
>
> Regarding internal use of  `np.sum(..., where=)`: this should not be done
> until at least 4-5 versions from now, and preferably way longer than that.
> Because doing so will break already released versions of Pandas, Dask, and
> other libraries with array-like objects.
>
> Cheers,
> Ralf
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New release note strategy after branching 1.17.

2019-06-12 Thread Marten van Kerkwijk
Overall, in favour of splitting the large files, but I don't like that the
notes stop being under version control (e.g., a follow-up PR slightly
changes things, how does the note get edited/reverted?).

Has there been any discussion of having, e.g., a directory
`docs/1.17.0-notes/`, and everyone storing their notes as individual files?
(A bit like maildir vs a single inbox file.) At release time, one would
then simply merge/order the files. Beyond staying within git, an advantage
of this may be that automation is easier (e.g., if the file is always
called .rst, then checks for it can be very easily
automated.).
-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] (Value Based Promotion) Current Behaviour

2019-06-11 Thread Marten van Kerkwijk
HI Sebastian,

Thanks for the overview! In the value-based casting, what perhaps surprises
me most is that it is done within a kind; it would seem an improvement to
check whether a given integer scalar is exactly representable in a given
float (your example of 1024 in `float16`). If we switch to the python-only
scalar values idea, I would suggest abandoning this. That might make
dealing with things like `Decimal` or `Fraction` easier as well.
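
For concreteness, the legacy behaviour under discussion (these value-based
rules predate NEP 50, which changed them in numpy 2.0):
```
import numpy as np

a32 = np.ones(3, dtype=np.float32)

# Scalars and 0-D arrays of the same kind do not upcast an array:
(a32 + np.float64(3.0)).dtype                 # float32
(a32 + np.array(3.0)).dtype                   # float32 (0-D acts like a scalar)

# ... while a 1-D float64 array does:
(a32 + np.ones(3, dtype=np.float64)).dtype    # float64
```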

All the best,

Marten

On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg 
wrote:

> Hi all,
>
> strange, something went wrong sending that email, but in any case...
>
> I tried to "summarize" the current behaviour of promotion and value
> based promotion in numpy (correcting a small error in what I wrote
> earlier). Since it got a bit long, you can find it here (also copy
> pasted at the end):
>
> https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA
>
> Allan's document which I link in there is also very interesting. One
> thing I had not really thought about before was the problem of
> commutativity.
>
> I do not have any specific points I want to discuss based on it (but
> those are likely to come up again later).
>
> All the Best,
>
> Sebastian
>
>
> -
>
> PS: Below a copy of what I wrote:
>
> ---
> title: Numpy Value Based Promotion Rules
> author: Sebastian Berg
> ---
>
>
>
> NumPy Value Based Scalar Casting and Promotion
> ==============================================
>
> This document reviews some of the behaviours of the promotion rules
> within numpy. This is especially with respect to the promotion of
> scalars and 0D arrays which inspect the value to decide casting and
> promotion.
>
> Other documents discussing these things:
>
>   * `from numpy.testing import print_coercion_tables` prints the
> current promotion tables including value based promotion for small
> positive/negative scalars.
>   * Allan Haldane's thoughts on changing casting/promotion to be more
> C-like and discussing things such as here:
> https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
>   * Discussion around the problem of uint64 and int64 being promoted to
> float64: https://github.com/numpy/numpy/issues/12525 (lists many
> related issues).
>
>
> Nomenclature and Defintions
> ---
>
> * **dtype/type**: The data type of an array or scalar: `float32`,
> `float64`, `int8`, …
>
> * **Category**: A category to which the data type belongs, in this
> context these are:
>   1. boolean
>   2. integer (unsigned and signed are not split up here, but are
> different "kinds")
>   3. floating point and complex (not split up here but are different
> "kinds")
>   4. All others
>
> * **Casting**: converting from one dtype to another. There are four
> different rules of casting:
>   1. *"safe"* casting: All values are representable in the new data
> type. I.e. no information is lost during the conversion.
>   2. *"same kind"* casting: data loss may occur, but only within the
> same "kind". For example a float64 can be converted to float32 using
> "same kind" rules, an int64 can be converted to int16. This is although
> both lose precision or even produce incorrect values. Note that "kind"
> is different from "category" in that it distinguishes between signed
> and unsigned integers.
>   4. *"unsafe"* casting: Any conversion which can be defined, e.g.
> floating point to integer. For promotion this is fairly unimportant.
> (Some conversions, such as string to integer, which do not even work, fall
> in this category, but could also be called coercions or conversions.)
>
> * **Promotion**: The general process of finding a new dtype for
> multiple input dtypes. Will be used here to also denote any kind of
> casting/promotion done before a specific function is called. This can
> be more complex, because in rare cases a function can, for example, take
> floating point numbers and integers as input at the same time (i.e.
> `np.ldexp`).
>
> * **Common dtype**: A dtype which can represent all input data. In
> general this means that all inputs can be safely cast to this dtype.
> Within numpy this is the normal and simplest form of promotion.
>
> * **`type1, type2 -> type3`**: Defines a promotion or signature. For
> example adding two integers: `np.int32(5) + np.int32(3)` gives
> `np.int32(8)`. The dtype signature for that example would be: `int32,
> int32 -> int32`. A short form for this is also `ii->i` using C-like
> type codes, this can be found for example in `np.ldexp.types` (and any
> numpy ufunc).
>
> * **Scalar**: A numpy or python scalar or a **0-D array**. It is
> important to remember that zero dimensional arrays are treated just
> like scalars with respect to casting and promotion.
>
>
> Current Situation in Numpy
> --
>
> The current situation can be understood mostly in terms of safe casting
> which is defined based on the type hierarchy and is sensitive to values
> for scalars.
>
> This safe casting based approach is in 

Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-11 Thread Marten van Kerkwijk
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
>
> Should have added: rely on an out-of-date numpy API where we have multiple
ways for packages to provide their own overrides. But I guess in this case
it is really matrix which is the problem. If so, maybe just adding kwargs
to it is something we should consider.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-10 Thread Marten van Kerkwijk
Hi All,

In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use
the new `where` argument for reductions to implement `nansum`, etc., using
simplifications that boil down to `np.sum(..., where=~isnan(...))`.
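
For concreteness, the simplification in question (assuming the new `where`
argument for reductions is available):
```
import numpy as np

a = np.array([1.0, np.nan, 2.0])
np.sum(a, where=~np.isnan(a))  # 3.0, i.e., the nansum-style result
np.nansum(a)                   # 3.0, for comparison
```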

A problem that occurs is that `np.sum` will use a `.sum` method if that is
present, and for matrix, the `.sum` method does not take a `where`
argument. Since the `where` argument has been introduced only recently,
similar problems may happen for array mimics that implement their own
`.sum` method.

The question now is what to do, with options being:
1. Let's stick with the existing implementation; the speed-up is not that
great anyway.
2. Use try/except around the new implementation and use the old one if it
fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but
not matrix (unless someone makes a PR; would we even accept it?);
4. Use the new implementation. `matrix` should be gone anyway and array
mimics can either update their `.sum()` method or override `np.nansum` with
`__array_function__`.

Personally, I'd prefer (4), but realize that (3) is probably the safer
approach, even if it is really annoying to still be taking into account
matrix deficiencies.

All the best,

Marten

p.s. One could also avoid the `.sum()` override altogether by doing
`np.add.reduce(..., where=...)`, but this would probably break just as much.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving forward with value based casting

2019-06-07 Thread Marten van Kerkwijk
On Fri, Jun 7, 2019 at 1:19 AM Ralf Gommers  wrote:

>
>
> On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith  wrote:
>
>>
>> My intuition is that what users actually want is for *native Python
>> types* to be treated as having 'underspecified' dtypes, e.g. int is
>> happy to coerce to int8/int32/int64/whatever, float is happy to coerce
>> to float32/float64/whatever, but once you have a fully-specified numpy
>> dtype, it should stay.
>>
>
> Thanks Nathaniel, I think this expresses a possible solution better than
> anything I've seen on this list before. An explicit "underspecified types"
> concept could make casting understandable.
>

I think the current model is that this holds for all scalars, but changing
that to apply only to values that are not already explicitly typed makes sense.

In the context of a mental picture, one could think in terms of coercion,
of numpy having not just a `numpy.array` but also a `numpy.scalar`
function, which takes some input and tries to make a numpy scalar of it.
For python int, float, complex, etc., it uses the minimal numpy type.

Of course, this is slightly inconsistent with the `np.array` function which
converts things to `ndarray` using a default type for int, float, complex,
etc., but my sense is that that is explainable, e.g., imagining both
`np.scalar` and `np.array` to have dtype attributes, one could say that the
default for one would be `'minimal'` and the other `'64bit'` (well, that
doesn't work for complex, but anyway).
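
A minimal sketch of what such a hypothetical `np.scalar` could do (the helper
below is purely illustrative, not an existing or proposed API):
```
import numpy as np

def scalar(value):
    # Pick the minimal dtype that can hold the given python value.
    return np.array(value, dtype=np.min_scalar_type(value))[()]

scalar(1).dtype     # uint8: the 'minimal' default
scalar(-1).dtype    # int8
scalar(1.5).dtype   # float16
np.array(1).dtype   # int64 on most platforms: the '64bit' default contrasted above
```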

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving forward with value based casting

2019-06-05 Thread Marten van Kerkwijk
Hi Sebastian,

Tricky! It seems a balance between unexpected memory blow-up and unexpected
wrapping (the latter mostly for integers).

Some comments specifically on your message first, then some more general
related ones.

1. I'm very much against letting `a + b` do anything else than `np.add(a,
b)`.
2. For python values, an argument for casting by value is that a python int
can be arbitrarily long; the only reasonable course of action for those
seems to be to make them float, and once you do that one might as well cast to
whatever type can hold the value (at least approximately).
3. Not necessarily preferred, but for casting of scalars, one can get more
consistent behaviour also by extending the casting by value to any array
that has size=1.

Overall, just on the narrow question, I'd be quite happy with your
suggestion of using type information if available, i.e., only cast python
values to a minimal dtype. If one uses numpy types, those mostly will have
come from previous calculations with the same arrays, so things will work
as expected. And in most memory-limited applications, one would do
calculations in-place anyway (or, as Tyler noted, for power users one can
assume awareness of memory and thus the incentive to tell explicitly what
dtype is wanted - just `np.add(a, b, dtype=...)`, no need to create `out`).

More generally, I guess what I don't like about the casting rules generally
is that there is a presumption that if the value can be cast, the operation
will generally succeed. For `np.add` and `np.subtract`, this perhaps is
somewhat reasonable (though for unsigned a bit more dubious), but for
`np.multiply` or `np.power` it is much less so. (Indeed, we had a long
discussion about what to do with `int ** power` - now special-casing
negative integer powers.) Changing this, however, probably really is a
bridge too far!

Finally, somewhat related: I think the largest confusion actually results
from the `uint64+int64 -> float64` casting.  Should this cast to int64
instead?
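
For reference, the behaviour in question (whether it should change is exactly
the point):
```
import numpy as np

a = np.array([1], dtype=np.uint64)
b = np.array([1], dtype=np.int64)
(a + b).dtype   # float64: no integer dtype can safely hold both uint64 and int64
```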

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-03 Thread Marten van Kerkwijk
Hi Stefan,

On Mon, Jun 3, 2019 at 4:26 PM Stefan van der Walt 
wrote:

> Hi Marten,
>
> On Sat, 01 Jun 2019 12:11:38 -0400, Marten van Kerkwijk wrote:
> > Third, we could actually implement the logical groupings identified in
> the
> > code base (and describing them!). Currently, it is a mess: for the C
> files,
> > I typically have to grep to even find where things are done, and while
> for
> > the functions defined in python files that is not necessary, many have
> > historical rather than logical groupings (looking at you,
> `from_numeric`!),
> > and even more descriptive ones like `shape_base` are split over `lib` and
> > `core`. I think it would help everybody if we went to a python-like
> layout,
> > with a true core and libraries such as polynomial, fft, ma, etc.
>
> How hard do you think it would be to address this issue?  You seem to
> have some notion of which pain points should be prioritized, and it
> might be useful to jot those down somewhere (tracking issue on GitHub?).
>

The python side would, I think, not be too hard. But I don't really have
that much of a notion - it would very much be informed by making a list
first. For the C parts, I feel even more at a loss: one really would have
to start with a summary of what is actually there (and I think the
organization may well be quite logical already; I've felt it was not so much
wrong as in need of an overview).

Somewhat of an aside, but relevant for the general discussion:
updating/rewriting the user documentation may well be the best *first*
step. It certainly doesn't hurt to try to make some list now, but my guess
that the best one will emerge only when one tries to summarize what a new
user should know/understand.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-02 Thread Marten van Kerkwijk
On Sun, Jun 2, 2019 at 2:21 PM Eric Wieser 
wrote:

> Some of your categories here sound like they might be suitable for ABCs
> that provide mixin methods, which is something I think Hameer suggested in
> the past. Perhaps it's worth re-exploring that avenue.
>
> Eric
>
>
Indeed, and of course for __array_ufunc__ we moved there a bit already,
with `NDArrayOperatorsMixin` [1].
One could certainly similarly have NDShapingMixin that, e.g., relied on
`shape`, `reshape`, and `transpose` to implement `ravel`, `swapaxes`, etc.
And indeed use those mixins in `ndarray` itself.
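
A rough sketch of what such a mixin could look like (NDShapingMixin and
DuckArray are illustrative names, not existing numpy classes):
```
import numpy as np

class NDShapingMixin:
    """Shape manipulation derived from shape/reshape/transpose only."""

    def ravel(self):
        return self.reshape(-1)

    def swapaxes(self, axis1, axis2):
        axes = list(range(len(self.shape)))
        axes[axis1], axes[axis2] = axes[axis2], axes[axis1]
        return self.transpose(axes)

    @property
    def T(self):
        return self.transpose(tuple(reversed(range(len(self.shape)))))

class DuckArray(NDShapingMixin):
    # Minimal host class providing the three primitives by wrapping an ndarray.
    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    def reshape(self, *shape):
        return DuckArray(self._data.reshape(*shape))

    def transpose(self, axes=None):
        return DuckArray(self._data.transpose(axes))

DuckArray(np.arange(6).reshape(2, 3)).swapaxes(0, 1).shape   # (3, 2)
```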

For this also having a summary of base functions/methods would be very
helpful.
-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-01 Thread Marten van Kerkwijk
> Our API is huge. A simple count:
> main namespace: 600
> fft: 30
> linalg: 30
> random: 60
> ndarray: 70
> lib: 20
> lib.npyio: 35
> etc. (many more ill-thought out but not clearly private submodules)
>
>
I would perhaps start with ndarray itself. Quite a lot seems superfluous.

Shapes:
- need: shape, strides, reshape, transpose;
- probably: ndim, size, T
- less so: nbytes, ravel, flatten, squeeze, and swapaxes.

Getting/setting:
- need __getitem__, __setitem__;
- less so: fill, put, take, item, itemset, repeat, compress, diagonal;

Datatype/Copies/views/conversion
- need: dtype, copy, view, astype, flags
- less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap,
newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring,

Iteration
- need __iter__
- less so: flat

Numerics
- need: conj, real, imag
- maybe also: min, max, mean, sum, std, var, prod, partition, sort, trace;
- less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any,
nonzero, ptp, searchsorted,
choose.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-01 Thread Marten van Kerkwijk
> > In this respect, I think an excellent place to start might be
>> > something you are planning already anyway: update the user
>> > documentation
>> >
>>
>> I would include tests as well. Rather than hammer out a full standard
>> based on extensive discussions and negotiations, I would suggest NumPy
>> might be able set a de-facto "standard" based on pieces of the the
>> current numpy user documentation and test suite.
>
>
> I think this is potentially useful, but *far* more prescriptive and
> detailed than I had in mind. Both you and Nathaniel seem to have not
> understood what I mean by "out of scope", so I think that's my fault in not
> being explicit enough. I *do not* want to prescribe behavior. Instead, a
> simple yes/no for each function in numpy and method on ndarray.
>

I quite like the idea of trying to be better at defining the API through
tests - the substitution principle in action! Systematically applying tests
to both ndarray and MaskedArray might be a start internally (just a pytest
fixture away...). But definitely start with more of an overview.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-01 Thread Marten van Kerkwijk
Hi Ralf,

Despite sharing Nathaniel's doubts about the ease of defining the numpy API
and the likelihood of people actually sticking to a limited subset of what
numpy exposes, I quite like the actual things you propose to do!

But my liking it is for reasons that are different from your stated ones: I
think the proposed actions are likely to greatly benefit both users
(like Bill above) and current and prospective developers.  To me, it seems
almost as a side benefit (if a very nice one) that it might help other
projects to share an API; a larger benefit may come from tapping into the
experience of other projects in thinking about what are the true basic
functions/methods that one should have.

More concretely, to address Nathaniel's (very reasonable) worry about
ending up wasting a lot of time, I think it may be good to identify smaller
parts, each of which are useful on their own.

In this respect, I think an excellent place to start might be something you
are planning already anyway: update the user documentation. Doing this will
necessarily require thinking about, e.g., what `ndarray` methods and
properties are actually fundamental, as you only want to focus on a few.
With that in place, one could then, as you suggest, reorganize the
reference documentation to put those most important properties up front,
and ones that we really think are mistakes at the bottom, with explanations
of why we think so and what the alternative is. Also for the reference
documentation, it would help to group functions more logically.

The above could lead to three next steps, all of which I think would be
useful. First, for (prospective) developers as well as for future
maintenance, I think it would be quite a large benefit if we (slowly but
surely) rewrote code that implements the less basic functionality in terms
of more basic functions (e.g., replace use of `array.fill(...)` or
`np.copyto(array, ...)` with `array[...] =`).
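
For concreteness, the three spellings in that example:
```
import numpy as np

res = np.empty(3)
res.fill(1)        # method-based
np.copyto(res, 1)  # function-based
res[...] = 1       # plain indexing: relies only on __setitem__, the most basic form
```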

Second, we could update Nathaniel's NEP about distinct properties duck
arrays might want to mimic/implement.

Third, we could actually implement the logical groupings identified in the
code base (and describing them!). Currently, it is a mess: for the C files,
I typically have to grep to even find where things are done, and while for
the functions defined in python files that is not necessary, many have
historical rather than logical groupings (looking at you, `from_numeric`!),
and even more descriptive ones like `shape_base` are split over `lib` and
`core`. I think it would help everybody if we went to a python-like layout,
with a true core and libraries such as polynomial, fft, ma, etc.

Anyway, re-reading your message, I realize the above is not really what you
wrote about, so perhaps this is irrelevant...

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] __skip_array_function__ discussion summary

2019-05-23 Thread Marten van Kerkwijk
Hi Sebastian, Stéfan,

Thanks for the very good summaries!

An additional item worth mentioning is that by using
`__skip_array_function__` everywhere inside, one minimizes the performance
penalty of checking for `__array_function__`. It would obviously be worth
trying to do that, but ideally in a way that is much less intrusive.

Furthermore, it became clear that there were different pictures of the
final goal, with quite a bit of discussion about the relative benefits of
trying to limit exposure of the internal API and of, conversely, trying to
(incrementally) move to implementations that are maximally re-usable (using
duck-typing), which are themselves based around a smaller core (more in
line with Nathaniel's NEP-22).

In the latter respect, Stéfan's example is instructive. The real
implementation of `ones_like` is:
```
def ones_like(a, dtype=None, order='K', subok=True, shape=None):
res = empty_like(a, dtype=dtype, order=order, subok=subok, shape=shape)
multiarray.copyto(res, 1, casting='unsafe')
return res
```

The first step here seems obvious: an "empty_like" function would seem
to belong in the core.
The second step less so: Stéfan's `res.fill(1)` seems more logical, as
surely a class's method is the optimal way to do something. Though I do
feel `.fill` itself breaks "There should be one-- and preferably only one
--obvious way to do it." So, I'd want to replace it with `res[...] = 1`, so
that one relies on the more obvious `__setitem__`. (Note that all are
equally fast even now.)
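
Put together, the idealized version would then look something like the sketch
below (not the actual numpy implementation, just what the above argues for):
```
import numpy as np

def ones_like_sketch(a, dtype=None, order='K', subok=True, shape=None):
    res = np.empty_like(a, dtype=dtype, order=order, subok=subok, shape=shape)
    res[...] = 1   # rely only on __setitem__ rather than fill() or copyto()
    return res

ones_like_sketch(np.zeros((2, 3), dtype=np.int32))
```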

Of course, in this idealized future, there would be little reason to even
allow `ones_like` to be overridden with __array_function__...

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Converting np.sinc into a ufunc

2019-05-23 Thread Marten van Kerkwijk
I agree that we should not have two functions.

I also am rather unsure whether a ufunc is a good idea. Earlier, while
discussing other possible additions, like `erf`, the conclusion seemed to
be that in numpy we should just cover whatever is in the C standard. This
suggests `sinc` should not be a ufunc.

-- Marten

p.s.`scipy.special.sinc` *is* `np.sinc`

p.s.2 The accuracy argument is not in itself an argument for a ufunc, as it
could be done in python too.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Converting np.sinc into a ufunc

2019-05-22 Thread Marten van Kerkwijk
On a more general note, if we change to a ufunc, it will get us stuck with
sinc being the normalized version, where the units of the input have to be
in the half-cycles preferred by signal-processing people rather than the
radians preferred by mathematicians.
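
For reference, the normalization in question:
```
import numpy as np

np.sinc(0.5)                          # 0.6366...: input interpreted in half-cycles
np.sin(np.pi * 0.5) / (np.pi * 0.5)   # the same value, spelled out
np.sin(0.5) / 0.5                     # 0.9589...: the unnormalized form mathematicians expect
```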

In this respect, note that there is an outstanding issue about whether to
allow one to choose between the two:
https://github.com/numpy/numpy/issues/13457 (which itself was raised
following an inconclusive PR that tried to add a keyword argument for it).

Adding a keyword argument is much easier for a general function than for a
ufunc.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Converting np.sinc into a ufunc

2019-05-22 Thread Marten van Kerkwijk
> Otherwise, there should
>>> be no change except additional features of ufuncs and the move to a C
>>> implementation.
>>>
>>
> I see this is one of the functions that uses asanyarray, so what about
> impact on subclass behavior?
>

So, subclasses are passed on, as they are in ufuncs. In general, that
should mean it is OK for subclasses. For astropy's Quantity, it would be an
improvement, as `where` was never properly supported, and with a ufunc, we
can handle it easily.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Keep __array_function__ unexposed by default for 1.17?

2019-05-22 Thread Marten van Kerkwijk
> If we want to keep an "off" switch we might want to add some sort of API
> for exposing whether NumPy is using __array_function__ or not. Maybe
> numpy.__experimental_array_function_enabled__ = True, so you can just test
> `hasattr(numpy, '__experimental_array_function_enabled__')`? This is
> assuming that we are OK with adding an underscore attribute to NumPy's
> namespace semi-indefinitely.
>

Might this be overthinking it? I might use this myself on supercomputer
runs where I know that I'm using arrays only. Though one should not
extrapolate from oneself!

That said, it is not difficult as is. For instance, we could explain in the
docs that one can tell from:
```
enabled = (hasattr(np.core, 'overrides') and
           np.core.overrides.ENABLE_ARRAY_FUNCTION)
```
One could even allow for eventual removal by explaining it should be,
```
enabled = (hasattr(np.core, 'overrides') and
           getattr(np.core.overrides, 'ENABLE_ARRAY_FUNCTION', True))
```
(If I understand correctly, one cannot tell from the presence of
`ndarray.__array_function__`, correct?)

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Keep __array_function__ unexposed by default for 1.17?

2019-05-22 Thread Marten van Kerkwijk
Hi Stephan,

I'm quite happy with the idea of turning on __array_function__ but
postponing any formal solution to getting into the wrapped routines (i.e.,
one can use __wrapped__, but it is an implementation detail that is not
documented and comes with absolutely no guarantees).

That way, 1.17 will be a release where we can think of how to address two
different things:
1. Reduce the overhead costs for pure ndarray cases (i.e., mostly within
numpy itself);
2. Simplify implementation in outside packages.

On the performance front, I'm not quite sure what the state of the
environment variable check is, but is it possible to just flip the default,
i.e., for 1.17 one gets __array_function__ support turned on by default,
but can turn it off if wanted?

All the best,

Marten

On Wed, May 22, 2019 at 11:53 AM Stephan Hoyer  wrote:

> Thanks for raising these concerns.
>
> The full implications of my recent __skip_array_function__ proposal are
> only now becoming evident to me, looking at its use in GH-13585.
> Guaranteeing that it does not expand NumPy's API surface seems hard to
> achieve without pervasive use of __skip_array_function__ internally.
>
> Taking a step back, the sort of minor hacks [1] that motivated
> __skip_array_function__ for me are annoying, but really not too bad -- they
> are a small amount of additional code duplication in a proposal that
> already requires a large amount of code duplication.
>
> So let's roll back the recent NEP change adding __skip_array_function__ to
> the public interface [2]. Inside the few NumPy functions where
> __array_function__ causes a measurable performance impact due to repeated
> calls (most notably np.block, for which some benchmarks are 25% slower), we
> can make use of the private __wrapped__ attribute.
>
> I would still like to turn on __array_function__ in NumPy 1.17. At least,
> let's try that for the release candidate and see how it goes. The "all in"
> nature of __array_function__ without __skip_array_function__ will both
> limit its use to cases where it is strongly motivated, and also limits the
> API implications for NumPy. There is still plenty of room for expanding the
> protocol, but it's really hard to see what is necessary (and prudent!)
> without actual use.
>
> [1] e.g., see
> https://github.com/google/jax/blob/62473351643cecb6c248a50601af163646ba7be6/jax/numpy/lax_numpy.py#L2440-L2459
> [2] https://github.com/numpy/numpy/pull/13305
>
>
>
>
> On Tue, May 21, 2019 at 11:44 PM Juan Nunez-Iglesias 
> wrote:
>
>> I just want to express my general support for Marten's concerns. As an
>> "interested observer", I've been meaning to give `__array_function__` a try
>> but haven't had the chance yet. So from my anecdotal experience I expect
>> that more people need to play with this before setting the API in stone.
>>
>> At scikit-image we place a very strong emphasis on code simplicity and
>> readability, so I also share Marten's concerns about code getting too
>> complex. My impression reading the NEP was "whoa, this is hard, I'm glad
>> smarter people than me are working on this, I'm sure it'll get simpler in
>> time". But I haven't seen the simplicity materialise...
>>
>> On Wed, 22 May 2019, at 11:31 AM, Marten van Kerkwijk wrote:
>>
>> Hi All,
>>
>> For 1.17, there has been a big effort, especially by Stephan, to make
>> __array_function__ sufficiently usable that it can be exposed. I think this
>> is great, and still like the idea very much, but its impact on the numpy
>> code base has gotten so big in the most recent PR (gh-13585) that I wonder
>> if we shouldn't reconsider the approach, and at least for 1.17 stick with
>> the status quo. Since that seems to be a bigger question than can be
>> usefully addressed in the PR, I thought I would raise it here.
>>
>> Specifically, now not only does every numpy function have its dispatcher
>> function, but also internally all numpy function calls are being done via
>> the new `__skip_array_function__` attribute, to avoid further overrides. I
>> think both changes make the code significantly less readable, thus, e.g.,
>> making it even harder than it is already to attract new contributors.
>>
>> I think with this it is probably time to step back and check whether the
>> implementation is in fact the right one. For instance, among the
>> alternatives we originally considered was one that had the overridable
>> versions of functions in the regular `numpy` namespace, and the ones that
>> would not themselves check in a different one. Alternatively, for some of
>> the benefits provided by `__skip_array_function__`, there was a different
>> suggestion t

[Numpy-discussion] Keep __array_function__ unexposed by default for 1.17?

2019-05-21 Thread Marten van Kerkwijk
Hi All,

For 1.17, there has been a big effort, especially by Stephan, to make
__array_function__ sufficiently usable that it can be exposed. I think this
is great, and still like the idea very much, but its impact on the numpy
code base has gotten so big in the most recent PR (gh-13585) that I wonder
if we shouldn't reconsider the approach, and at least for 1.17 stick with
the status quo. Since that seems to be a bigger question than can be
usefully addressed in the PR, I thought I would raise it here.

Specifically, now not only does every numpy function have its dispatcher
function, but also internally all numpy function calls are being done via
the new `__skip_array_function__` attribute, to avoid further overrides. I
think both changes make the code significantly less readable, thus, e.g.,
making it even harder than it is already to attract new contributors.

I think with this it is probably time to step back and check whether the
implementation is in fact the right one. For instance, among the
alternatives we originally considered was one that had the overridable
versions of functions in the regular `numpy` namespace, and the ones that
would not themselves check in a different one. Alternatively, for some of
the benefits provided by `__skip_array_function__`, there was a different
suggestion to have a special return value, of `NotImplementedButCoercible`.
Might these be better after all?

More generally, I think we're suffering from the fact that several of us
seem to have rather different final goals in mind.  In particular, I'd like
to move to a state where as much of the code as possible makes use of the
simplest possible implementation, with only a few true base functions, so
that all but those simplest functions will generally work on any type of
array. Others, however, worry much more about making implementations (even
more) part of the API.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-29 Thread Marten van Kerkwijk
On Sun, Apr 28, 2019 at 9:20 PM Stephan Hoyer  wrote:

> On Sun, Apr 28, 2019 at 8:42 AM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> In summary, I think the guarantees should be as follows:
>> 1.If you call np.function and
>>   - do not define __array_function__, changes happen only via the usual
>> cycle.
>>   - define __array_function__, you take responsibility for returning the
>> result.
>> 2. If you call np.function.__wrapped__ and
>>   - input only ndarray, changes happen only via the usual cycle;
>>   - input anything but ndarray, changes can happen in any release.
>>
>
> The uses that I've seen so far (in CuPy and JAX), involve a handful of
> functions that are directly re-exported from NumPy, e.g.,
> jax.numpy.array_repr is the exact same object as numpy.array_repr:
>
> https://github.com/cupy/cupy/blob/c3f1be602bf6951b007beaae644a5662f910048b/cupy/__init__.py#L341-L366
>
> https://github.com/google/jax/blob/5edb23679f2605654949156da84e330205840695/jax/numpy/lax_numpy.py#L89-L132
>
>
> I suspect this will be less common in the future if __array_function__
> takes off, but for now it's convenient because users don't need to know
> exactly which functions have been reimplemented. They can just use "import
> jax.numpy as np" and everything works.
>
> These libraries are indeed passing CuPy or JAX arrays into NumPy
> functions, which currently happen to have the desired behavior, thanks to
> accidental details about how NumPy currently supports duck-typing and/or
> coercions.
>
> To this end, it would be really nice to have an alias that *is* guaranteed
> to work exactly as if __array_function__ didn't exist, and not only for
> numpy.ndarray arrays.
>

Just to be clear: for this purpose, being able to call the implementation
is still mostly a convenient crutch, correct? For classes that define
__array_function__, would you expect more than the guarantee I wrote above,
that the wrapped version will continue to work as advertised for ndarray
input only?

In particular, suppose we change an implementation to use different
numpy functions inside (which are of course overridden using
__array_function__). I could imagine situations where that would work fine
for everything that does not define __array_ufunc__, but where it would not
for classes that do define it. Is that then a problem for numpy or for the
project that has a class that defines __array_function__?


> Ralf raises a good point about the name. We don't need to add this
> attribute for ufuncs and __array_ufunc__ yet, but
> (1) we might want this in the future, just for consistency in the design
> of __array_function__ and __array_ufunc__, and
> (2) we definitely don't want to rule out converting functions into ufunc.
>
> So we might as well pick a name that works for both, e.g.,
> __skip_array_overrides__ rather than __skip_array_function__. This would
> let us save our users a bit of pain by not requiring them to make changes
> like  np.where.__skip_array_function__ -> np.where.__skip_array_ufunc__.
>

Note that for ufuncs it is not currently possible to skip the override. I
don't think it is super-hard to do it, but I'm not sure I see the need to
add a crutch where none has been needed so far. More generally, it is not
obvious there is any C code where skipping the override is useful, since
the C code relies much more directly on inputs being ndarray.

All the best,

Marten


___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-28 Thread Marten van Kerkwijk
Hi Nathaniel,

Thanks, I now see your point. I think I can weasel my way partially out:
the default *output* from `np.concatenate` is an ndarray, so in that
respect it is not that strange that when no input defines
__array_function__, one would call `ndarray.__array_function__` (I realize
this is sophistry and that it breaks down with all-scalar functions, but
still feel it is defensible...).

Your point about `__skipping_array_function__` is well taken, though: it is
not very logical since suddenly one again ignores items that define
__array_function__. Its real purpose is to be a useful crutch if one wants
to start to define __array_function__ on one's class. But arguably this is
yet more reason to just stick with __wrapped__ - i.e., be explicit that it
is an implementation detail.

All the best,

Marten


On Sun, Apr 28, 2019 at 6:50 PM Nathaniel Smith  wrote:

> On Sun, Apr 28, 2019 at 1:38 PM Marten van Kerkwijk
>  wrote:
> >
> > Hi Nathaniel,
> >
> > I'm a bit confused why` np.concatenate([1, 2], [3, 4])` would be a
> problem. In the current model, all (numpy) functions fall back to
> `ndarray.__array_function__`, which does know what to do with anything that
> doesn't have `__array_function__`: it just coerces it to array. Am I
> missing something?
>
> IMO, the reason that having ndarray.__array_function__ was attractive
> in the first place, was that we were hoping it would let you pretend
> that there's nothing special about ndarray.  Like, when you call
> np.concatenate, it just looks for __array_function__ methods and
> dispatches to them; sometimes that means calling
> thirdpartyobject.__array_function__, and sometimes it means calling
> ndarray.__array_function__, but as far as np.concatenate is concerned
> those are interchangeable and treated in the same way.
>
> But in fact ndarray.__array_function__ *is* special. I guess you could
> write down the semantics so that np.concatenate([1, 2], [3, 4]) still
> calls ndarray.__array_function__, by defining a special dispatch rules
> just for ndarray.__array_function__. But if ndarray.__array_function__
> isn't going to follow the same dispatch rule, then why should it exist
> and be called "__array_function__"? A special method like
> "__array_function__" is nothing except a name for a dispatch rule.
>
> And if we add __skipping_array_function__, it makes this even worse.
> In a model where dispatch always goes through *some* object's
> __array_function__, then __skipping_array_function__ makes no sense --
> if you skip __array_function__ then there's nothing left.
>
> You might try to save it by saying, ok, we'll only skip third-party
> __array_function__, but still dispatch to ndarray.__array_function__.
> But this doesn't work either.
> np.concatenate.__skipping_array_function__(...) is different from
> ndarray.__array_function__(np.concatenate, ...), because they treat
> arguments with __array_function__ methods differently. (The former
> coerces them to ndarray; the latter returns NotImplemented.) Neither
> can be implemented in terms of the other (!).
>
> ndarray.__array_function__ was a nice idea, but I don't think there's
> any way to fit into a coherent system.
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-28 Thread Marten van Kerkwijk
Hi Nathaniel,

I'm a bit confused why `np.concatenate([1, 2], [3, 4])` would be a problem.
In the current model, all (numpy) functions fall back to
`ndarray.__array_function__`, which does know what to do with anything that
doesn't have `__array_function__`: it just coerces it to array. Am I
missing something?
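
For concreteness, a minimal sketch of the protocol as it stands (the class is
purely illustrative, and this assumes __array_function__ is enabled, as
planned for 1.17):
```
import numpy as np

class LoggingArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap ourselves and defer to the plain ndarray behaviour.
        print("dispatched:", func.__name__)
        args = tuple(a.data if isinstance(a, LoggingArray) else a for a in args)
        return func(*args, **kwargs)

np.mean(LoggingArray([1.0, 2.0, 3.0]))  # prints "dispatched: mean", returns 2.0
np.mean([1.0, 2.0, 3.0])                # plain lists are simply coerced, as before
```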

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-28 Thread Marten van Kerkwijk
Hi Ralf,

Thanks for the comments and summary slides. I think you're
over-interpreting my wish to break people's code! I certainly believe - and
think we all agree - that we remain as committed as ever to ensure that
```
np.function(inputs)
```
continues to work just as before. My main comment is that I want to ensure
that no similar guarantee will exist for
```
np.function.__wrapped__(inputs)
```
(or whatever we call it). I think that is quite consistent with NEP-18,
since as originally written there was not even the possibility to access
the implementation directly (which was after long discussions about whether
to allow it, including ideas like `import numpy.internal_apt as np`). In
this respect, the current proposal is a large deviation from the original
intent, so we need to be clear about what we are promising.

In summary, I think the guarantees should be as follows:
1.If you call np.function and
  - do not define __array_function__, changes happen only via the usual
cycle.
  - define __array_function__, you take responsibility for returning the
result.
2. If you call np.function.__wrapped__ and
  - input only ndarray, changes happen only via the usual cycle;
  - input anything but ndarray, changes can happen in any release.

On the larger picture, in your slides, the further split that happens is
that if no override is present, the first thing that actually gets called
is not the function implementation but rather `ndarray.__array_function__`.
I think it is important to add this to your mental image (and the slides),
since it means that generic parts of the implementations (like coercion
;-) can be moved there. For ufuncs, this is relatively easy, for other
functions less so since they differ quite a bit in what coercion they do.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-27 Thread Marten van Kerkwijk
Hi All,

I agree with Ralf that there are two discussions going on, but also with
Hameer that they are related, in that part of the very purpose of
__array_function__ was to gain freedom to experiment with implementations.
And in particular the freedom to *assume* that inputs are arrays so that we
can stop worrying about breaking subclasses and duck arrays: with
__array_function__, we can now (well, eventually) tell people to just
override the function and make it do what they want. In that respect, if
having `__numpy_implementation__` means it would not change the existing
situation with backwards compatibility, then that is bad: we do want to
change that! The point is to keep the exposed API stable, not the
implementation.

Of course, the idea proposed here is not to keep the implementation stable,
but rather to help people implement __array_function__ who are daunted by
the many functions that can be overridden, especially for those classes for
which many numpy functions work out of the box: they can have a default of
just calling the old implementation.

In the end, the proposed goal for __numpy_implementation__ really seems
simple: to provide a better name for __wrapped__. But to me the discussion
has proven that this is in fact not a good idea. It does suggest stability
where there should be none, and the name itself is contentious. Maybe it is
best to just stick with __wrapped__? If so, the the only change would be
that we mention it in the NEP, making again explicit that the wrapped
implementation can and will be changed.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-25 Thread Marten van Kerkwijk
On Thu, Apr 25, 2019 at 6:04 PM Stephan Hoyer  wrote:

> On Thu, Apr 25, 2019 at 12:46 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
 

>
> It would be nice, though, if we could end up with also option 4 being
>> available, if only because code that just can assume ndarray will be
>> easiest to read.
>>
>
> This could perhaps just be coercion_function=None? Or maybe we want to
> keep around coercion_function=None for "do whatever ad-hoc coercion NumPy
> currently does"?
>

I think `None` better mean no coercion...  But the default doesn't have
to be `None`, of course.
-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-25 Thread Marten van Kerkwijk
It seems we are adding to the wishlist!  I see four so far:
1. Exposed in API, can be overridden with __array_ufunc__
2. One that converts everything to ndarray (or subclass); essentially the
current implementation;
3. One that does asduckarray
4. One that assumes all arguments are arrays.

Maybe handiest would be if there is a method to coerce all relevant
arguments with a function of one's choice? I.e., in the example of Stephan,
one would have
```
if function in JUST_COERCE:
    coerced_args, coerced_kwargs = function.__coerce__(
        np.asanyarray, *args, **kwargs)
    return function.__implementation__(*coerced_args, **coerced_kwargs)
```
Actually, this might in fact work with the plan proposed here, if we allow
for an extra, optional kwarg that contains the coercion function, that is
```
return function.__implementation__(
    *args, coercion_function=np.asanyarray, **kwargs)
```

The possible advantage of this over yet more dunder methods is that one can
fine-tune the extent to which something has to mimic an array properly
(e.g., run `asanyarray` only if `shape` is not present).

It would be nice, though, if we could end up with also option 4 being
available, if only because code that just can assume ndarray will be
easiest to read.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-23 Thread Marten van Kerkwijk
Hi All,

Reading the discussion again, I've gotten somewhat unsure that it is
helpful to formalize a way to call an implementation that we can and
hopefully will change. Why not just leave it at __wrapped__? I think the
name is no worse and it is more obvious that one relies on something
private.

I ask in part since I could see a good case for having a special method
that is available only for functions that do no explicit casting to array,
i.e., that are ready to accept array mimics (and for which we're willing to
guarantee that would not change). For instance, functions like np.sinc that
really have no business using more than ufuncs under the hood, i.e., which
any class that has __array_ufunc__ can call safely. Or (eventually) all
those functions that just end up calling `concatenate` - those again could
easily be made safe for a class that just overrides `np.concatenate` using
__array_function__. In essence, this would be any function that does not do
`np.as(any)array` but just relies on array attributes.

But the above obviously presumes a vision of where this is headed, which
I'm not sure is shared...

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] grant proposal for core scientific Python projects (rejected)

2019-04-18 Thread Marten van Kerkwijk
Very much second Joe's recommendations - especially trying NASA - which has
an amazing track record of open data also in astronomy (and a history of
open source analysis tools, as well as the "Astrophysics Data System").
-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] grant proposal for core scientific Python projects (rejected)

2019-04-18 Thread Marten van Kerkwijk
Hi Ralf,

I'm sorry to hear the proposal did not pass the first round, but, having
looked at it briefly (about as much time as I would have spent had I been
on the panel), I have to admit I am not surprised: it is nice but nice is
not enough for a competition like this.

Compared to what will have included some really exciting, novel proposals,
most damning will likely have been the modest, incremental goals (for a
large sum of money): performance improvements, but without any actual sense
of what would now become solvable (how does it beat throwing more computers
at a problem, which is cheap?); better implementations of things that exist
(arrays with units, sparse arrays); better GPU support (feels like
something everybody and their brother was excited about a decade ago);
etc.  I also think any panel would expect some concrete examples of
facilities that would now be helped: e.g., how is this going to help LSST
analyze its 20TB/night of data?

Going forward, best may be to explicitly involve the facilities that use
python - within astronomy, that would include LIGO and LSST, but certainly
also STScI (and other NASA institutes), which actually supports SPE
already. It would be good especially to show how much money it would save
them when this is implemented, so that it becomes clear this is a net win.
Indeed, for any future proposal, I'd suggest to involve (or at least ask
for advice) some more senior people who have been successful before (within
astronomy, the likes of Steve Kahn, the LSST director; he was at Columbia
for most of his career, so there is a connection).

All best wishes,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

2019-04-16 Thread Marten van Kerkwijk
I somewhat share Nathaniel's worry that by providing
`__numpy_implementation__` we essentially get stuck with the
implementations we have currently, rather than having the hoped-for freedom
to remove all the `np.asarray` coercion. In that respect, an advantage of
using `_wrapped` is that it is clearly a private method, so anybody is
automatically forewarned that this can change.

In principle, ndarray.__array_function__ would be more logical, but as
noted in the PR, the problem is that it is non-trivial for a regular
__array_function__ implementation to coerce all the arguments to ndarray
itself.

Which suggests that perhaps what is missing is a general routine that does
that, i.e., that re-uses the dispatcher.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behaviour of copy for structured dtypes with gaps

2019-04-12 Thread Marten van Kerkwijk
It may be relevant at this point to mention that the padding bytes do *not*
get copied - so you get a blob with possibly quite a lot of uninitialized
data. If anything, that seems a recipe for unexpected results. Are there
non-contrived examples where you would *want* this uninitialized blob?

Francesc: looking at the PyTables issue, it seems to have been more about
not automatically removing padding (or at least having the option of not
doing so), rather than what the behaviour of a copy should be. I'm
definitely *not* suggesting that one shouldn't be able to add padding: that
would still be as simple as `array.astype(padded_dtype)`. Although in your
case that would still lead to hdf5 files that are all different, since the
padding is uninitialized memory. Which seems far from nice...

Indeed, the "astype" example suggests perhaps a different way to phrase the
issue: should the copy behave as `astype(unpadded_dtype, copy=True)` or
should it just be `astype(same_dtype, copy=True)`? Note that the former is
tricky to do -- one has to get the unpadded dtype -- while the latter is easy.

My sense still is that the option that will be the least surprising to most
users is a copy that is most compact and does not have any padding. For
instance, think of someone loading a large binary file as a numpy memmap:
if they take one item, they can have a regular array with `mm['a'].copy()`,
but if they take two, `mm[['a', 'z']].copy()`, oops, out of memory! (Also,
remember that this used to work; that the user gets out of memory is I
think a serious regression, so the pain better be worth it!)
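
To make the memory point concrete (repack_fields is the helper that was added
alongside the multi-field view change):
```
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.zeros(3, dtype=[('a', 'u1'), ('b', 'f8'), ('z', 'u1')])
sel = a[['a', 'z']]                     # a view that keeps the parent itemsize
sel.dtype.itemsize                      # 10 bytes per element, mostly padding
rfn.repack_fields(sel).dtype.itemsize   # 2 bytes per element, padding removed
```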

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Complex Normal Generator in NEP-19 extension

2019-03-31 Thread Marten van Kerkwijk
Certainly have done `np.random.normal(size=2*n).view('c16')` very often. Makes
sense to just allow it to be generated directly. -- Marten
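
p.s. For reference, the trick spelled out (divide by sqrt(2) if unit variance
per complex sample is wanted):
```
import numpy as np

n = 5
z = np.random.normal(size=2 * n).view('c16')   # n complex128 samples
z.shape   # (5,)
```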

On Sat, Mar 30, 2019 at 6:24 PM Hameer Abbasi 
wrote:

> On Friday, Mar 29, 2019 at 6:03 PM, Hameer Abbasi <
> einstein.edi...@gmail.com> wrote:
>
> On Friday, Mar 29, 2019 at 6:01 PM, Kevin Sheppard <
> kevin.k.shepp...@gmail.com> wrote:
> One part of moving randomgen closer to fulfilling NEP-19 is rationalizing
> the API, especially new features not in RandomState. Matti Picus has made a
> lot of progress in getting it integrated, especially the part of replacing
> RandomState with a shimmed version of the new generator.
>
> There is only one new method in the generator, a scalar generator for
> complex normals. It is scalar in the sense that it is the complex version
> of np.random.normal, and so supports broadcasting.
>
> This was written based on some GH comments.  This would be a new API and
> so it needs to come here first to see if there is any support.
>
> If there is support, then it will appear in the new RandomGenerator, but
> not the RandomState replacement. If not, then we can just delete it.
>
> Kevin
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> +1
>
> Best Regards,
> Hameer Abbasi
>
>
> To expand on this, the Complex normal distribution is pretty common in
> communications, control, signals and systems, and so on. :) It’d be a great
> add.
>
> Best Regards,
> Hameer Abbasi
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP-18 comment

2019-03-07 Thread Marten van Kerkwijk
Hi Frédéric,

The problem with any environment type variable is that when you disable the
dispatch functionality, all other classes that rely on being able to
override a numpy function stop working as well, i.e., the behaviour of
everything from dask to astropy's Quantity would depend on that setting.

As another alternative, in any call where you *know* that the input is pure
ndarray, you can already call `np..__wrapped__(...)` to call the
original code. This is *not* guaranteed to remain in place between
versions, but obviously for cases that the speed really matters, one can
use it.
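
For concreteness (relying, as said, on an implementation detail that may
change):
```
import numpy as np

a = np.arange(5.)
np.mean(a)              # goes through the __array_function__ dispatch machinery
np.mean.__wrapped__(a)  # calls the plain ndarray implementation directly
```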

Personally, though, I think it would make most sense to try to ensure the
functions are as fast as possible including the dispatcher check, since
that benefits all. My sense is that for quite a number of numpy functions,
there is lower hanging fruit...

All the best,

Marten

On Thu, Mar 7, 2019 at 1:25 PM Frederic Bastien  wrote:

> I see speed changes vs behavior changes as different categories of changes
> in my mind.
>
> I understand that now importing library can slow down NumPy for small
> arrays.
> But I have the impression you are saying this can also give a behavior change.
>
> I do not understand why this could happen. A pure numpy script, where you
> just import dask or another library without using it, would just cause a
> slowdown. Not a behavior change.
>
> Behavior changes start to happen only when you start to use the new
> library.
>
> Did I miss something?
>
> Frédéric
>
> -Original Message-
> From: NumPy-Discussion  nvidia@python.org> On Behalf Of Stefan van der Walt
> Sent: Thursday, March 7, 2019 12:15 PM
> To: Discussion of Numerical Python 
> Cc: Matthew Rocklin 
> Subject: Re: [Numpy-discussion] NEP-18 comment
>
> Hi Sebastian, Frederic,
>
> On Thu, 07 Mar 2019 14:23:10 +, Frederic Bastien wrote:
> > I like your idea Sebastian. This way it is enabled only when needed and
> it is invisible to the user at the same time.
> >
> > Stefan, does it solve well enough the potential problem you raised?
>
> I don't think so.  This means that NumPy suddenly behaves differently when
> dask is imported, which again causes the problem mentioned earlier:
> that identical NumPy code could behave differently depending on library
> versions, imports, and the environment.
>
> That said, I think this is a better solution than an environment variable.
>
> Anyway, my opinion is just one of many: I'd like to hear what the other
> developers think.
>
> Best regards,
> Stéfan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Outreachy internship

2019-02-27 Thread Marten van Kerkwijk
Fantastic!

-- Marten

On Wed, Feb 27, 2019 at 1:19 AM Stefan van der Walt 
wrote:

> Hi everyone,
>
> The team at BIDS would like to take on an intern from Outreachy
> (https://www.outreachy.org), as part of our effort to grow the NumPy
> developer community.
>
> The internship is similar to a GSoC, except that we will pay their
> salary, as well as do the mentoring.  We wanted to check in beforehand
> to ensure that everyone is comfortable with us doing this on behalf of
> the NumPy community.
>
> Your thoughts and recommendations are welcome!
>
> Best regards,
> Stéfan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding more detailed exception types to numpy

2019-01-04 Thread Marten van Kerkwijk
Since numpy generally does not expose parts as modules, I think a separate
namespace for the exceptions makes sense. I prefer `np.exceptions` over
`np.errors`.

It might still make sense for that namespace to import from the different
parts, i.e., also have `np.core.exceptions`, `np.polynomial.exceptions`,
etc., in part because the ones in `core` will have C helper functions. But
that is a bit of an implementation detail.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] three-dim array

2018-12-27 Thread Marten van Kerkwijk
There is a long-standing request to require an explicit opt-in for
dtype=object: https://github.com/numpy/numpy/issues/5353

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Warn or immidiately change readonly flag on broadcast_arrays return value?

2018-12-25 Thread Marten van Kerkwijk
Hi Juan,

I also use `broadcast_to` a lot, to save memory, but definitely have been
in a situation where in another piece of code the array is assumed to be
normal and writable (typically, that other piece was also written by me; so
much for awareness...). Fortunately, `broadcast_to` already returns
read-only views (it is newer and was designed as such from the start), so I
was alerted.

At the time `broadcast_to` was introduced [1], there was a discussion about
`broadcast_arrays` as well, and it was decided to leave it writable - see
the note at
https://github.com/numpy/numpy/blob/master/numpy/lib/stride_tricks.py#L265.
The same note suggests a deprecation cycle, but like Hameer I don't really
see how that helps - there is no way to avoid the warning in the large
majority of cases where readonly views are exactly what the user wanted in
the first place. So, that argues for cold turkey...

The one alternative would be to actually warn only if one attempts to write
to a broadcast array - apparently this has been done for `np.diag` as well
[2], so code to do so supposedly exists.
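
A quick sketch of the difference (the exact writability of `broadcast_arrays`
results depends on the NumPy version, so only version-independent properties
are shown):

```
import numpy as np

a = np.arange(3.)
b = np.broadcast_to(a, (4, 3))
print(b.flags.writeable)   # False: broadcast_to returns a read-only view

x, y = np.broadcast_arrays(a, np.zeros((4, 1)))
print(x.strides)           # (0, 8): all rows of x alias the same memory, so an
                           # in-place write would silently change "other" elements
```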

All the best,

Marten

[1] https://github.com/numpy/numpy/pull/5371
[2] https://github.com/numpy/numpy/pull/5371#issuecomment-75238827
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorized version of numpy.linspace

2018-11-14 Thread Marten van Kerkwijk
I see the logic in having the linear space be last, but one non-negligible
advantage of the default being the first axis is that whatever is produced
broadcasts properly against start and stop.
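
For concreteness, with the axis keyword that eventually landed in NumPy 1.16
(default axis=0, i.e. the new axis first), that is exactly what happens:

```
import numpy as np

start = np.array([0., 10.])
stop = np.array([1., 20.])
out = np.linspace(start, stop, num=5, axis=0)   # new axis first: shape (5, 2)
print(out.shape)            # (5, 2)
print((out - start).shape)  # (5, 2): broadcasts directly against start/stop
```
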
-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorized version of numpy.linspace

2018-11-14 Thread Marten van Kerkwijk
On Wed, Nov 14, 2018 at 1:21 PM Lars Grüter  wrote:


>
> This reminds me of a function [1] I wrote which I think has a lot of
> similarities to what Stephan describes here. It is currently part of a
> PR to rewrite numpy.pad [2].
>
> If we start to write the equivalent internally, then perhaps we should
indeed expose it!
Would my trial indeed help, or is it insufficient? (Sorry for not checking
myself in detail; it is not immediately obvious.)
-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorized version of numpy.linspace

2018-11-14 Thread Marten van Kerkwijk
Code being better than words: see https://github.com/numpy/numpy/pull/12388
for an implementation. The change in the code proper is very small, though
it is worrying that it causes two rather unrelated tests to fail (even if
arguably both tests were wrong).

Note that this does not give flexibility to put the axis where one wants;
as written, the code made putting it at the start the obvious solution, as
it avoids doing anything with the shapes of start and stop.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorized version of numpy.linspace

2018-11-14 Thread Marten van Kerkwijk
Just to add: conceptually there is nothing strange about start and stop being
arrays. Indeed, the code would work for arrays as is if it didn't check the
`step == 0` case to handle denormals (arguably an even less common case
than array-like inputs...), and made a trivial change to get the new axis
to be at the start.

(I guess I'm agreeing with Matthew here that if conceptually things make
sense for arrays, then it should just work; there are of course
counterexamples, such as `np.arange`, for which array-valued input makes
even less sense than allowing float and complex).
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-11 Thread Marten van Kerkwijk
Hi Eric,

Thanks very much for the detailed response; it is good to be reminded that
`MaskedArray` is used in a package that, indeed, (nearly?) all of us use!

But I do think that those of us who have been trying to change MaskedArray,
are generally good at making sure the tests continue to pass, i.e., that
the behaviour does not change (the main exception in the last few years was
that views should be taken of masks too, not just the data).

I also think that between __array_ufunc__ and __array_function__, it has
become quite easy to ensure that one no longer has to rely on `np.ma`
functions, i.e., that the regular numpy functions will do the right thing.
But it will need work to actually implement that.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Marten van Kerkwijk
On Sat, Nov 10, 2018 at 5:39 PM Stephan Hoyer  wrote:

> On Sat, Nov 10, 2018 at 2:22 PM Hameer Abbasi 
> wrote:
>
>> To summarize, I think these are our options:
>>
>> 1. Change the behavior of np.anyarray() to check for an __anyarray__()
>> protocol. Change np.matrix.__anyarray__() to return a base numpy array
>> (this is a minor backwards compatibility break, but probably for the best).
>> Start issuing a FutureWarning for any MaskedArray operations that violate
>> Liskov and add a skipna argument that in the future will default to
>> skipna=False.
>>
>> 2. Introduce a new coercion function, e.g., np.duckarray(). This is the
>> easiest option because we don't need to cleanup NumPy's existing ndarray
>> subclasses.
>>
>>
>> My vote is still for 1. I don’t have an issue for PyData/Sparse depending
>> on recent-ish NumPy versions — It’ll need a lot of the recent protocols
>> anyway, although I could be convinced otherwise if major package devs
>> (scikits, SciPy, Dask) were to weigh in and say they’ll jump on it (which
>> seems unlikely given SciPy’s policy to support old NumPy versions).
>>
>
> I agree that option (1) is fine for PyData/sparse. The bigger issue is
> that this change should be conditional on making breaking changes (at least
> raising FutureWarning for now) to np.ma.MaskedArray.
>

Might be good to try before worrying too much - MaskedArray already
overrides *a lot*; it is not at all obvious to me that things wouldn't
"just work" if we bulk-replaced `asarray` with `asanyarray`.  And with
`__array_function__` we now have the option to fix code paths that do not
work immediately.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should unique types of all arguments be passed on in __array_function__?

2018-11-10 Thread Marten van Kerkwijk
> More broadly, it is only necessary to reject an argument type at the
> __array_function__ level if it defines __array_function__ itself, because
> that’s the only case where it would make a difference to return
> NotImplemented rather than trying (and failing) to call the overriden
> function implementation.
>

Yes, this makes sense -- these are the only types that could possibly
change the outcome if the class now called fails to produce a result.
Indeed, that reasoning makes it logical that `ndarray` itself is not
present even though it defines `__array_ufunc__` - we know it cannot handle
anything with a `__array_ufunc__` implementation.

Hameer, is Stephan's argument convincing to you too? If so, I'll close the
PR.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Marten van Kerkwijk
Hi Hameer,

I do not think we should change `asanyarray` itself to special-case matrix;
rather, we could start converting `asarray` to `asanyarray` and solve the
problems that produces for matrices in `matrix` itself (e.g., by overriding
the relevant function with `__array_function__`).

I think the idea of providing an `__anyarray__` method (in analogy with
`__array__`) might work. Indeed, the default in `ndarray` (and thus all its
subclasses) could be to let it return `self`  and to override it for
`matrix` to return an ndarray view.
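
A minimal sketch of how such an (entirely hypothetical) `__anyarray__`
protocol could look; none of these names exist in NumPy:

```
import numpy as np

def asanyarray_via_protocol(obj):
    # hypothetical replacement for np.asanyarray that honours __anyarray__
    if hasattr(obj, "__anyarray__"):
        return obj.__anyarray__()
    return np.asanyarray(obj)

class WellBehaved(np.ndarray):
    def __anyarray__(self):
        return self                    # substitutable subclass: pass through

class MatrixLike(np.ndarray):
    def __anyarray__(self):
        return self.view(np.ndarray)   # not substitutable: demote to ndarray

a = np.arange(4).view(MatrixLike)
print(type(asanyarray_via_protocol(a)))   # <class 'numpy.ndarray'>
```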

All the best,

Marten

p.s. Note that we are already giving PendingDeprecationWarning for matrix;
https://github.com/numpy/numpy/pull/10142.



On Sat, Nov 10, 2018 at 11:02 AM Matti Picus  wrote:

> On 9/11/18 5:09 pm, Nathaniel Smith wrote:
> > On Fri, Nov 9, 2018 at 4:59 PM, Stephan Hoyer  wrote:
> >> On Fri, Nov 9, 2018 at 6:46 PM Nathaniel Smith  wrote:
> >>> But matrix isn't the only problem with asanyarray. np.ma also violates
> >>> Liskov. No doubt there are other problematic ndarray subclasses out
> >>> there too...
> >>>
> >>>
> >>> Please forgive my ignorance (I don't really use mask arrays), but how
> >>> specifically do masked arrays violate Liskov? In most cases shouldn't
> they
> >>> work the same as base numpy arrays, except with operations keeping
> track of
> >>> masks?
> > Since many operations silently skip over masked values, the
> > computation semantics are different. For example, in a regular array,
> > sum()/size() == mean(), but with a masked array these are totally
> > different operations. So if you have code that was written for regular
> > arrays, but pass in a masked array, there's a solid chance that it
> > will silently return nonsensical results.
> >
> > (This is why it's better for NAs to propagate by default.)
> >
> > -n
>
>
> Echos of the discussions in neps 12, 24, 25, 26. http://www.numpy.org/neps
>
>
> Matti
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Prep for NumPy 1.16.0 branch

2018-11-05 Thread Marten van Kerkwijk
For astropy, we are also waiting a little before having a rip-python2-out
fiesta.

I think it is worth trying to get matmul in 1.16, independently of
__array_function__ - it really belongs with the ufunc overrides and all the
groundwork has been done.
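
Concretely, making matmul a (generalized) ufunc means it shows up like this
(as it indeed does in NumPy >= 1.16):

```
import numpy as np

print(type(np.matmul))       # <class 'numpy.ufunc'>
print(np.matmul.signature)   # (n?,k),(k,m?)->(n?,m?)
```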

For __array_function__, is it at all an option to go to the
disable-by-default step? I.e., by default have array_function_dispatch just
return the implementation instead of wrapping it? Though perhaps reversion
is indeed cleaner; most people who would like to play with it are quite
able to install the development version...
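
Roughly, the disable-by-default idea would look like the sketch below (names
and internals here are illustrative, not NumPy's actual code; the opt-in
environment variable 1.16 ended up using was NUMPY_EXPERIMENTAL_ARRAY_FUNCTION):

```
import functools
import os

ENABLED = os.environ.get("NUMPY_EXPERIMENTAL_ARRAY_FUNCTION", "0") == "1"

def array_function_dispatch(dispatcher):
    """Illustrative decorator: wrap only when the override machinery is enabled."""
    def decorator(implementation):
        if not ENABLED:
            return implementation          # no wrapper, no dispatch overhead
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            relevant_args = dispatcher(*args, **kwargs)
            # ... look for __array_function__ on relevant_args and dispatch ...
            return implementation(*args, **kwargs)
        return public_api
    return decorator
```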

-- Marten

On Sun, Nov 4, 2018 at 10:43 PM Mark Harfouche 
wrote:

> > Thoughts on how to proceed are welcome.
>
> I've been involved in scikit-image and that project  tore out the python2
> only code rather quickly after 2.7 support was dropped. I think it caused a
> few hiccups when backporting bugfixes. I imagine that `1.16.1` and `1.16.2`
> releases will come out quickly and as such, I think removing `if else`
> statements for python2 immediately after `1.16` is released will cause
> annoyances in the first few months bugs are being ironed out.
>
> My 2cents.
>
> On Sun, Nov 4, 2018 at 10:04 PM Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Sun, Nov 4, 2018 at 6:16 PM Stephan Hoyer  wrote:
>>
>>> On Sun, Nov 4, 2018 at 10:32 AM Marten van Kerkwijk <
>>> m.h.vankerkw...@gmail.com> wrote:
>>>
>>>> Hi Chuck,
>>>>
>>>> For `__array_function__`, there was some discussion in
>>>> https://github.com/numpy/numpy/issues/12225 that for 1.16 we might
>>>> want to follow after all Nathaniel's suggestion of using an environment
>>>> variable or so to opt in (since introspection breaks on python2 with our
>>>> wrapped implementations).  Given also the possibly significant hit in
>>>> performance, this may be the best option.
>>>> All the best,
>>>>
>>>> Marten
>>>>
>>>
>>> I am also leaning towards this right now, depending on how long we plan
>>> to wait for releasing 1.16. It will take us at least a little while to sort
>>> out performance issues for __array_function__, I'd guess at least a few
>>> weeks. Then a blocker still might turn up during the release candidate
>>> process (though I think we've found most of the major bugs / downstream
>>> issues already through tests on NumPy's dev branch).
>>>
>>
>> My tentative schedule is to branch in about two weeks, then allow 2 weeks
>> of testing for rc1, possibly another two weeks for rc2, and then a final.
>> so possibly about six weeks to final release. That leaves 2 to 4 weeks of
>> slack before 2019.
>>
>>
>>> Overall, it does feels a little misguided to rush in a change as
>>> pervasive as __array_function__ for a long term support release. If we
>>> exclude __array_function__ I expect the whole release process for 1.16
>>> would go much smoother. We might even try to get 1.17 out faster than
>>> usual, so we can minimize the number of additional changes besides
>>> __array_function__ and going Python 3 only -- that's already a good bit of
>>> change.
>>>
>>
>> I would like to get 1.17 out a bit early. I'm not sure how many backwards
>> incompatible changes we want to have in the first post python2 release. My
>> initial thoughts are to drop Python 2.7 testing, go to C99, and get the new
>> fft in. Beyond that, I'm hesitant to start tearing out all the Python2
>> special casing in the first new release, although that could certainly be
>> the main task for 1.17 and would clean up the code considerably. It might
>> also be a good time to catch up on changing deprecations to errors.
>> Thoughts on how to proceed are welcome.
>>
>>
>>> Note that if we make this change (reverting __array_function__), we'll
>>> need to revisit where we put a few deprecation warnings -- these will need
>>> to be restored into function bodies, not their dispatcher functions.
>>>
>>> Also: it would be really nice if we get matmul-as-ufunc in before (or at
>>> the same time) as __array_function__, so we have a complete story about it
>>> being possible to override everything in NumPy. This is another argument
>>> for delaying __array_function__, if matmul-as-ufunc can't make it in time
>>> for 1.16.
>>>
>>
>> That's two votes for matmul-as-ufunc. How much would it cost to simply
>> make __array_function__ a nop?
>>
>> Chuck
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] out parameter for np.fromfile

2018-11-05 Thread Marten van Kerkwijk
Hi Mark,

Having an `out` might make sense. With present numpy, if you are really
dealing with a file or file-like object, you might consider using
`np.memmap` to access the data more directly. If it is something that looks
more like a buffer, `np.frombuffer` may be useful (that doesn't copy data,
but points the array at the memory that holds the buffer).
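
For instance (the file name below is purely illustrative):

```
import numpy as np

raw = bytearray(8 * 100)                     # e.g. bytes read from a subprocess pipe
arr = np.frombuffer(raw, dtype=np.float64)   # wraps the same memory, no copy made

# For data sitting in a file on disk, a memory map avoids reading it all up front:
# frames = np.memmap("frames.raw", dtype=np.uint8, mode="r", shape=(100, 480, 640))
```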

All the best,

Marten


On Sun, Nov 4, 2018 at 10:35 PM Mark Harfouche 
wrote:

> I was wondering what would your thoughts be on adding an output parameter
> to np.fromfile?
>
> The advantage would be when interfacing with executables like ffmpeg
> which are arguably easier to use by calling them as a subprocess compared
> to a shared library in python.
>
> Having the output parameter in np.fromfile would enable pre-allocation of
> large arrays that are reused during the computation of new image frames
> when decoding large video files.
>
> Thoughts are appreciated!
>
> Mark
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should unique types of all arguments be passed on in __array_function__?

2018-11-05 Thread Marten van Kerkwijk
Hi Stephan,

Another part of your reply worth considering, though slightly off topic for
the question here, of what to pass on in `types`:


On Sun, Nov 4, 2018 at 7:51 PM Stephan Hoyer  wrote:

> On Sun, Nov 4, 2018 at 8:03 AM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> I thought of this partially as I was wondering how an implementation for
>> ndarray itself would look like. For that, it is definitely useful to know
>> all unique types, since if it is only ndarray, no casting whatsoever needs
>> to be done, while if there are integers, lists, etc, an attempt has to be
>> made to turn these into arrays
>>
>
> OK, so hypothetically we could invoke versions of each the numpy function
> that doesn't call `as[any]array`, and this would slightly speed-up
> subclasses that call super().__array_function__?
>
A longer-term goal that I had in mind here was generally for the
implementations to just be able to assume their arguments are ndarray,
i.e., be free to assume there is a shape, dtype, etc. That is not
specifically useful for subclasses; for pure python code, it might also
mean array mimics could happily use the implementation.  But perhaps more
importantly, the code would become substantially cleaner.

Anyway, really a longer-term goal...

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Implementations of ndarray.__array_function__ (and ndarray.__array_ufunc__)

2018-11-05 Thread Marten van Kerkwijk
On Sun, Nov 4, 2018 at 8:57 PM Stephan Hoyer  wrote:

> On Sun, Nov 4, 2018 at 8:45 AM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Does the above make sense? I realize that the same would be true for
>> `__array_ufunc__`, though there the situation is slightly trickier since it
>> is not as easy to bypass any further override checks. Nevertheless, it does
>> seem like it would be correct to do the same there. (And if we agree this
>> is the case, I'd quite happily implement it -- with the merger of
>> multiarray and umath it has become much easier to do.)
>>
>
> Marten actually implemented a draft version of this already in
> https://github.com/numpy/numpy/pull/12328 :). I found reading over the PR
> helpful for understand this proposal.
>
> I guess the practical import of this change is that it makes it (much?)
> easier to write __array_function__ for ndarray subclasses: if there's a
> function where NumPy's default function works fine, you don't need to
> bother with returning anything other than NotImplemented from
> __array_function__. It's sort of like NotImplementedButCoercible, but only
> for ndarray subclasses.
>

Yes, return NotImplemented if there is another array, or, even simpler,
just call super.
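
A minimal sketch of what that looks like for a subclass (assuming, as proposed
here, that ndarray.__array_function__ lets well-behaved subclasses through):

```
import numpy as np

class LoggedArray(np.ndarray):
    def __array_function__(self, func, types, args, kwargs):
        if func is np.mean:
            print("intercepted", func.__name__)   # special handling could go here
        # everything else: defer to the default ndarray implementation
        return super().__array_function__(func, types, args, kwargs)

a = np.arange(6.).view(LoggedArray)
print(np.mean(a))                     # goes through __array_function__, then super
print(np.concatenate([a, a]).shape)   # handled entirely by the default
```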

Note that it is not quite like `NotImplementedButCoercible`, since no
actual coercion to ndarray would necessarily be needed - with adherence to
the Liskov substitution principle, the subclass might stay intact (if only
partially initialized).


> One minor downside is that this might make it harder to eventually
> deprecate and/or contemplate removing checks for 'mean' methods in
> functions like np.mean(), because __array_function__ implementers might
> still be relying on this.
>

I think this is somewhat minor indeed, since we can (and should) insist
that subclasses here properly behave as subclasses, so if an
ndarray-specific implementation breaks a subclass, that might well indicate
that the subclass is not quite good enough (and we can now point out there
is a way to override the function). It might also indicate that the code
itself could be better - that would be a win.

But so far, I think this makes sense.
>
> The PR includes additional changes to np.core.overrides, but I'm not sure
> if those are actually required here (or rather only possible due to this
> change). I guess they are needed if you want to be able to count on
> ndarray.__array_function__ being called after subclass __array_function__
> methods.
>

It is mostly a transfer of functionality from `get_override_types_and_args`
to the place where the implementation is decided upon. Perhaps more logical
even if we do not pursue this.

>
> I'm not sure I like this part: it means that ndarray.__array_function__
> actually gets called when other arguments implement __array_function__. For
> interactions with objects that aren't ndarray subclasses this is entirely
> pointless and would unnecessarily slow things down, since
> ndarray._array_function__ will always return NotImplemented.
>

Agreed here. I did in fact think about it, but wasn't sure (and didn't have
time to think how to check) that the gain in time for cases where an
ndarray comes before the relevant array mimic (and thus a needless
call to ndarray.__array_function__ could be prevented) was worth it compared
to the cost of attempting to do the removal for cases where the array mimic
came first or where there was no regular ndarray in the first place. But I
think this is an implementation detail; for now, let me add a note to the
PR about it.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should unique types of all arguments be passed on in __array_function__?

2018-11-05 Thread Marten van Kerkwijk
More specifically:

>> Should we change this? It is quite trivially done, but perhaps I am missing
>> a reason for omitting the non-override types.
>>
>
> Realistically, without these other changes in NumPy, how would this
> improve code using __array_function__? From a general purpose dispatching
> perspective, are there cases where you'd want to return NotImplemented
> based on types that don't implement __array_function__?
>

I think, yes, that would be the closest analogy to the python operators.
Saves you from having separate cases for types that have and do not have
`__array_function__`.


> I guess this might help if your alternative array class is super-explicit,
> and doesn't automatically call `asmyarray()` on each argument. You could
> rely on __array_function__ to return NotImplement (and thus raise
> TypeError) rather than type checking in every function you write for your
> alternative arrays.
>

Indeed.


> One minor downside would speed: now __array_function__ implementations
> need to check a longer list of types.
>

That's true.

>
> Another minor downside: if users follow the example of
> NDArrayOperatorsMixin docstring, they would now need to explicitly list all
> of the scalar types (without __array_function__) that they support,
> including builtin types like int and type(None). I suppose this ties into
> our recommended best practices for doing type checking in
> __array_ufunc__/__array_function__ implementations, which should probably
> be updated regardless:
> https://github.com/numpy/numpy/issues/12258#issuecomment-432858949
>
>
Also true. It makes me wonder again whether passing on the types is useful
at all... But I end up thinking that it is not up to an implementation to
raise TypeError - it should just return NotImplemented.

If we'd wanted to give more information, we might also consider passing on
`overloaded_args` - then perhaps one has the best of both worlds.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should unique types of all arguments be passed on in __array_function__?

2018-11-05 Thread Marten van Kerkwijk
Hi Stephan,

I fear my example about thinking about `ndarray.__array_function__`
distracted from the gist of my question, which was whether for
`__array_function__` implementations *generally* it wouldn't be handier to
have all unique types rather than just those that override
`__array_function__`. It would seem that for any other implementation than
for numpy itself, the presence of __array_function__ is indeed almost
irrelevant. As a somewhat random example, why would it be useful for, e.g.,
DASK to know that another argument is a Quantity, but not that it is a
file handle? (Presumably, it cannot handle either...)

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Prep for NumPy 1.16.0 branch

2018-11-04 Thread Marten van Kerkwijk
Hi Chuck,

For `__array_function__`, there was some discussion in
https://github.com/numpy/numpy/issues/12225 that for 1.16 we might want to
follow after all Nathaniel's suggestion of using an environment variable or
so to opt in (since introspection breaks on python2 with our wrapped
implementations).  Given also the possibly significant hit in performance,
this may be the best option.
All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Implementations of ndarray.__array_function__ (and ndarray.__array_ufunc__)

2018-11-04 Thread Marten van Kerkwijk
Hi again,

Another thought about __array_function__, this time about the
implementation for ndarray. In it, we currently check whether any of the
types define a (different) __array_function__, and, if so, give up. This
seems too strict: I think that, at least in principle, subclasses should be
allowed through even if they override __array_function__.

This thought was triggered by Travis pointing to the Liskov substitution
principle [1], that code written for a given type should just work on a
(properly written) subclass. This suggests `ndarray` should not exclude
subclasses even if they override __array_function__, since if the subclass
does not work that way, it can already ensure an error is raised since it
knows it is called first.

Indeed, this is also how python itself works: if, eg., I subclass list as
follows:
```
class MyList(list):
    def __radd__(self, other):
        return NotImplemented
```
then any `list + mylist` will just concatenate the lists, even though
`MyList.__radd__` explicitly says it cannot do it (it returning
`NotImplemented` means that `list.__add__` gets a chance).

The reason that we do not already follow this logic may be that currently
`ndarray.__array_function__` ends by calling the public function, which
will lead to infinite recursion if there is a subclass that overrides
__array_function__ and returns NotImplemented. However, inside
ndarray.__array_function__, there is no real reason to call the public
function - one might as well just call the implementation, in which case
this is not a problem.

Does the above make sense? I realize that the same would be true for
`__array_ufunc__`, though there the situation is slightly trickier since it
is not as easy to bypass any further override checks. Nevertheless, it does
seem like it would be correct to do the same there. (And if we agree this
is the case, I'd quite happily implement it -- with the merger of
multiarray and umath it has become much easier to do.)

All the best,

Marten

[1] https://en.wikipedia.org/wiki/Liskov_substitution_principle
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Should unique types of all arguments be passed on in __array_function__?

2018-11-04 Thread Marten van Kerkwijk
Hi All,

While thinking about implementations using __array_function__, I wondered
whether the "types" argument passed on is not defined too narrowly.
Currently, it only contains the types of arguments that provide
__array_function__, but wouldn't it make more sense to provide the unique
types of all arguments, independently of whether those types have defined
__array_function__? It would seem quite useful for any override to know, e.g.,
whether a string or an integer is passed on.
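
To make the current behaviour concrete: only overriding types end up in
`types`, so plain lists, ints, strings, etc. are invisible there. A small
sketch:

```
import numpy as np

class Echo:
    def __array_function__(self, func, types, args, kwargs):
        print("types seen:", [t.__name__ for t in types])
        return NotImplemented

try:
    np.concatenate([Echo(), [1, 2, 3]])   # prints: types seen: ['Echo']
except TypeError:
    pass   # the only override returned NotImplemented, so NumPy raises
```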

I thought of this partially as I was wondering what an implementation for
ndarray itself would look like. For that, it is definitely useful to know
all unique types, since if it is only ndarray, no casting whatsoever needs
to be done, while if there are integers, lists, etc, an attempt has to be
made to turn these into arrays (i.e., the `as[any]array` calls currently
present in the implementations, which really more logically are part of
`ndarray.__array_function__` dispatch).

Should we change this? It is quite trivially done, but perhaps I am missing
a reason for omitting the non-override types.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asanyarray vs. asarray

2018-11-01 Thread Marten van Kerkwijk
The substitution principle is interesting (and, being trained as an
astronomer, not a computer scientist, I had not heard of it before). I
think matrix is indeed obviously wrong here (with indexing being more
annoying, but multiplication being a good example as well).

Perhaps more interesting as an example to consider is MaskedArray, which is
much closer to a sensible subclass, though different from Quantity in that
what is masked can itself be an ndarray subclass. In a sense, it is more of
a container class, in which the operations are done on what is inside it,
with some care taken about which elements are fixed. This becomes quite
clear when one thinks of implementing __array_ufunc__ or
__array_function__: for Quantity, calling super after dealing with the
units is very logical, for MaskedArray, it makes more sense to call the
(universal) function again on the contents [1].
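
A minimal sketch of that container style (nothing like the real np.ma
implementation): the ufunc is simply re-invoked on the contents and the masks
are combined:

```
import numpy as np

class SimpleMasked:
    def __init__(self, data, mask=False):
        self.data = np.asarray(data)
        self.mask = np.broadcast_to(np.asarray(mask, dtype=bool), self.data.shape)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        datas = [i.data if isinstance(i, SimpleMasked) else i for i in inputs]
        masks = [i.mask for i in inputs if isinstance(i, SimpleMasked)]
        result = getattr(ufunc, method)(*datas, **kwargs)   # operate on contents
        return SimpleMasked(result, np.logical_or.reduce(masks))

m = SimpleMasked([1., 2., 3.], [False, True, False])
r = np.add(m, 10)
print(r.data, r.mask)   # [11. 12. 13.] [False  True False]
```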

For this particular class, if reimplemented, it might make most sense as a
"mixin" since its attributes depend both on the masked class (.mask, etc.)
and on what is being masked (say, .unit for a quantity). Thus, the final
class might be an auto-generated new class (e.g.,
MaskedQuantity(MaskedArray, Quantity)). We have just added a new
Distribution class to astropy which is based on this idea [2] (since this
uses casting from structured dtypes which hold the samples to real arrays
on which functions are evaluated, this probably could be done just as well
or better with more flexible dtypes, but we have to deal with what's
available in the real world, not the ideal one...).

-- Marten

[1]
http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#subclass-hierarchies
[2] https://github.com/astropy/astropy/pull/6945
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

