Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-14 Thread Marten van Kerkwijk
Hi Ralf,

Thanks for the clarification. I think in your terms the bottom line was
that I thought we had a design B for the case where a function was really
"just a ufunc". But the nanfunctions show that even if logically they are a
ufunc (which admittedly uses another ufunc or two for `where`), it is
tricky, certainly trickier than I thought. And this discussion has served
to clarify that even for other "simpler" functions it can get similarly
tricky.

Anyway, bottom line is that I think you are right in actually needed a more
proper discussion/design about how to move forward.

All the best,

Marten

p.s. And, yes, `__array_function__` is quite wonderful!

On Fri, Jun 14, 2019 at 3:46 AM Ralf Gommers  wrote:

>
>
> On Fri, Jun 14, 2019 at 2:21 AM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Hi Ralf,
>>
>> Thanks both for the reply and sharing the link. I recognize much (from
>> both sides!).
>>
>> 
>>
>>>
>>> More importantly, I think we should not even consider *discussing*
>>> removing` __array_function__` from np.isposinf (or any similar one off
>>> situation) before there's a new bigger picture design. This is not about
>>> "deprecation is hard", this is about doing things with purpose rather than
>>> ad-hoc, as well as recognizing that lots of small changes are a major drain
>>> on our limited maintainer resources. About the latter issue I wrote a blog
>>> post recently, perhaps that clarifies the previous sentence a bit and gives
>>> some more insight in my perspective:
>>> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>>>
>>> Yes, I definitely did not mean to imply that a goal was to change just
>> `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
>> was to clean up the whole `nanfunctions` module). Rather, to use them as
>> examples to see what policy there actually is or should be; and I do worry
>> that with __array_function__, however happy I am that it now exists
>> (finally, Quantity can be concatenated!), we're going the Microsoft route
>> of just layering on top of old code if even for the simplest cases there is
>> no obvious path for how to remove it.
>>
>
> I share that worry to some extent (an ever-growing collection of
> protocols). To be fair, we knew that __array_function__ wasn't perfect, but
> I think Stephan did a really good job making the case for why it was
> necessary, and that we needed to get something in place in 6-12 months
> rather than spending years on a more polished/comprehensive design. Given
> those constraints, we seem to have made a good choice that has largely
> achieved its goals.
>
> I'm still not sure you got my main objection. So I'll try once more.
> We now have a design, imperfect but it exists and is documented (to some
> extent). Let's call it design A. Now you're wanting clarity on a policy on
> how to move from design A to design B. However, what B even is isn't
> spelled out, although we can derive rough outlines from mailing list
> threads like these (it involves better handling of subclasses, allowing
> reuse of implementation of numpy functions in terms of other numpy
> functions, etc.). The right way forward is:
>
> 1. describe what design B is
> 2. decide if that design is a good idea
> 3. then worry about implementation and a migration policy
>
> Something that's specific to all nan-functions is still way too specific,
> and doesn't justify skipping 1-2 and jumping straight to 3.
>
> I don't know how to express it any clearer than the above in email format.
> If it doesn't make sense to you, it'd be great to have a call to discuss in
> person.
>
> Cheers,
> Ralf
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-14 Thread Ralf Gommers
On Fri, Jun 14, 2019 at 2:21 AM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Ralf,
>
> Thanks both for the reply and sharing the link. I recognize much (from
> both sides!).
>
> 
>
>>
>> More importantly, I think we should not even consider *discussing*
>> removing` __array_function__` from np.isposinf (or any similar one off
>> situation) before there's a new bigger picture design. This is not about
>> "deprecation is hard", this is about doing things with purpose rather than
>> ad-hoc, as well as recognizing that lots of small changes are a major drain
>> on our limited maintainer resources. About the latter issue I wrote a blog
>> post recently, perhaps that clarifies the previous sentence a bit and gives
>> some more insight in my perspective:
>> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>>
>> Yes, I definitely did not mean to imply that a goal was to change just
> `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
> was to clean up the whole `nanfunctions` module). Rather, to use them as
> examples to see what policy there actually is or should be; and I do worry
> that with __array_function__, however happy I am that it now exists
> (finally, Quantity can be concatenated!), we're going the Microsoft route
> of just layering on top of old code if even for the simplest cases there is
> no obvious path for how to remove it.
>

I share that worry to some extent (an ever-growing collection of
protocols). To be fair, we knew that __array_function__ wasn't perfect, but
I think Stephan did a really good job making the case for why it was
necessary, and that we needed to get something in place in 6-12 months
rather than spending years on a more polished/comprehensive design. Given
those constraints, we seem to have made a good choice that has largely
achieved its goals.

I'm still not sure you got my main objection. So I'll try once more.
We now have a design, imperfect but it exists and is documented (to some
extent). Let's call it design A. Now you're wanting clarity on a policy on
how to move from design A to design B. However, what B even is isn't
spelled out, although we can derive rough outlines from mailing list
threads like these (it involves better handling of subclasses, allowing
reuse of implementation of numpy functions in terms of other numpy
functions, etc.). The right way forward is:

1. describe what design B is
2. decide if that design is a good idea
3. then worry about implementation and a migration policy

Something that's specific to all nan-functions is still way too specific,
and doesn't justify skipping 1-2 and jumping straight to 3.

I don't know how to express it any clearer than the above in email format.
If it doesn't make sense to you, it'd be great to have a call to discuss in
person.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Charles R Harris
On Thu, Jun 13, 2019 at 6:21 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Ralf,
>
> Thanks both for the reply and sharing the link. I recognize much (from
> both sides!).
>
> 
>
>>
>> More importantly, I think we should not even consider *discussing*
>> removing` __array_function__` from np.isposinf (or any similar one off
>> situation) before there's a new bigger picture design. This is not about
>> "deprecation is hard", this is about doing things with purpose rather than
>> ad-hoc, as well as recognizing that lots of small changes are a major drain
>> on our limited maintainer resources. About the latter issue I wrote a blog
>> post recently, perhaps that clarifies the previous sentence a bit and gives
>> some more insight in my perspective:
>> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>>
>> Yes, I definitely did not mean to imply that a goal was to change just
> `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
> was to clean up the whole `nanfunctions` module). Rather, to use them as
> examples to see what policy there actually is or should be; and I do worry
> that with __array_function__, however happy I am that it now exists
> (finally, Quantity can be concatenated!), we're going the Microsoft route
> of just layering on top of old code if even for the simplest cases there is
> no obvious path for how to remove it.
>
> Anyway, this discussion is probably gotten beyond the point where much is
> added. I'll close the `nansum` PR.
>
> All the best,
>
> Marten
>
>
> p.s. I would say that deprecations within numpy currently *are* hard.
> E.g., the rate at which we add `DEPRECATED` to the C code is substantially
> larger than that with which we actually remove any long-deprecated
> behaviour.
>
>

That's been true for the last couple of releases, but not historically. The
main problem was lack of time compounded by a  FutureWarning that got stuck
because it broke code in unexpected ways when brought to fruition. For 1.16
I decided that expiring spent deprecations wasn't worth the churn. At this
point I see 1.18 as the release that starts the cleanup, and possibly the
start of removing the Python 2.7 compatibility code.

Speaking of 1.18, the other thing I'd like to see is discussion of a masked
array replacement. I think the basic infrastructure is now in place for
that. Hmm... should probably make a separate post eliciting ideas for 1.18.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
Hi Ralf,

Thanks both for the reply and sharing the link. I recognize much (from both
sides!).



>
> More importantly, I think we should not even consider *discussing*
> removing` __array_function__` from np.isposinf (or any similar one off
> situation) before there's a new bigger picture design. This is not about
> "deprecation is hard", this is about doing things with purpose rather than
> ad-hoc, as well as recognizing that lots of small changes are a major drain
> on our limited maintainer resources. About the latter issue I wrote a blog
> post recently, perhaps that clarifies the previous sentence a bit and gives
> some more insight in my perspective:
> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>
> Yes, I definitely did not mean to imply that a goal was to change just
`isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
was to clean up the whole `nanfunctions` module). Rather, to use them as
examples to see what policy there actually is or should be; and I do worry
that with __array_function__, however happy I am that it now exists
(finally, Quantity can be concatenated!), we're going the Microsoft route
of just layering on top of old code if even for the simplest cases there is
no obvious path for how to remove it.

Anyway, this discussion is probably gotten beyond the point where much is
added. I'll close the `nansum` PR.

All the best,

Marten


p.s. I would say that deprecations within numpy currently *are* hard. E.g.,
the rate at which we add `DEPRECATED` to the C code is substantially larger
than that with which we actually remove any long-deprecated behaviour.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Ralf Gommers
On Thu, Jun 13, 2019 at 9:43 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Ralf,
>
>
>>> I guess the one immediate question is whether `np.sum` and the like
>>> should be overridden by `__array_function__` at all, given that what should
>>> be the future recommended override already works.
>>>
>>
>> I'm not sure I understand the rationale for this. Design consistency
>> matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and
>> other functions __array_function__. Why keep on tweaking things for little
>> benefit?
>>
>
> I'm mostly trying to understand how we would actually change things.
>

I think we do have ways to deprecate things and reason about how to make
the trade-offs to do so. Function overrides are not a whole lot different I
think, we can apply the same method (plus there's a special bit in NEP 18
that we reserve the right to change functions into ufuncs and use
`__array_ufunc__`).

I guess your quite logical argument for consistency is that it requires
> `np.sum is np.add.reduce`. But to do that, one would have to get rid of the
> `.sum()` method override, and then deprecate using `__array_function__` on
> `np.sum`.
>
> A perhaps clearer example is `np.isposinf`, where the implementation truly
> consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess
> following your consistency logic, one would not remove the
> `__array_function__` override until it actually became a ufunc itself.
>

Correct.

More importantly, I think we should not even consider *discussing*
removing` __array_function__` from np.isposinf (or any similar one off
situation) before there's a new bigger picture design. This is not about
"deprecation is hard", this is about doing things with purpose rather than
ad-hoc, as well as recognizing that lots of small changes are a major drain
on our limited maintainer resources. About the latter issue I wrote a blog
post recently, perhaps that clarifies the previous sentence a bit and gives
some more insight in my perspective:
https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/

Cheers,
Ralf



> Anyway, I think this discussion has been useful, if only in making it yet
> more clear how difficult it is to deprecate anything!
>
> All the best,
>
> Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
Hi Ralf,


>> I guess the one immediate question is whether `np.sum` and the like
>> should be overridden by `__array_function__` at all, given that what should
>> be the future recommended override already works.
>>
>
> I'm not sure I understand the rationale for this. Design consistency
> matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and
> other functions __array_function__. Why keep on tweaking things for little
> benefit?
>

I'm mostly trying to understand how we would actually change things. I
guess your quite logical argument for consistency is that it requires
`np.sum is np.add.reduce`. But to do that, one would have to get rid of the
`.sum()` method override, and then deprecate using `__array_function__` on
`np.sum`.

A perhaps clearer example is `np.isposinf`, where the implementation truly
consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess
following your consistency logic, one would not remove the
`__array_function__` override until it actually became a ufunc itself.

Anyway, I think this discussion has been useful, if only in making it yet
more clear how difficult it is to deprecate anything!

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Ralf Gommers
On Thu, Jun 13, 2019 at 7:07 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

>
>
> On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer  wrote:
> 
>
>>
>>
>>> But how about `np.sum` itself? Right now, it is overridden by
>>> __array_function__ but classes without __array_function__ support can also
>>> override it through the method lookup and through __array_ufunc__.
>>>
>>> Would/should there be a point where we just have `sum = np.add.reduce`
>>> and drop other overrides? If so, how do we get there?
>>>
>>> One option might be start reversing the order in `_wrapreduction` - try
>>> `__array_ufunc__` if it is defined and only if that fails try the `.sum`
>>> method.
>>>
>>
>> Yes, I think we would need to do this sort of thing. It's a bit of
>> trouble, but probably doable with some decorator magic. It would indeed be
>> nice for sum() to eventually just be np.add.reduce, though to be honest I'm
>> not entirely sure it's worth the trouble of a long deprecation cycle --
>> people have been relying on the fall-back calling of methods for a long
>> time.
>>
>>
> I guess the one immediate question is whether `np.sum` and the like should
> be overridden by `__array_function__` at all, given that what should be the
> future recommended override already works.
>

I'm not sure I understand the rationale for this. Design consistency
matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and
other functions __array_function__. Why keep on tweaking things for little
benefit?

Ralf



> (And, yes, arguably too late to change it now!)
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer  wrote:


>
>
>> But how about `np.sum` itself? Right now, it is overridden by
>> __array_function__ but classes without __array_function__ support can also
>> override it through the method lookup and through __array_ufunc__.
>>
>> Would/should there be a point where we just have `sum = np.add.reduce`
>> and drop other overrides? If so, how do we get there?
>>
>> One option might be start reversing the order in `_wrapreduction` - try
>> `__array_ufunc__` if it is defined and only if that fails try the `.sum`
>> method.
>>
>
> Yes, I think we would need to do this sort of thing. It's a bit of
> trouble, but probably doable with some decorator magic. It would indeed be
> nice for sum() to eventually just be np.add.reduce, though to be honest I'm
> not entirely sure it's worth the trouble of a long deprecation cycle --
> people have been relying on the fall-back calling of methods for a long
> time.
>
>
I guess the one immediate question is whether `np.sum` and the like should
be overridden by `__array_function__` at all, given that what should be the
future recommended override already works.

(And, yes, arguably too late to change it now!)

-- Marten



>
>
>> All the best,
>>
>> Marten
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Stephan Hoyer
On Thu, Jun 13, 2019 at 9:35 AM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Ralf, others,
>
>
>>>  Anyway, I guess this is still a good example to consider for how we
>>> should go about getting to a new implementation, ideally with just a
>>> single-way to override?
>>>
>>> Indeed, how do we actually envisage deprecating the use of
>>> `__array_function__` for a given part of the numpy API? Are we allowed to
>>> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>>>
>>
>> I think __array_function__ is still the best way to do this (that's the
>> only actual override, so most robust and performant likely), so I don't see
>> any reason for a deprecation.
>>
>> Yes, I fear I have to agree for the nan-functions, at least for now...
>
> But how about `np.sum` itself? Right now, it is overridden by
> __array_function__ but classes without __array_function__ support can also
> override it through the method lookup and through __array_ufunc__.
>
> Would/should there be a point where we just have `sum = np.add.reduce` and
> drop other overrides? If so, how do we get there?
>
> One option might be start reversing the order in `_wrapreduction` - try
> `__array_ufunc__` if it is defined and only if that fails try the `.sum`
> method.
>

Yes, I think we would need to do this sort of thing. It's a bit of trouble,
but probably doable with some decorator magic. It would indeed be nice for
sum() to eventually just be np.add.reduce, though to be honest I'm not
entirely sure it's worth the trouble of a long deprecation cycle -- people
have been relying on the fall-back calling of methods for a long time.



> All the best,
>
> Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
Hi Ralf, others,


>>  Anyway, I guess this is still a good example to consider for how we
>> should go about getting to a new implementation, ideally with just a
>> single-way to override?
>>
>> Indeed, how do we actually envisage deprecating the use of
>> `__array_function__` for a given part of the numpy API? Are we allowed to
>> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>>
>
> I think __array_function__ is still the best way to do this (that's the
> only actual override, so most robust and performant likely), so I don't see
> any reason for a deprecation.
>
> Yes, I fear I have to agree for the nan-functions, at least for now...

But how about `np.sum` itself? Right now, it is overridden by
__array_function__ but classes without __array_function__ support can also
override it through the method lookup and through __array_ufunc__.

Would/should there be a point where we just have `sum = np.add.reduce` and
drop other overrides? If so, how do we get there?

One option might be start reversing the order in `_wrapreduction` - try
`__array_ufunc__` if it is defined and only if that fails try the `.sum`
method.

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Ralf Gommers
On Thu, Jun 13, 2019 at 2:51 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi All,
>
> `nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers
> the plan to use `asarray` - sorry to have been sloppy with stating they
> "convert to array".
>
> And, yes, I'll admit to forgetting that we are introducing `where` only in
> 1.17 - clearly we cannot rely on other classes to have already implemented
> it!! (This is the problem with having done it myself - it seems ages ago!)
>
> For the old implementation, there was indeed only one way to override
> (`__array_function__`) for non-subclasses, but for the new implementation
> there are three ways, though the first two overlap:
> - support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
> - support `__array_ufunc__` including reductions (with all their arguments)
> - support `__array_function__`
>
> I think that eventually one would like to move to the case where there is
> only one (`__array_ufunc__` in this case, since really it is just a
> combination of two ufuncs, isnan and add.reduce in this case).
>

I don't think the other ones are really ways of "overriding nansum". They
happen to work, but rely on internal implementation details and then
overriding functions inside those details.

It sounds like you're suggestion a new way of declaring "this is a
canonical implementation that duck arrays can rely on". I can see that
happening, but we probably want some standard way to mark those functions,
have a note in the docs and some tests.


>  Anyway, I guess this is still a good example to consider for how we
> should go about getting to a new implementation, ideally with just a
> single-way to override?
>
> Indeed, how do we actually envisage deprecating the use of
> `__array_function__` for a given part of the numpy API? Are we allowed to
> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>

I think __array_function__ is still the best way to do this (that's the
only actual override, so most robust and performant likely), so I don't see
any reason for a deprecation.

Cheers,
Ralf



> All the best,
>
> Marten
>
>
>
> On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers 
> wrote:
>
>>
>>
>> On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer  wrote:
>>
>>> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
>>> m.h.vankerkw...@gmail.com> wrote:
>>>
 Hi Ralf,

 You're right, the problem is with the added keyword argument (which
 would appear also if we did not still have to support the old .sum method
 override but just dispatched to __array_ufunc__ with `np.add.reduce` -
 maybe worse given that it seems likely the reduce method has seen much less
 testing in  __array_ufunc__ implementations).

 Still, I do think the question stands: we implement a `nansum` for our
 ndarray class a certain way, and provide ways to override it (three now, in
 fact). Is it really reasonable to expect that we wait 4 versions for other
 packages to keep up with this, and thus get stuck with given internal
 implementations?

>>>
>> In general, I'd say yes. This is a problem of our own making. If an
>> implementation uses np.sum on an argument that can be a subclass or duck
>> array (i.e. we didn't coerce with asarray), then the code is calling out to
>> outside of numpy. At that point we have effectively made something public.
>> We can still change it, but only if we're sure that we don't break things
>> in the process (i.e. we then insert a new asarray, something you're not
>> happy about in general ).
>>
>> I don't get the "three ways to override nansum" by the way. There's only
>> one I think: __array_function__. There may be more for the sum() method.
>>
>> Another way to say the same thing: if a subclass does everything right,
>> and we decide to add a new keyword to some function in 1.17 without
>> considering those subclasses, then turning around straight after and saying
>> "oh those subclasses rely on our old API, it may be okay to either break
>> them or send them a deprecation warning" (which is I think what you were
>> advocating for - your 4 and 3 options) is not reasonable.
>>
>>
 Aside: note that the present version of the nanfunctions relies on
 turning the arguments into arrays and copying 0s into it - that suggests
 that currently they do not work for duck arrays like Dask.

>>>
>> There being an issue with matrix in the PR suggests there is an issue for
>> subclasses at least?
>>
>>
>>> Agreed. We could safely rewrite things to use np.asarray(), without any
>>> need to worry about backends compatibility. From an API perspective,
>>> nothing would change -- we already cast inputs into base numpy arrays
>>> inside the _replace_nan() routine.
>>>
>>
>> In that case yes, nansum can be rewritten. However, it's only because of
>> the use of (as)array - if it were asanyarray that would already scupper the
>> plan.
>>
>> Cheers,
>> Ralf

Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Marten van Kerkwijk
Hi All,

`nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the
plan to use `asarray` - sorry to have been sloppy with stating they
"convert to array".

And, yes, I'll admit to forgetting that we are introducing `where` only in
1.17 - clearly we cannot rely on other classes to have already implemented
it!! (This is the problem with having done it myself - it seems ages ago!)

For the old implementation, there was indeed only one way to override
(`__array_function__`) for non-subclasses, but for the new implementation
there are three ways, though the first two overlap:
- support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
- support `__array_ufunc__` including reductions (with all their arguments)
- support `__array_function__`

I think that eventually one would like to move to the case where there is
only one (`__array_ufunc__` in this case, since really it is just a
combination of two ufuncs, isnan and add.reduce in this case).

 Anyway, I guess this is still a good example to consider for how we should
go about getting to a new implementation, ideally with just a single-way to
override?

Indeed, how do we actually envisage deprecating the use of
`__array_function__` for a given part of the numpy API? Are we allowed to
go cold-turkey if the new implementation is covered by `__array_ufunc__`?

All the best,

Marten



On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers  wrote:

>
>
> On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer  wrote:
>
>> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
>> m.h.vankerkw...@gmail.com> wrote:
>>
>>> Hi Ralf,
>>>
>>> You're right, the problem is with the added keyword argument (which
>>> would appear also if we did not still have to support the old .sum method
>>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>>> maybe worse given that it seems likely the reduce method has seen much less
>>> testing in  __array_ufunc__ implementations).
>>>
>>> Still, I do think the question stands: we implement a `nansum` for our
>>> ndarray class a certain way, and provide ways to override it (three now, in
>>> fact). Is it really reasonable to expect that we wait 4 versions for other
>>> packages to keep up with this, and thus get stuck with given internal
>>> implementations?
>>>
>>
> In general, I'd say yes. This is a problem of our own making. If an
> implementation uses np.sum on an argument that can be a subclass or duck
> array (i.e. we didn't coerce with asarray), then the code is calling out to
> outside of numpy. At that point we have effectively made something public.
> We can still change it, but only if we're sure that we don't break things
> in the process (i.e. we then insert a new asarray, something you're not
> happy about in general ).
>
> I don't get the "three ways to override nansum" by the way. There's only
> one I think: __array_function__. There may be more for the sum() method.
>
> Another way to say the same thing: if a subclass does everything right,
> and we decide to add a new keyword to some function in 1.17 without
> considering those subclasses, then turning around straight after and saying
> "oh those subclasses rely on our old API, it may be okay to either break
> them or send them a deprecation warning" (which is I think what you were
> advocating for - your 4 and 3 options) is not reasonable.
>
>
>>> Aside: note that the present version of the nanfunctions relies on
>>> turning the arguments into arrays and copying 0s into it - that suggests
>>> that currently they do not work for duck arrays like Dask.
>>>
>>
> There being an issue with matrix in the PR suggests there is an issue for
> subclasses at least?
>
>
>> Agreed. We could safely rewrite things to use np.asarray(), without any
>> need to worry about backends compatibility. From an API perspective,
>> nothing would change -- we already cast inputs into base numpy arrays
>> inside the _replace_nan() routine.
>>
>
> In that case yes, nansum can be rewritten. However, it's only because of
> the use of (as)array - if it were asanyarray that would already scupper the
> plan.
>
> Cheers,
> Ralf
>
>
>>
>>>
>>> All the best,
>>>
>>> Marten
>>>
>>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers 
>>> wrote:
>>>


 On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
 stef...@berkeley.edu> wrote:

> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> > In a way, I brought it up mostly as a concrete example of an internal
> > implementation which we cannot change to an objectively cleaner one
> because
> > other packages rely on an out-of-date numpy API.
>

 I think this is not the right way to describe the problem (see below).


> This, and the comments Nathaniel made on the array function thread, are
> important to take note of.  Would it be worth updating NEP 18 with a
> list of pitfalls?  Or should this be a new informational NEP that
> 

Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-13 Thread Ralf Gommers
On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer  wrote:

> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Hi Ralf,
>>
>> You're right, the problem is with the added keyword argument (which would
>> appear also if we did not still have to support the old .sum method
>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>> maybe worse given that it seems likely the reduce method has seen much less
>> testing in  __array_ufunc__ implementations).
>>
>> Still, I do think the question stands: we implement a `nansum` for our
>> ndarray class a certain way, and provide ways to override it (three now, in
>> fact). Is it really reasonable to expect that we wait 4 versions for other
>> packages to keep up with this, and thus get stuck with given internal
>> implementations?
>>
>
In general, I'd say yes. This is a problem of our own making. If an
implementation uses np.sum on an argument that can be a subclass or duck
array (i.e. we didn't coerce with asarray), then the code is calling out to
outside of numpy. At that point we have effectively made something public.
We can still change it, but only if we're sure that we don't break things
in the process (i.e. we then insert a new asarray, something you're not
happy about in general ).

I don't get the "three ways to override nansum" by the way. There's only
one I think: __array_function__. There may be more for the sum() method.

Another way to say the same thing: if a subclass does everything right, and
we decide to add a new keyword to some function in 1.17 without considering
those subclasses, then turning around straight after and saying "oh those
subclasses rely on our old API, it may be okay to either break them or send
them a deprecation warning" (which is I think what you were advocating for
- your 4 and 3 options) is not reasonable.


>> Aside: note that the present version of the nanfunctions relies on
>> turning the arguments into arrays and copying 0s into it - that suggests
>> that currently they do not work for duck arrays like Dask.
>>
>
There being an issue with matrix in the PR suggests there is an issue for
subclasses at least?


> Agreed. We could safely rewrite things to use np.asarray(), without any
> need to worry about backends compatibility. From an API perspective,
> nothing would change -- we already cast inputs into base numpy arrays
> inside the _replace_nan() routine.
>

In that case yes, nansum can be rewritten. However, it's only because of
the use of (as)array - if it were asanyarray that would already scupper the
plan.

Cheers,
Ralf


>
>>
>> All the best,
>>
>> Marten
>>
>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>>> stef...@berkeley.edu> wrote:
>>>
 On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
 > In a way, I brought it up mostly as a concrete example of an internal
 > implementation which we cannot change to an objectively cleaner one
 because
 > other packages rely on an out-of-date numpy API.

>>>
>>> I think this is not the right way to describe the problem (see below).
>>>
>>>
 This, and the comments Nathaniel made on the array function thread, are
 important to take note of.  Would it be worth updating NEP 18 with a
 list of pitfalls?  Or should this be a new informational NEP that
 discusses—on a higher level—the benefits, risks, and design
 considerations of providing protocols?

>>>
>>> That would be a nice thing to do (the higher level one), but in this
>>> case I think the issue has little to do with NEP 18. The summary of the
>>> issue in this thread is a little brief, so let me try to clarify.
>>>
>>> 1. np.sum gained a new `where=` keyword in 1.17.0
>>> 2. using np.sum(x) will detect a `x.sum` method if it's present and try
>>> to use that
>>> 3. the `_wrapreduction` utility that forwards the function to the method
>>> will compare signatures of np.sum and x.sum, and throw an error if there's
>>> a mismatch for any keywords that have a value other than the default
>>> np._NoValue
>>>
>>> Code to check this:
>>> >>> x1 = np.arange(5)
>>> >>> x2 = np.asmatrix(x1)
>>> >>> np.sum(x1)  # works
>>> >>> np.sum(x2)  # works
>>> >>> np.sum(x1, where=x1>3)  # works
>>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>>> ...
>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>
>>> Note that this is not specific to np.matrix. Using pandas.Series you
>>> also get a TypeError:
>>> >>> y = pd.Series(x1)
>>> >>> np.sum(y)  # works
>>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>>> ...
>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>
>>> The issue is that when we have this kind of forwarding logic,
>>> irrespective of how it's implemented, new keywords cannot be used until the
>>> array-like objects with the methods that get forwarded to gain the same
>>> 

Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-12 Thread Stephan Hoyer
On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Ralf,
>
> You're right, the problem is with the added keyword argument (which would
> appear also if we did not still have to support the old .sum method
> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
> maybe worse given that it seems likely the reduce method has seen much less
> testing in  __array_ufunc__ implementations).
>
> Still, I do think the question stands: we implement a `nansum` for our
> ndarray class a certain way, and provide ways to override it (three now, in
> fact). Is it really reasonable to expect that we wait 4 versions for other
> packages to keep up with this, and thus get stuck with given internal
> implementations?
>
> Aside: note that the present version of the nanfunctions relies on turning
> the arguments into arrays and copying 0s into it - that suggests that
> currently they do not work for duck arrays like Dask.
>

Agreed. We could safely rewrite things to use np.asarray(), without any
need to worry about backends compatibility. From an API perspective,
nothing would change -- we already cast inputs into base numpy arrays
inside the _replace_nan() routine.


>
> All the best,
>
> Marten
>
> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers 
> wrote:
>
>>
>>
>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>> stef...@berkeley.edu> wrote:
>>
>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>> > In a way, I brought it up mostly as a concrete example of an internal
>>> > implementation which we cannot change to an objectively cleaner one
>>> because
>>> > other packages rely on an out-of-date numpy API.
>>>
>>
>> I think this is not the right way to describe the problem (see below).
>>
>>
>>> This, and the comments Nathaniel made on the array function thread, are
>>> important to take note of.  Would it be worth updating NEP 18 with a
>>> list of pitfalls?  Or should this be a new informational NEP that
>>> discusses—on a higher level—the benefits, risks, and design
>>> considerations of providing protocols?
>>>
>>
>> That would be a nice thing to do (the higher level one), but in this case
>> I think the issue has little to do with NEP 18. The summary of the issue in
>> this thread is a little brief, so let me try to clarify.
>>
>> 1. np.sum gained a new `where=` keyword in 1.17.0
>> 2. using np.sum(x) will detect a `x.sum` method if it's present and try
>> to use that
>> 3. the `_wrapreduction` utility that forwards the function to the method
>> will compare signatures of np.sum and x.sum, and throw an error if there's
>> a mismatch for any keywords that have a value other than the default
>> np._NoValue
>>
>> Code to check this:
>> >>> x1 = np.arange(5)
>> >>> x2 = np.asmatrix(x1)
>> >>> np.sum(x1)  # works
>> >>> np.sum(x2)  # works
>> >>> np.sum(x1, where=x1>3)  # works
>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>> ...
>> TypeError: sum() got an unexpected keyword argument 'where'
>>
>> Note that this is not specific to np.matrix. Using pandas.Series you also
>> get a TypeError:
>> >>> y = pd.Series(x1)
>> >>> np.sum(y)  # works
>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>> ...
>> TypeError: sum() got an unexpected keyword argument 'where'
>>
>> The issue is that when we have this kind of forwarding logic,
>> irrespective of how it's implemented, new keywords cannot be used until the
>> array-like objects with the methods that get forwarded to gain the same
>> keyword.
>>
>> tl;dr this is simply a cost we have to be aware of when either proposing
>> to add new keywords, or when proposing any kind of dispatching logic (in
>> this case `_wrapreduction`).
>>
>> Regarding internal use of  `np.sum(..., where=)`: this should not be done
>> until at least 4-5 versions from now, and preferably way longer than that.
>> Because doing so will break already released versions of Pandas, Dask, and
>> other libraries with array-like objects.
>>
>> Cheers,
>> Ralf
>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-12 Thread Marten van Kerkwijk
Hi Ralf,

You're right, the problem is with the added keyword argument (which would
appear also if we did not still have to support the old .sum method
override but just dispatched to __array_ufunc__ with `np.add.reduce` -
maybe worse given that it seems likely the reduce method has seen much less
testing in  __array_ufunc__ implementations).

Still, I do think the question stands: we implement a `nansum` for our
ndarray class a certain way, and provide ways to override it (three now, in
fact). Is it really reasonable to expect that we wait 4 versions for other
packages to keep up with this, and thus get stuck with given internal
implementations?

Aside: note that the present version of the nanfunctions relies on turning
the arguments into arrays and copying 0s into it - that suggests that
currently they do not work for duck arrays like Dask.

All the best,

Marten

On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers  wrote:

>
>
> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt 
> wrote:
>
>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>> > In a way, I brought it up mostly as a concrete example of an internal
>> > implementation which we cannot change to an objectively cleaner one
>> because
>> > other packages rely on an out-of-date numpy API.
>>
>
> I think this is not the right way to describe the problem (see below).
>
>
>> This, and the comments Nathaniel made on the array function thread, are
>> important to take note of.  Would it be worth updating NEP 18 with a
>> list of pitfalls?  Or should this be a new informational NEP that
>> discusses—on a higher level—the benefits, risks, and design
>> considerations of providing protocols?
>>
>
> That would be a nice thing to do (the higher level one), but in this case
> I think the issue has little to do with NEP 18. The summary of the issue in
> this thread is a little brief, so let me try to clarify.
>
> 1. np.sum gained a new `where=` keyword in 1.17.0
> 2. using np.sum(x) will detect a `x.sum` method if it's present and try to
> use that
> 3. the `_wrapreduction` utility that forwards the function to the method
> will compare signatures of np.sum and x.sum, and throw an error if there's
> a mismatch for any keywords that have a value other than the default
> np._NoValue
>
> Code to check this:
> >>> x1 = np.arange(5)
> >>> x2 = np.asmatrix(x1)
> >>> np.sum(x1)  # works
> >>> np.sum(x2)  # works
> >>> np.sum(x1, where=x1>3)  # works
> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> Note that this is not specific to np.matrix. Using pandas.Series you also
> get a TypeError:
> >>> y = pd.Series(x1)
> >>> np.sum(y)  # works
> >>> np.sum(y, where=y>3)  # pandas throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> The issue is that when we have this kind of forwarding logic, irrespective
> of how it's implemented, new keywords cannot be used until the array-like
> objects with the methods that get forwarded to gain the same keyword.
>
> tl;dr this is simply a cost we have to be aware of when either proposing
> to add new keywords, or when proposing any kind of dispatching logic (in
> this case `_wrapreduction`).
>
> Regarding internal use of  `np.sum(..., where=)`: this should not be done
> until at least 4-5 versions from now, and preferably way longer than that.
> Because doing so will break already released versions of Pandas, Dask, and
> other libraries with array-like objects.
>
> Cheers,
> Ralf
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-12 Thread Ralf Gommers
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt 
wrote:

> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> > In a way, I brought it up mostly as a concrete example of an internal
> > implementation which we cannot change to an objectively cleaner one
> because
> > other packages rely on an out-of-date numpy API.
>

I think this is not the right way to describe the problem (see below).


> This, and the comments Nathaniel made on the array function thread, are
> important to take note of.  Would it be worth updating NEP 18 with a
> list of pitfalls?  Or should this be a new informational NEP that
> discusses—on a higher level—the benefits, risks, and design
> considerations of providing protocols?
>

That would be a nice thing to do (the higher level one), but in this case I
think the issue has little to do with NEP 18. The summary of the issue in
this thread is a little brief, so let me try to clarify.

1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect a `x.sum` method if it's present and try to
use that
3. the `_wrapreduction` utility that forwards the function to the method
will compare signatures of np.sum and x.sum, and throw an error if there's
a mismatch for any keywords that have a value other than the default
np._NoValue

Code to check this:
>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)  # works
>>> np.sum(x2)  # works
>>> np.sum(x1, where=x1>3)  # works
>>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

Note that this is not specific to np.matrix. Using pandas.Series you also
get a TypeError:
>>> y = pd.Series(x1)
>>> np.sum(y)  # works
>>> np.sum(y, where=y>3)  # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

The issue is that when we have this kind of forwarding logic, irrespective
of how it's implemented, new keywords cannot be used until the array-like
objects with the methods that get forwarded to gain the same keyword.

tl;dr this is simply a cost we have to be aware of when either proposing to
add new keywords, or when proposing any kind of dispatching logic (in this
case `_wrapreduction`).

Regarding internal use of  `np.sum(..., where=)`: this should not be done
until at least 4-5 versions from now, and preferably way longer than that.
Because doing so will break already released versions of Pandas, Dask, and
other libraries with array-like objects.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-11 Thread Marten van Kerkwijk
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
>
> Should have added: rely on an out-of-date numpy API where we have multiple
ways for packages to provide their own overrides. But I guess in this case
it is really matrix which is the problem. If so, maybe just adding kwargs
to it is something we should consider.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-11 Thread Sebastian Berg
On Tue, 2019-06-11 at 10:56 +0200, Ralf Gommers wrote:
> 
> 
> On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
> > Hi All,
> > 
> > In https://github.com/numpy/numpy/pull/12801, Tyler has been trying
> > to use the new `where` argument for reductions to implement
> > `nansum`, etc., using simplifications that boil down to
> > `np.sum(..., where=~isnan(...))`.
> > 
> > A problem that occurs is that `np.sum` will use a `.sum` method if
> > that is present, and for matrix, the `.sum` method does not take a
> > `where` argument. Since the `where` argument has been introduced
> > only recently, similar problems may happen for array mimics that
> > implement their own `.sum` method.
> > 
> > The question now is what to do, with options being:
> > 1. Let's stick with the existing implementation; the speed-up is
> > not that great anyway.
> > 2. Use try/except around the new implementation and use the old one
> > if it fails.
> > 3. As (2), but emit a deprecation warning. This will help array
> > mimics, but not matrix (unless someone makes a PR; would we even
> > accept it?);
> > 4. Use the new implementation. `matrix` should be gone anyway and
> > array mimics can either update their `.sum()` method or override
> > `np.nansum` with `__array_function__`.
> > 
> > Personally, I'd prefer (4), but realize that (3) is probably the
> > more safer approach, even if it is really annoying to still be
> > taking into account matrix deficiencies.
> > 
> 
> Honestly, I agree with Tyler's assessment when he closed the PR: 
> "there's not much evidence of remarkable performance improvement, and
> many of the other nan functions would have more complicated
> implementation details that are unlikely to make the relative
> performance much better. It doesn't seem to be a priority either."
> 
> The code also got more rather than less complex. So why do this? Yes
> you give a reason why it may help duck arrays, but it also breaks
> things apparently.

There could probably be reasons for this, since replacing with 0 may
not work always for object arrays. But that is rather hypothetical
(e.g. using add as string concatenation).

Calling `np.add.reduce` instead of the sum method could be slightly
more compatible, but could have its own set of issue.
But I suppose it is true, unless we plan on adding specialized `where`
loops or a nanadd ufunc itself the gain is likely so small that it it
arguably may make more sense to defer this until there is an an actual
benefit (and less pain due to adoption of the new protocols).

Best,

Sebastian

> 
> Re matrix: it's not deprecated yet and is still fairly widely used,
> so yes we'd accept a PR and no we should not break it gratuitously.
> 
> Cheers,
> Ralf
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-11 Thread Ralf Gommers
On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi All,
>
> In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to
> use the new `where` argument for reductions to implement `nansum`, etc.,
> using simplifications that boil down to `np.sum(..., where=~isnan(...))`.
>
> A problem that occurs is that `np.sum` will use a `.sum` method if that is
> present, and for matrix, the `.sum` method does not take a `where`
> argument. Since the `where` argument has been introduced only recently,
> similar problems may happen for array mimics that implement their own
> `.sum` method.
>
> The question now is what to do, with options being:
> 1. Let's stick with the existing implementation; the speed-up is not that
> great anyway.
> 2. Use try/except around the new implementation and use the old one if it
> fails.
> 3. As (2), but emit a deprecation warning. This will help array mimics,
> but not matrix (unless someone makes a PR; would we even accept it?);
> 4. Use the new implementation. `matrix` should be gone anyway and array
> mimics can either update their `.sum()` method or override `np.nansum` with
> `__array_function__`.
>
> Personally, I'd prefer (4), but realize that (3) is probably the more
> safer approach, even if it is really annoying to still be taking into
> account matrix deficiencies.
>

Honestly, I agree with Tyler's assessment when he closed the PR:
"there's not much evidence of remarkable performance improvement, and many
of the other nan functions would have more complicated implementation
details that are unlikely to make the relative performance much better. It
doesn't seem to be a priority either."

The code also got more rather than less complex. So why do this? Yes you
give a reason why it may help duck arrays, but it also breaks things
apparently.

Re matrix: it's not deprecated yet and is still fairly widely used, so yes
we'd accept a PR and no we should not break it gratuitously.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-10 Thread Hameer Abbasi
On Mon, 2019-06-10 at 13:47 -0400, Marten van Kerkwijk wrote:
> Hi All,
> 
> In https://github.com/numpy/numpy/pull/12801, Tyler has been trying
> to use the new `where` argument for reductions to implement `nansum`,
> etc., using simplifications that boil down to `np.sum(...,
> where=~isnan(...))`.
> 
> A problem that occurs is that `np.sum` will use a `.sum` method if
> that is present, and for matrix, the `.sum` method does not take a
> `where` argument. Since the `where` argument has been introduced only
> recently, similar problems may happen for array mimics that implement
> their own `.sum` method.

Hi Marten! I ran into a similar issue with the initial kwarg when I
implemented it, except at that time I just used the np._NoValue.
> The question now is what to do, with options being:
> 1. Let's stick with the existing implementation; the speed-up is not
> that great anyway.
> 2. Use try/except around the new implementation and use the old one
> if it fails.
> 3. As (2), but emit a deprecation warning. This will help array
> mimics, but not matrix (unless someone makes a PR; would we even
> accept it?);
> 4. Use the new implementation. `matrix` should be gone anyway and
> array mimics can either update their `.sum()` method or override
> `np.nansum` with `__array_function__`.
> 
> Personally, I'd prefer (4), but realize that (3) is probably the more
> safer approach, even if it is really annoying to still be taking into
> account matrix deficiencies.

If nansum does any other kind of dispatch that should be kept around in
any case. Otherwise it failed then and it would fail now. We can catch
and raise the right type of exception for backwards compatibility if
needed.
> All the best,
> 
> Marten
> 
> p.s. One could also avoid the `.sum()` override altogether by doing
> `np.add.reduce(..., where=...)`, but this would probably break just
> as much.
> 
> 
> ___NumPy-Discussion
> mailing listnumpy-discuss...@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

2019-06-10 Thread Marten van Kerkwijk
Hi All,

In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use
the new `where` argument for reductions to implement `nansum`, etc., using
simplifications that boil down to `np.sum(..., where=~isnan(...))`.

A problem that occurs is that `np.sum` will use a `.sum` method if that is
present, and for matrix, the `.sum` method does not take a `where`
argument. Since the `where` argument has been introduced only recently,
similar problems may happen for array mimics that implement their own
`.sum` method.

The question now is what to do, with options being:
1. Let's stick with the existing implementation; the speed-up is not that
great anyway.
2. Use try/except around the new implementation and use the old one if it
fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but
not matrix (unless someone makes a PR; would we even accept it?);
4. Use the new implementation. `matrix` should be gone anyway and array
mimics can either update their `.sum()` method or override `np.nansum` with
`__array_function__`.

Personally, I'd prefer (4), but realize that (3) is probably the more safer
approach, even if it is really annoying to still be taking into account
matrix deficiencies.

All the best,

Marten

p.s. One could also avoid the `.sum()` override altogether by doing
`np.add.reduce(..., where=...)`, but this would probably break just as much.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion