Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-29 Thread Sebastian Berg
On Wed, 2020-04-29 at 05:26 -0500, Juan Nunez-Iglesias wrote:
> Hi everyone, and thank you Ralf for carrying the flag in my absence.
> =D
> 
> Sebastian, the *primary* motivation behind avoiding detach() in
> PyTorch is listed in original post of the PyTorch issue:
> 
> > People not very familiar with `requires_grad` and cpu/gpu Tensors
> > might go back and forth with numpy. For example doing pytorch ->
> > numpy -> pytorch and backward on the last Tensor. This will
> > backward without issue but not all the way to the first part of the
> > code and won’t raise any error.
> 
> The PyTorch team are concerned that they will be overwhelmed with
> help requests if np.array() silently succeeds on a tensor with
> gradients. I definitely get that.

Sorry for playing advocatus diaboli...

I guess it is simply that before the end, it would be nice to have a
short list with projects:

* Napari, matplotlib on the "user" side
* PyTorch, ...? on the "provider" side

And maybe what their expectations on `force=True` are, to make sure
they roughly align.

The best definition for when to use `force=True` at this time seems to
be "end-point" users (such as visualization or maybe writing to disk?).

I still think performance can be just as valid of an issue there. For
example it may be better to convert to a numpy array earlier in the
computation.  Or someone could be surprised that saving their gpu array
to an hdf5 file is by far the slowest part of the computation.

Maybe I have the feeling the definition we want is actually:

   There is definitely no way to do this computation faster or better
   than by converting it to a NumPy array.

Since currently the main reason to reject it seems a bit to be:

   Wait, are you sure there is not a much better way than using NumPy
   arrays, be careful!

And while that distinction is clear for PyTorch + visualization, I am
not quite sure yet, that it is clear for various combinations of
`force=True` and array-like users.
Maybe CuPy does not want h5py to use `force=True`, because cupy has its
own super fast "stream-to-file" functionality... But it wants to to do
it for napari.

- Sebastian


> 
> Avoiding .gpu() is more straightforwardly about avoiding implicit
> expensive computation.
> 
> > while others do not choose to teach about it. There seems very
> > little
> > or even no "promise" attached to either `force=True` or
> > `force=False`.
> 
> NumPy can set a precedent through policy. The *only* reason client
> libraries would implement `__array__` is to play well with NumPy, so
> if NumPy documents that `force=True` should *always* succeed, we can
> expect client libraries to follow suit. At least the PyTorch devs
> have indicated that they would be open to this.
> 
> > E.g. Napari wants to use it, but do the array-providers want Napari
> > to use it?
> 
> As Ralf pointed out, the PyTorch devs have already agreed to it.
> 
> From the napari perspective, we'd be ok with leaving the decision on
> warnings to client libraries. We may or may not suppress them
> depending on user requests. ;) But the point is to have a way of
> saying "give me a NumPy array DAMMIT" without having to know about
> all the possible array libraries. Which are numerous and getting
> numerouser.
> 
> Ralf, you said you don't want warnings — even for sparse arrays? That
> was an area of concern for you on the PyTorch discussion side.
> 
> > And if the conversion still gives warnings for some array-objects,
> > have we actually gained much?
> 
> Yes.
> 
> Hameer,
> 
> > I would advocate for a `force=` kwarg but personally I don't think
> > it's explicit enough, but probably as explicit as can be given
> > NumPy's API.
> 
> Yeah, I agree that force is kind of vague, which is why I was looking
> for things like `allow_copy`. But it is hard to be general enough
> here: sparse requires an expensive instantiation, cupy requires
> copying from gpu to cpu, dask requires arbitrary computation, xarray
> requires information loss... I'm inclined to agree with Ralf that
> force= is the only generic-enough term, but I'm happy to entertain
> other options!
> 
> Juan.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-29 Thread Ralf Gommers
On Wed, Apr 29, 2020 at 12:27 PM Juan Nunez-Iglesias 
wrote:

>
>
> Ralf, you said you don't want warnings — even for sparse arrays? That was
> an area of concern for you on the PyTorch discussion side.
>

Providing a boolean keyword argument and then raising a warning every time
anyone uses that keyword makes very little sense.

This should simply be handled by clear documentation: use only in code like
visualization where the array arrives at its "end station", never in
functions that are again built on top of.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-29 Thread Juan Nunez-Iglesias
Hi everyone, and thank you Ralf for carrying the flag in my absence. =D

Sebastian, the *primary* motivation behind avoiding detach() in PyTorch is 
listed in original post of the PyTorch issue:

> People not very familiar with `requires_grad` and cpu/gpu Tensors might go 
> back and forth with numpy. For example doing pytorch -> numpy -> pytorch and 
> backward on the last Tensor. This will backward without issue but not all the 
> way to the first part of the code and won’t raise any error.


The PyTorch team are concerned that they will be overwhelmed with help requests 
if np.array() silently succeeds on a tensor with gradients. I definitely get 
that.

Avoiding .gpu() is more straightforwardly about avoiding implicit expensive 
computation.

> while others do not choose to teach about it. There seems very little
> or even no "promise" attached to either `force=True` or `force=False`.

NumPy can set a precedent through policy. The *only* reason client libraries 
would implement `__array__` is to play well with NumPy, so if NumPy documents 
that `force=True` should *always* succeed, we can expect client libraries to 
follow suit. At least the PyTorch devs have indicated that they would be open 
to this.

> E.g. Napari wants to use it, but do the array-providers want Napari to use it?

As Ralf pointed out, the PyTorch devs have already agreed to it.

>From the napari perspective, we'd be ok with leaving the decision on warnings 
>to client libraries. We may or may not suppress them depending on user 
>requests. ;) But the point is to have a way of saying "give me a NumPy array 
>DAMMIT" without having to know about all the possible array libraries. Which 
>are numerous and getting numerouser.

Ralf, you said you don't want warnings — even for sparse arrays? That was an 
area of concern for you on the PyTorch discussion side.

> And if the conversion still gives warnings for some array-objects, have we 
> actually gained much?

Yes.

Hameer,

> I would advocate for a `force=` kwarg but personally I don't think it's 
> explicit enough, but probably as explicit as can be given NumPy's API.

Yeah, I agree that force is kind of vague, which is why I was looking for 
things like `allow_copy`. But it is hard to be general enough here: sparse 
requires an expensive instantiation, cupy requires copying from gpu to cpu, 
dask requires arbitrary computation, xarray requires information loss... I'm 
inclined to agree with Ralf that force= is the only generic-enough term, but 
I'm happy to entertain other options!

Juan.___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-29 Thread Ralf Gommers
On Tue, Apr 28, 2020 at 5:03 PM Sebastian Berg 
wrote:

> On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:
> 
> > > So arguably, there is no type-safety concern due to `.detach()`.
> >
> > I'm not sure what the question is here; no one mentioned type-safety.
> > The
> > PyTorch maintainers have already said they're fine with adding a
> > force
> > keyword.
>
> But type-safety is the reason to distinguish between:
>
> * np.asarrau(tensor)
> * np.asarray(tensor, force=True)
>

No it's not, the rationale given by library authors is expensive conversion
/ memory copies / side effects. `np.asarray(x)` is used all over the place,
and can/will continue to be used by library authors. `force=True` is for
cases where things like expensive conversion don't matter, like
visualization - if you need a picture of an array then it helps, while the
downside of writing inefficient/unreliable numerical code isn't present.


> Similar to:
>
> * operator.index(obj)
> * int(obj)   # convert less type-safe (strings, floats)!
>
> I actually mentioned 3 reasons in my email:
>
> 1. Teach and Inform users (about the next two mainly)
> 2. Type-safety
> 3. Expensive conversion
>
> And only type-safety is related to `.detach()` mentioning that there
> may not be clear story about the usage in that case.
>
> (continued below)
>
> >
> 
> > >
> > >
> > > I am very much in favor of adding such things, but I still lack a
> > > bit
> > > of clarity as to whom we would be helping?
> > >
> >
> > See Juan's first email. I personally am ambivalent on this proposal,
> > but if
> > Juan and the Napari devs really want it, that's good enough for me.
>
> Of course I read it, twice, but it is only good enough for me if we
> actually *solve the issue*, and for that I want to know which issue we
> are solving :), it seems obvious, but I am not so sure...
>
> That brings us to the other two reasons:
>
> Teaching and Informing users:
>
> If Napari uses `force=True` indiscriminately, it is not very clear to
> the user about whether or not the operation is expensive.  I.e. the
> user can learn it is when using `np.asarray(sparse_arr)` with other
> libraries. But they are not notified that `napari.vis_func(sparse_arr)`
> might kill their computer.
>
> So the "Teaching" part can still partially work, but it does not inform
> the user well anymore on whether or not a function will blow-up memory.
>
> Expensive Conversion:
>
> If the main reason is expensive conversions, however, than, as a
> library I would probably just use it for half my API, since copying
> from GPU to CPU will still be much faster than my own function.
>
>
> Generally:
>
> I want to help Napari, but it seems like there may be more to this, and
> it may be good to finish these thoughts before making a call.
>
> E.g. Napari wants to use it, but do the array-providers want Napari to
> use it?
>
> For sparse Hameer just mentioned that he still would want big warnings
> both during the operation and in the `np.asarray` documentation.
> If we put such big warnings there, we should have an idea of who we
> want to ignore that warning? (Napari yes, sklearn sometimes, ...?)
>

There clearly should not be warnings. And sklearn is irrelevant, it cannot
use `force=True`.

Ralf



>-> Is "whatever the library feels right" good enough?
>
> And if the conversion still gives warnings for some array-objects, have
> we actually gained much?
>
>   -> Maybe we do, end-users may be happy to ignore those warnings...
>
>
> The one clear use-case for `force=True` is the end-user. Just like no
> library uses `int(obj)`, but end-users can use it very nicely.
> I am happy to help the end-user in this case, but if that is the target
> audience we may want to _discourage_ Napari from using `force=True` and
> encourage sparse not to put any RuntimeWarnings on it!
>
> - Sebastian
>
>
> > Cheers,
> > Ralf
> >
> >
> >
> > > If end-users will actually use `np.asarray(..., force=True)` over
> > > special methods, then great! But I am currently not sure the type-
> > > safety argument is all that big of a point.  And the performance or
> > > memory-blowup argument remains true even for visualization
> > > libraries
> > > (where the array is purely input and never output as such).
> > >
> > >
> > > But yes, "never copy" is a somewhat different extension to
> > > `__array__`
> > > and `np.asarray`. It guarantees high speed and in-place behaviour
> > > which
> > > is useful for different settings.
> > >
> > > - Sebastian
> > >
> > >
> > > > > Cheers,
> > > > > Ralf
> > > > >
> > > > >
> > > > > > I think the discussion stalled on the precise spelling of the
> > > > > > third
> > > > > > option.
> > > > > >
> > > > > > `__array__` was not discussed there, but it seems like adding
> > > > > > the
> > > > > > `copy`
> > > > > > argument to `__array__` would be a perfectly reasonable
> > > > > > extension.
> > > > > >
> > > > > > Eric
> > > > > >
> > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > > 

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-28 Thread Sebastian Berg
On Tue, 2020-04-28 at 09:58 -0500, Sebastian Berg wrote:
> On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:
> 
> > > So arguably, there is no type-safety concern due to `.detach()`.
> > 
> > I'm not sure what the question is here; no one mentioned type-
> > safety. 
> > The
> > PyTorch maintainers have already said they're fine with adding a
> > force
> > keyword.
> 
> But type-safety is the reason to distinguish between:
> 
> * np.asarrau(tensor)
> * np.asarray(tensor, force=True)
> 
> Similar to:
> 
> * operator.index(obj)
> * int(obj)   # convert less type-safe (strings, floats)!
> 
> I actually mentioned 3 reasons in my email:
> 
> 1. Teach and Inform users (about the next two mainly)
> 2. Type-safety
> 3. Expensive conversion 
> 
> And only type-safety is related to `.detach()` mentioning that there
> may not be clear story about the usage in that case.
> 

(Sorry something got broken here)

The question is what PyTorch's reasons are to feel `np.asarray(tensor)`
should not work generally.
I for one thought it was type-safety with regard to `.detach()`. And
then I was surprised to realize that type-safety might not be a great
reason to reject an implicit `.detach()` within `np.asarray(tensor)`.


In any case, all the long talk is simply that I first want to be clear
on what the concerns are why libraries reject `np.asarray(tensor)`.
And then, I want to be clear that adding `force=True` will actually
solves those concerns.
And I was surprised myself that this became very much unclear to me.

Again, one reason for it being not clear to me is half the ecosystem
could potentially can just always use `force=True`.  So there must be
some "good usage" and some "bad usage" and I would like to know what
that is.

- Sebastian


> (continued below)
> 
> 
> > > 
> > > I am very much in favor of adding such things, but I still lack a
> > > bit
> > > of clarity as to whom we would be helping?
> > > 
> > 
> > See Juan's first email. I personally am ambivalent on this
> > proposal,
> > but if
> > Juan and the Napari devs really want it, that's good enough for me.
> 
> Of course I read it, twice, but it is only good enough for me if we
> actually *solve the issue*, and for that I want to know which issue
> we
> are solving :), it seems obvious, but I am not so sure...
> 
> That brings us to the other two reasons:
> 
> Teaching and Informing users:
> 
> If Napari uses `force=True` indiscriminately, it is not very clear to
> the user about whether or not the operation is expensive.  I.e. the
> user can learn it is when using `np.asarray(sparse_arr)` with other
> libraries. But they are not notified that
> `napari.vis_func(sparse_arr)`
> might kill their computer.
> 
> So the "Teaching" part can still partially work, but it does not
> inform
> the user well anymore on whether or not a function will blow-up
> memory.
> 
> Expensive Conversion:
> 
> If the main reason is expensive conversions, however, than, as a
> library I would probably just use it for half my API, since copying
> from GPU to CPU will still be much faster than my own function.
> 
> 
> Generally:
> 
> I want to help Napari, but it seems like there may be more to this,
> and
> it may be good to finish these thoughts before making a call.
> 
> E.g. Napari wants to use it, but do the array-providers want Napari
> to
> use it?
> 
> For sparse Hameer just mentioned that he still would want big
> warnings
> both during the operation and in the `np.asarray` documentation.
> If we put such big warnings there, we should have an idea of who we
> want to ignore that warning? (Napari yes, sklearn sometimes, ...?)
> 
>-> Is "whatever the library feels right" good enough?
> 
> And if the conversion still gives warnings for some array-objects,
> have
> we actually gained much?
> 
>   -> Maybe we do, end-users may be happy to ignore those warnings...
> 
> 
> The one clear use-case for `force=True` is the end-user. Just like no
> library uses `int(obj)`, but end-users can use it very nicely.
> I am happy to help the end-user in this case, but if that is the
> target
> audience we may want to _discourage_ Napari from using `force=True`
> and
> encourage sparse not to put any RuntimeWarnings on it!
> 
> - Sebastian
> 
> 
> > Cheers,
> > Ralf
> > 
> > 
> > 
> > > If end-users will actually use `np.asarray(..., force=True)` over
> > > special methods, then great! But I am currently not sure the
> > > type-
> > > safety argument is all that big of a point.  And the performance
> > > or
> > > memory-blowup argument remains true even for visualization
> > > libraries
> > > (where the array is purely input and never output as such).
> > > 
> > > 
> > > But yes, "never copy" is a somewhat different extension to
> > > `__array__`
> > > and `np.asarray`. It guarantees high speed and in-place behaviour
> > > which
> > > is useful for different settings.
> > > 
> > > - Sebastian
> > > 
> > > 
> > > > > Cheers,
> > > > > Ralf
> > > > > 
> > > > > 
> > > > > > I think the 

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-28 Thread Sebastian Berg
On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:

> > So arguably, there is no type-safety concern due to `.detach()`.
> 
> I'm not sure what the question is here; no one mentioned type-safety. 
> The
> PyTorch maintainers have already said they're fine with adding a
> force
> keyword.

But type-safety is the reason to distinguish between:

* np.asarrau(tensor)
* np.asarray(tensor, force=True)

Similar to:

* operator.index(obj)
* int(obj)   # convert less type-safe (strings, floats)!

I actually mentioned 3 reasons in my email:

1. Teach and Inform users (about the next two mainly)
2. Type-safety
3. Expensive conversion 

And only type-safety is related to `.detach()` mentioning that there
may not be clear story about the usage in that case.

(continued below)

> 

> > 
> > 
> > I am very much in favor of adding such things, but I still lack a
> > bit
> > of clarity as to whom we would be helping?
> > 
> 
> See Juan's first email. I personally am ambivalent on this proposal,
> but if
> Juan and the Napari devs really want it, that's good enough for me.

Of course I read it, twice, but it is only good enough for me if we
actually *solve the issue*, and for that I want to know which issue we
are solving :), it seems obvious, but I am not so sure...

That brings us to the other two reasons:

Teaching and Informing users:

If Napari uses `force=True` indiscriminately, it is not very clear to
the user about whether or not the operation is expensive.  I.e. the
user can learn it is when using `np.asarray(sparse_arr)` with other
libraries. But they are not notified that `napari.vis_func(sparse_arr)`
might kill their computer.

So the "Teaching" part can still partially work, but it does not inform
the user well anymore on whether or not a function will blow-up memory.

Expensive Conversion:

If the main reason is expensive conversions, however, than, as a
library I would probably just use it for half my API, since copying
from GPU to CPU will still be much faster than my own function.


Generally:

I want to help Napari, but it seems like there may be more to this, and
it may be good to finish these thoughts before making a call.

E.g. Napari wants to use it, but do the array-providers want Napari to
use it?

For sparse Hameer just mentioned that he still would want big warnings
both during the operation and in the `np.asarray` documentation.
If we put such big warnings there, we should have an idea of who we
want to ignore that warning? (Napari yes, sklearn sometimes, ...?)

   -> Is "whatever the library feels right" good enough?

And if the conversion still gives warnings for some array-objects, have
we actually gained much?

  -> Maybe we do, end-users may be happy to ignore those warnings...


The one clear use-case for `force=True` is the end-user. Just like no
library uses `int(obj)`, but end-users can use it very nicely.
I am happy to help the end-user in this case, but if that is the target
audience we may want to _discourage_ Napari from using `force=True` and
encourage sparse not to put any RuntimeWarnings on it!

- Sebastian


> Cheers,
> Ralf
> 
> 
> 
> > If end-users will actually use `np.asarray(..., force=True)` over
> > special methods, then great! But I am currently not sure the type-
> > safety argument is all that big of a point.  And the performance or
> > memory-blowup argument remains true even for visualization
> > libraries
> > (where the array is purely input and never output as such).
> > 
> > 
> > But yes, "never copy" is a somewhat different extension to
> > `__array__`
> > and `np.asarray`. It guarantees high speed and in-place behaviour
> > which
> > is useful for different settings.
> > 
> > - Sebastian
> > 
> > 
> > > > Cheers,
> > > > Ralf
> > > > 
> > > > 
> > > > > I think the discussion stalled on the precise spelling of the
> > > > > third
> > > > > option.
> > > > > 
> > > > > `__array__` was not discussed there, but it seems like adding
> > > > > the
> > > > > `copy`
> > > > > argument to `__array__` would be a perfectly reasonable
> > > > > extension.
> > > > > 
> > > > > Eric
> > > > > 
> > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > > j...@fastmail.com>
> > > > > wrote:
> > > > > 
> > > > > > Hi everyone,
> > > > > > 
> > > > > > One bit of expressivity we would miss is “copy if
> > > > > > necessary,
> > > > > > but
> > > > > > > otherwise don’t bother”, but there are workarounds to
> > > > > > > this.
> > > > > > > 
> > > > > > 
> > > > > > After a side discussion with Stéfan van der Walt, we came
> > > > > > up
> > > > > > with
> > > > > > `allow_copy=True`, which would express to the downstream
> > > > > > library that we
> > > > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > > > 
> > > > > > This sounds like the sort of thing that is use case driven.
> > > > > > If
> > > > > > enough
> > > > > > projects want to use it, then I have no objections to
> > > > > > adding
> > > > > > the keyword.
> > > > > > OTOH, we 

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-28 Thread Hameer Abbasi
Hi! Yes, I would advocate for a `force=` kwarg but personally I don't think 
it's explicit enough, but probably as explicit as can be given NumPy's API.

Personally, I'd also raise a warning within PyData/Sparse and I hope it's in 
big bold letters in the docs in NumPy to be careful with this.

Get Outlook for iOS<https://aka.ms/o0ukef>

From: NumPy-Discussion 
 on behalf of 
Ralf Gommers 
Sent: Tuesday, April 28, 2020 11:51:13 AM
To: Discussion of Numerical Python 
Subject: Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to 
`__array__` interface



On Mon, Apr 27, 2020 at 12:10 AM Sebastian Berg 
mailto:sebast...@sipsolutions.net>> wrote:
On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers 
> mailto:ralf.gomm...@gmail.com>
> >
> wrote:
>
> >
> > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser <
> > wieser.eric+nu...@gmail.com<mailto:wieser.eric%2bnu...@gmail.com>>
> > wrote:
> >
> > > Perhaps worth mentioning that we've discussed this sort of API
> > > before, in
> > > https://github.com/numpy/numpy/pull/11897.
> > >
> > > Under that proposal, the api would be something like:
> > >
> > > * `copy=True` - always copy, like it is today
> > > * `copy=False` - copy if needed, like it is today
> > > * `copy=np.never_copy` - never copy, throw an exception if not
> > > possible
> > >
> >
> > There's a couple of issues I see with using `copy` for __array__:
> > - copy is already weird (False doesn't mean no), and a [bool,
> > some_obj_or_str] keyword isn't making that better
> > - the behavior we're talking about can do more than copying, e.g.
> > for
> > PyTorch it would modify the autograd graph by adding detach(), and
> > for
> > sparse it's not just "make a copy" (which implies doubling memory
> > use) but
> > it densifies which can massively blow up the memory.
> > - I'm -1 on adding things to the main namespace (never_copy) for
> > something
> > that can be handled differently (like a string, or a new keyword)
> >
> > tl;dr a new `force` keyword would be better
> >
>
> I agree, “copy” is not a good description of this desired coercion
> behavior.
>
> A new keyword argument like “force” would be much clearer.
>

That seems fine and practical. But, in the end it seems to me that the
`force=` keyword, just means that some projects want to teach their
users that:

1. `np.asarray()` can be expensive (and may always copy)
2. `np.asarray()` always loses type properties

while others do not choose to teach about it.  There seems very little
or even no "promise" attached to either `force=True` or `force=False`.


In the end, the question is whether sparse will actually want to
implement `force=True` if the main reason we add is for library use.

That's for PyData Sparse and scipy.sparse devs to answer. Maybe Hameer can 
answer for the former here. For SciPy that should be decided on the scipy-dev 
list, but my opinion would be: yes to adding __array__ that raises TypeError by 
default, and converts with `force=True`.

There is no difference between a visualization library or numpy. In
both cases the users memory will blow up just the same.

As for PyTorch, is `.detach()` even a good reason?  Maybe I am missing
things, but:

>>> torch.ones(10, requires_grad=True) + np.arange(10)
# RuntimeError: Can't call numpy() on Variable that requires grad. Use
var.detach().numpy() instead.

So arguably, there is no type-safety concern due to `.detach()`.

I'm not sure what the question is here; no one mentioned type-safety. The 
PyTorch maintainers have already said they're fine with adding a force keyword.

There
is an (obvious) general loss of type information that always occurs
with an `np.asarray` call.
But I do not see that creating any openings for bugs here, due to the
wisdom of not allowing the above operation.
In fact, it actually seems much worse for for xarray, or pandas. They
do support the above operation and will potentially mess up if the
arange was previously an xarray with a matching index, but different
order.


I am very much in favor of adding such things, but I still lack a bit
of clarity as to whom we would be helping?

See Juan's first email. I personally am ambivalent on this proposal, but if 
Juan and the Napari devs really want it, that's good enough for me.

Cheers,
Ralf



If end-users will actually use `np.asarray(..., force=True)` over
special methods, then great! But I am currently not sure the type-
safety argument is all that big of a point.  And the performance or
memory-blowup argument remains true even for visualization libraries
(where the array is purely input and neve

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-28 Thread Ralf Gommers
On Mon, Apr 27, 2020 at 12:10 AM Sebastian Berg 
wrote:

> On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> > On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers  > >
> > wrote:
> >
> > >
> > > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser <
> > > wieser.eric+nu...@gmail.com>
> > > wrote:
> > >
> > > > Perhaps worth mentioning that we've discussed this sort of API
> > > > before, in
> > > > https://github.com/numpy/numpy/pull/11897.
> > > >
> > > > Under that proposal, the api would be something like:
> > > >
> > > > * `copy=True` - always copy, like it is today
> > > > * `copy=False` - copy if needed, like it is today
> > > > * `copy=np.never_copy` - never copy, throw an exception if not
> > > > possible
> > > >
> > >
> > > There's a couple of issues I see with using `copy` for __array__:
> > > - copy is already weird (False doesn't mean no), and a [bool,
> > > some_obj_or_str] keyword isn't making that better
> > > - the behavior we're talking about can do more than copying, e.g.
> > > for
> > > PyTorch it would modify the autograd graph by adding detach(), and
> > > for
> > > sparse it's not just "make a copy" (which implies doubling memory
> > > use) but
> > > it densifies which can massively blow up the memory.
> > > - I'm -1 on adding things to the main namespace (never_copy) for
> > > something
> > > that can be handled differently (like a string, or a new keyword)
> > >
> > > tl;dr a new `force` keyword would be better
> > >
> >
> > I agree, “copy” is not a good description of this desired coercion
> > behavior.
> >
> > A new keyword argument like “force” would be much clearer.
> >
>
> That seems fine and practical. But, in the end it seems to me that the
> `force=` keyword, just means that some projects want to teach their
> users that:
>
> 1. `np.asarray()` can be expensive (and may always copy)
> 2. `np.asarray()` always loses type properties
>
> while others do not choose to teach about it.  There seems very little
> or even no "promise" attached to either `force=True` or `force=False`.
>
>
> In the end, the question is whether sparse will actually want to
> implement `force=True` if the main reason we add is for library use.
>

That's for PyData Sparse and scipy.sparse devs to answer. Maybe Hameer can
answer for the former here. For SciPy that should be decided on the
scipy-dev list, but my opinion would be: yes to adding __array__ that
raises TypeError by default, and converts with `force=True`.


> There is no difference between a visualization library or numpy. In
> both cases the users memory will blow up just the same.
>
> As for PyTorch, is `.detach()` even a good reason?  Maybe I am missing
> things, but:
>
> >>> torch.ones(10, requires_grad=True) + np.arange(10)
> # RuntimeError: Can't call numpy() on Variable that requires grad. Use
> var.detach().numpy() instead.
>
> So arguably, there is no type-safety concern due to `.detach()`.


I'm not sure what the question is here; no one mentioned type-safety. The
PyTorch maintainers have already said they're fine with adding a force
keyword.

There
> is an (obvious) general loss of type information that always occurs
> with an `np.asarray` call.
> But I do not see that creating any openings for bugs here, due to the
> wisdom of not allowing the above operation.
> In fact, it actually seems much worse for for xarray, or pandas. They
> do support the above operation and will potentially mess up if the
> arange was previously an xarray with a matching index, but different
> order.
>
>
> I am very much in favor of adding such things, but I still lack a bit
> of clarity as to whom we would be helping?
>

See Juan's first email. I personally am ambivalent on this proposal, but if
Juan and the Napari devs really want it, that's good enough for me.

Cheers,
Ralf



> If end-users will actually use `np.asarray(..., force=True)` over
> special methods, then great! But I am currently not sure the type-
> safety argument is all that big of a point.  And the performance or
> memory-blowup argument remains true even for visualization libraries
> (where the array is purely input and never output as such).
>
>
> But yes, "never copy" is a somewhat different extension to `__array__`
> and `np.asarray`. It guarantees high speed and in-place behaviour which
> is useful for different settings.
>
> - Sebastian
>
>
> >
> > > Cheers,
> > > Ralf
> > >
> > >
> > > > I think the discussion stalled on the precise spelling of the
> > > > third
> > > > option.
> > > >
> > > > `__array__` was not discussed there, but it seems like adding the
> > > > `copy`
> > > > argument to `__array__` would be a perfectly reasonable
> > > > extension.
> > > >
> > > > Eric
> > > >
> > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > j...@fastmail.com>
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > One bit of expressivity we would miss is “copy if necessary,
> > > > > but
> > > > > > otherwise don’t bother”, but there are workarounds to this.
> > > > 

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-26 Thread Sebastian Berg
On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers  >
> wrote:
> 
> > 
> > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser <
> > wieser.eric+nu...@gmail.com>
> > wrote:
> > 
> > > Perhaps worth mentioning that we've discussed this sort of API
> > > before, in
> > > https://github.com/numpy/numpy/pull/11897.
> > > 
> > > Under that proposal, the api would be something like:
> > > 
> > > * `copy=True` - always copy, like it is today
> > > * `copy=False` - copy if needed, like it is today
> > > * `copy=np.never_copy` - never copy, throw an exception if not
> > > possible
> > > 
> > 
> > There's a couple of issues I see with using `copy` for __array__:
> > - copy is already weird (False doesn't mean no), and a [bool,
> > some_obj_or_str] keyword isn't making that better
> > - the behavior we're talking about can do more than copying, e.g.
> > for
> > PyTorch it would modify the autograd graph by adding detach(), and
> > for
> > sparse it's not just "make a copy" (which implies doubling memory
> > use) but
> > it densifies which can massively blow up the memory.
> > - I'm -1 on adding things to the main namespace (never_copy) for
> > something
> > that can be handled differently (like a string, or a new keyword)
> > 
> > tl;dr a new `force` keyword would be better
> > 
> 
> I agree, “copy” is not a good description of this desired coercion
> behavior.
> 
> A new keyword argument like “force” would be much clearer.
> 

That seems fine and practical. But, in the end it seems to me that the
`force=` keyword, just means that some projects want to teach their
users that:

1. `np.asarray()` can be expensive (and may always copy)
2. `np.asarray()` always loses type properties

while others do not choose to teach about it.  There seems very little
or even no "promise" attached to either `force=True` or `force=False`.


In the end, the question is whether sparse will actually want to
implement `force=True` if the main reason we add is for library use.
There is no difference between a visualization library or numpy. In
both cases the users memory will blow up just the same.

As for PyTorch, is `.detach()` even a good reason?  Maybe I am missing
things, but:

>>> torch.ones(10, requires_grad=True) + np.arange(10)
# RuntimeError: Can't call numpy() on Variable that requires grad. Use
var.detach().numpy() instead.

So arguably, there is no type-safety concern due to `.detach()`. There
is an (obvious) general loss of type information that always occurs
with an `np.asarray` call.
But I do not see that creating any openings for bugs here, due to the
wisdom of not allowing the above operation.
In fact, it actually seems much worse for for xarray, or pandas. They
do support the above operation and will potentially mess up if the
arange was previously an xarray with a matching index, but different
order.


I am very much in favor of adding such things, but I still lack a bit
of clarity as to whom we would be helping?

If end-users will actually use `np.asarray(..., force=True)` over
special methods, then great! But I am currently not sure the type-
safety argument is all that big of a point.  And the performance or
memory-blowup argument remains true even for visualization libraries
(where the array is purely input and never output as such).


But yes, "never copy" is a somewhat different extension to `__array__`
and `np.asarray`. It guarantees high speed and in-place behaviour which
is useful for different settings.

- Sebastian


> 
> > Cheers,
> > Ralf
> > 
> > 
> > > I think the discussion stalled on the precise spelling of the
> > > third
> > > option.
> > > 
> > > `__array__` was not discussed there, but it seems like adding the
> > > `copy`
> > > argument to `__array__` would be a perfectly reasonable
> > > extension.
> > > 
> > > Eric
> > > 
> > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > j...@fastmail.com>
> > > wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > One bit of expressivity we would miss is “copy if necessary,
> > > > but
> > > > > otherwise don’t bother”, but there are workarounds to this.
> > > > > 
> > > > 
> > > > After a side discussion with Stéfan van der Walt, we came up
> > > > with
> > > > `allow_copy=True`, which would express to the downstream
> > > > library that we
> > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > 
> > > > This sounds like the sort of thing that is use case driven. If
> > > > enough
> > > > projects want to use it, then I have no objections to adding
> > > > the keyword.
> > > > OTOH, we need to be careful about adding too many
> > > > interoperability tricks
> > > > as they complicate the code and makes it hard for folks to
> > > > determine the
> > > > best solution. Interoperability is a hot topic and we need to
> > > > be careful
> > > > not put too leave behind too many experiments in the NumPy
> > > > code.  Do you
> > > > have any other ideas of how to achieve the same effect?
> > > 

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-25 Thread Stephan Hoyer
On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers 
wrote:

>
>
> On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser 
> wrote:
>
>> Perhaps worth mentioning that we've discussed this sort of API before, in
>> https://github.com/numpy/numpy/pull/11897.
>>
>> Under that proposal, the api would be something like:
>>
>> * `copy=True` - always copy, like it is today
>> * `copy=False` - copy if needed, like it is today
>> * `copy=np.never_copy` - never copy, throw an exception if not possible
>>
>
> There's a couple of issues I see with using `copy` for __array__:
> - copy is already weird (False doesn't mean no), and a [bool,
> some_obj_or_str] keyword isn't making that better
> - the behavior we're talking about can do more than copying, e.g. for
> PyTorch it would modify the autograd graph by adding detach(), and for
> sparse it's not just "make a copy" (which implies doubling memory use) but
> it densifies which can massively blow up the memory.
> - I'm -1 on adding things to the main namespace (never_copy) for something
> that can be handled differently (like a string, or a new keyword)
>
> tl;dr a new `force` keyword would be better
>

I agree, “copy” is not a good description of this desired coercion behavior.

A new keyword argument like “force” would be much clearer.


> Cheers,
> Ralf
>
>
>> I think the discussion stalled on the precise spelling of the third
>> option.
>>
>> `__array__` was not discussed there, but it seems like adding the `copy`
>> argument to `__array__` would be a perfectly reasonable extension.
>>
>> Eric
>>
>> On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> One bit of expressivity we would miss is “copy if necessary, but
 otherwise don’t bother”, but there are workarounds to this.

>>>
>>> After a side discussion with Stéfan van der Walt, we came up with
>>> `allow_copy=True`, which would express to the downstream library that we
>>> don’t mind waiting, but that zero-copy would also be ok.
>>>
>>> This sounds like the sort of thing that is use case driven. If enough
>>> projects want to use it, then I have no objections to adding the keyword.
>>> OTOH, we need to be careful about adding too many interoperability tricks
>>> as they complicate the code and makes it hard for folks to determine the
>>> best solution. Interoperability is a hot topic and we need to be careful
>>> not put too leave behind too many experiments in the NumPy code.  Do you
>>> have any other ideas of how to achieve the same effect?
>>>
>>>
>>> Personally, I don’t have any other ideas, but would be happy to hear
>>> some!
>>>
>>> My view regarding API/experiment creep is that `__array__` is the oldest
>>> and most basic of all the interop tricks and that this can be safely
>>> maintained for future generations. Currently it only takes `dtype=` as a
>>> keyword argument, so it is a very lean API. I think this particular use
>>> case is very natural and I’ve encountered the reluctance to implicitly copy
>>> twice, so I expect it is reasonably common.
>>>
>>> Regarding difficulty in determining the best solution, I would be happy
>>> to contribute to the dispatch basics guide together with the new kwarg. I
>>> agree that the protocols are getting quite numerous and I couldn’t find a
>>> single place that gathers all the best practices together. But, to
>>> reiterate my point: `__array__` is the simplest of these and I think this
>>> keyword is pretty safe to add.
>>>
>>> For ease of discussion, here are the API options discussed so far, as
>>> well as a few extra that I don’t like but might trigger other ideas:
>>>
>>> np.asarray(my_duck_array, allow_copy=True)  # default is False, or None
>>> -> leave it to the duck array to decide
>>> np.asarray(my_duck_array, copy=True)  # always copies, but, if supported
>>> by the duck array, defers to it for the copy
>>> np.asarray(my_duck_array, copy=‘allow’)  # could take values ‘allow’,
>>> ‘force’, ’no’, True(=‘force’), False(=’no’)
>>> np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate
>>> concepts, but unclear what force_copy=True, allow_copy=False means!
>>> np.asarray(my_duck_array, force=True)
>>>
>>> Juan.
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-25 Thread Ralf Gommers
On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser 
wrote:

> Perhaps worth mentioning that we've discussed this sort of API before, in
> https://github.com/numpy/numpy/pull/11897.
>
> Under that proposal, the api would be something like:
>
> * `copy=True` - always copy, like it is today
> * `copy=False` - copy if needed, like it is today
> * `copy=np.never_copy` - never copy, throw an exception if not possible
>

There's a couple of issues I see with using `copy` for __array__:
- copy is already weird (False doesn't mean no), and a [bool,
some_obj_or_str] keyword isn't making that better
- the behavior we're talking about can do more than copying, e.g. for
PyTorch it would modify the autograd graph by adding detach(), and for
sparse it's not just "make a copy" (which implies doubling memory use) but
it densifies which can massively blow up the memory.
- I'm -1 on adding things to the main namespace (never_copy) for something
that can be handled differently (like a string, or a new keyword)

tl;dr a new `force` keyword would be better

Cheers,
Ralf


> I think the discussion stalled on the precise spelling of the third option.
>
> `__array__` was not discussed there, but it seems like adding the `copy`
> argument to `__array__` would be a perfectly reasonable extension.
>
> Eric
>
> On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias 
> wrote:
>
>> Hi everyone,
>>
>> One bit of expressivity we would miss is “copy if necessary, but
>>> otherwise don’t bother”, but there are workarounds to this.
>>>
>>
>> After a side discussion with Stéfan van der Walt, we came up with
>> `allow_copy=True`, which would express to the downstream library that we
>> don’t mind waiting, but that zero-copy would also be ok.
>>
>> This sounds like the sort of thing that is use case driven. If enough
>> projects want to use it, then I have no objections to adding the keyword.
>> OTOH, we need to be careful about adding too many interoperability tricks
>> as they complicate the code and makes it hard for folks to determine the
>> best solution. Interoperability is a hot topic and we need to be careful
>> not put too leave behind too many experiments in the NumPy code.  Do you
>> have any other ideas of how to achieve the same effect?
>>
>>
>> Personally, I don’t have any other ideas, but would be happy to hear some!
>>
>> My view regarding API/experiment creep is that `__array__` is the oldest
>> and most basic of all the interop tricks and that this can be safely
>> maintained for future generations. Currently it only takes `dtype=` as a
>> keyword argument, so it is a very lean API. I think this particular use
>> case is very natural and I’ve encountered the reluctance to implicitly copy
>> twice, so I expect it is reasonably common.
>>
>> Regarding difficulty in determining the best solution, I would be happy
>> to contribute to the dispatch basics guide together with the new kwarg. I
>> agree that the protocols are getting quite numerous and I couldn’t find a
>> single place that gathers all the best practices together. But, to
>> reiterate my point: `__array__` is the simplest of these and I think this
>> keyword is pretty safe to add.
>>
>> For ease of discussion, here are the API options discussed so far, as
>> well as a few extra that I don’t like but might trigger other ideas:
>>
>> np.asarray(my_duck_array, allow_copy=True)  # default is False, or None
>> -> leave it to the duck array to decide
>> np.asarray(my_duck_array, copy=True)  # always copies, but, if supported
>> by the duck array, defers to it for the copy
>> np.asarray(my_duck_array, copy=‘allow’)  # could take values ‘allow’,
>> ‘force’, ’no’, True(=‘force’), False(=’no’)
>> np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate
>> concepts, but unclear what force_copy=True, allow_copy=False means!
>> np.asarray(my_duck_array, force=True)
>>
>> Juan.
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Sebastian Berg
On Fri, 2020-04-24 at 10:12 -0700, Stephan Hoyer wrote:
> On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg <
> sebast...@sipsolutions.net>
> wrote:
> 
> > One thing to note is that `__array__` is actually asked to return a
> > copy AFAIK.
> 
> The documentation on __array__ seems to quite limited, unfortunately.
> The
> most I can find are a few sentences here:
> https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__
> 
> I don't see anything about returning copies. My interpretation has
> always
> been that __array__ can return either a copy or a view, like the
> np.asarray() constructor.
> 

Hmmm, right, I am not quite sure why I thought this was the case.

The more important part is behaviour. And the fact is that if you do
`np.array(array_like)` with an array like that implements `__array__`
then we ensure a copy is made by default (`copy=True` by default), even
though `__array__()` may already return a copy.

In any case, the current default for `np.asarray`, i.e. `copy=False` is
"copy if necessary". So if PyTorch uses a new parameter to Opt-In to
copying, the default behaviour will depend on the object. The
definition would then be:

Copy if necessary but error if a copy is necessary and the
object doesn't want to be copied silently.

To be honest, that seems not totally terrible to me... The old
statement remains true with the small caveat that it will sometimes
cause a loud error explaining things. The only problem is that some
users may want an the explicit `np.copy_if_necessary` to get PyTorch to
do what most already do on `copy=False`.

I guess the new behaviour would then be:

if copy is np.never_copy:  # or however we signal it
try:
arr = obj.__array__(copy=np.no_copy)
except TypeError as e:
raise TypeError("no copy appears unsupported by ...!") from e
elif copy is np.copy_if_necessary:
# Some users may want to tell PyTorch not to error, but
# tell pandas, that a view is OK:
try:
arr = np.array(copy=np.copy_if_necessary)
except TypeError:
arr = obj.__array__()
elif not copy:
# Behaviour here may depend on the array-like!
# current array likes may or may not return a copy,
# new ones may choose to raise an error when a view
# is not possible.
arr = obj.__array__()
else:
try:
arr = obj.__array__(copy=True)
except TypeError:
arr = obj.__array__()
arr = arr.copy()  # make sure its a copy

PyTorch can then implement copy, but raise an error if `copy=False`
(which must be the default). Current objects will error for
`np.never_copy` but otherwise be fine. And they can implement `copy` to
avoid an unnecessary double copy if they wish so.
We could add the `np.copy_if_necessary` to be an explicit replacement
for the current `copy=False`. This will be necessary, or nicer, unless
everyone is happy to copy by default.

Another side note: calls such as `np.array([arr1, arr2])` probably must
always fail if `copy=np.never_copy` since a view is not guaranteed.

- Sebastian


> 
> > I doubt it always does, but if it does not I assume the
> > object should and could provide `__array_interface__`.
> > 
> 
> Objects like xarray.DataArray and pandas.Series sometimes directly
> wrap
> NumPy arrays and sometimes don't.
> 
> They both implement __array__ but not __array_inferace__. It's very
> obvious
> how to implement a "forwarding" __array__ method (just call
> `np.asarray()`
> on an argument that might implement it). I guess something similar
> could be
> done for __array_interface__, but it's not clear to me that it's
> right to
> implement __array_interface__ when doing so might require a copy.
> 

Yes, I do not think you should implement __array_interface__ then,
unless "simplifying the array" is for some reason beneficial for
yourself. I suppose you could raise an AttributeError, but it is
questionable if thats good.


> 
> > Under that assumption, it would be an opt-out right now since NumPy
> > allows copies by default here.
> > Defining things along copy does seem sensible, though I do not know
> > how
> > it would play with some of the current array-likes choosing to
> > refuse
> > `__array__`.
> > 
> > - Sebastian
> > 
> > 
> > 
> > > Eric
> > > 
> > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > j...@fastmail.com>
> > > wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > One bit of expressivity we would miss is “copy if necessary,
> > > > but
> > > > otherwise
> > > > > don’t bother”, but there are workarounds to this.
> > > > > 
> > > > 
> > > > After a side discussion with Stéfan van der Walt, we came up
> > > > with
> > > > `allow_copy=True`, which would express to the downstream
> > > > library
> > > > that we
> > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > 
> > > > This sounds like the sort of thing that is use case driven. If
> > > > enough
> > > > projects want to use it, then I have no objections to adding
> > > > the
> 

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Stephan Hoyer
On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg 
wrote:

> One thing to note is that `__array__` is actually asked to return a
> copy AFAIK.


The documentation on __array__ seems to quite limited, unfortunately. The
most I can find are a few sentences here:
https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__

I don't see anything about returning copies. My interpretation has always
been that __array__ can return either a copy or a view, like the
np.asarray() constructor.


> I doubt it always does, but if it does not I assume the
> object should and could provide `__array_interface__`.
>

Objects like xarray.DataArray and pandas.Series sometimes directly wrap
NumPy arrays and sometimes don't.

They both implement __array__ but not __array_inferace__. It's very obvious
how to implement a "forwarding" __array__ method (just call `np.asarray()`
on an argument that might implement it). I guess something similar could be
done for __array_interface__, but it's not clear to me that it's right to
implement __array_interface__ when doing so might require a copy.


> Under that assumption, it would be an opt-out right now since NumPy
> allows copies by default here.
> Defining things along copy does seem sensible, though I do not know how
> it would play with some of the current array-likes choosing to refuse
> `__array__`.
>
> - Sebastian
>
>
>
> > Eric
> >
> > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > One bit of expressivity we would miss is “copy if necessary, but
> > > otherwise
> > > > don’t bother”, but there are workarounds to this.
> > > >
> > >
> > > After a side discussion with Stéfan van der Walt, we came up with
> > > `allow_copy=True`, which would express to the downstream library
> > > that we
> > > don’t mind waiting, but that zero-copy would also be ok.
> > >
> > > This sounds like the sort of thing that is use case driven. If
> > > enough
> > > projects want to use it, then I have no objections to adding the
> > > keyword.
> > > OTOH, we need to be careful about adding too many interoperability
> > > tricks
> > > as they complicate the code and makes it hard for folks to
> > > determine the
> > > best solution. Interoperability is a hot topic and we need to be
> > > careful
> > > not put too leave behind too many experiments in the NumPy
> > > code.  Do you
> > > have any other ideas of how to achieve the same effect?
> > >
> > >
> > > Personally, I don’t have any other ideas, but would be happy to
> > > hear some!
> > >
> > > My view regarding API/experiment creep is that `__array__` is the
> > > oldest
> > > and most basic of all the interop tricks and that this can be
> > > safely
> > > maintained for future generations. Currently it only takes `dtype=`
> > > as a
> > > keyword argument, so it is a very lean API. I think this particular
> > > use
> > > case is very natural and I’ve encountered the reluctance to
> > > implicitly copy
> > > twice, so I expect it is reasonably common.
> > >
> > > Regarding difficulty in determining the best solution, I would be
> > > happy to
> > > contribute to the dispatch basics guide together with the new
> > > kwarg. I
> > > agree that the protocols are getting quite numerous and I couldn’t
> > > find a
> > > single place that gathers all the best practices together. But, to
> > > reiterate my point: `__array__` is the simplest of these and I
> > > think this
> > > keyword is pretty safe to add.
> > >
> > > For ease of discussion, here are the API options discussed so far,
> > > as well
> > > as a few extra that I don’t like but might trigger other ideas:
> > >
> > > np.asarray(my_duck_array, allow_copy=True)  # default is False, or
> > > None ->
> > > leave it to the duck array to decide
> > > np.asarray(my_duck_array, copy=True)  # always copies, but, if
> > > supported
> > > by the duck array, defers to it for the copy
> > > np.asarray(my_duck_array, copy=‘allow’)  # could take values
> > > ‘allow’,
> > > ‘force’, ’no’, True(=‘force’), False(=’no’)
> > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)  #
> > > separate
> > > concepts, but unclear what force_copy=True, allow_copy=False means!
> > > np.asarray(my_duck_array, force=True)
> > >
> > > Juan.
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Sebastian Berg
On Fri, 2020-04-24 at 11:34 +0100, Eric Wieser wrote:
> Perhaps worth mentioning that we've discussed this sort of API
> before, in
> https://github.com/numpy/numpy/pull/11897.
> 
> Under that proposal, the api would be something like:
> 
> * `copy=True` - always copy, like it is today
> * `copy=False` - copy if needed, like it is today
> * `copy=np.never_copy` - never copy, throw an exception if not
> possible
> 
> I think the discussion stalled on the precise spelling of the third
> option.
> 
> `__array__` was not discussed there, but it seems like adding the
> `copy`
> argument to `__array__` would be a perfectly reasonable extension.
> 

One thing to note is that `__array__` is actually asked to return a
copy AFAIK. I doubt it always does, but if it does not I assume the
object should and could provide `__array_interface__`.
Under that assumption, it would be an opt-out right now since NumPy
allows copies by default here.
Defining things along copy does seem sensible, though I do not know how
it would play with some of the current array-likes choosing to refuse
`__array__`.

- Sebastian



> Eric
> 
> On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias 
> wrote:
> 
> > Hi everyone,
> > 
> > One bit of expressivity we would miss is “copy if necessary, but
> > otherwise
> > > don’t bother”, but there are workarounds to this.
> > > 
> > 
> > After a side discussion with Stéfan van der Walt, we came up with
> > `allow_copy=True`, which would express to the downstream library
> > that we
> > don’t mind waiting, but that zero-copy would also be ok.
> > 
> > This sounds like the sort of thing that is use case driven. If
> > enough
> > projects want to use it, then I have no objections to adding the
> > keyword.
> > OTOH, we need to be careful about adding too many interoperability
> > tricks
> > as they complicate the code and makes it hard for folks to
> > determine the
> > best solution. Interoperability is a hot topic and we need to be
> > careful
> > not put too leave behind too many experiments in the NumPy
> > code.  Do you
> > have any other ideas of how to achieve the same effect?
> > 
> > 
> > Personally, I don’t have any other ideas, but would be happy to
> > hear some!
> > 
> > My view regarding API/experiment creep is that `__array__` is the
> > oldest
> > and most basic of all the interop tricks and that this can be
> > safely
> > maintained for future generations. Currently it only takes `dtype=`
> > as a
> > keyword argument, so it is a very lean API. I think this particular
> > use
> > case is very natural and I’ve encountered the reluctance to
> > implicitly copy
> > twice, so I expect it is reasonably common.
> > 
> > Regarding difficulty in determining the best solution, I would be
> > happy to
> > contribute to the dispatch basics guide together with the new
> > kwarg. I
> > agree that the protocols are getting quite numerous and I couldn’t
> > find a
> > single place that gathers all the best practices together. But, to
> > reiterate my point: `__array__` is the simplest of these and I
> > think this
> > keyword is pretty safe to add.
> > 
> > For ease of discussion, here are the API options discussed so far,
> > as well
> > as a few extra that I don’t like but might trigger other ideas:
> > 
> > np.asarray(my_duck_array, allow_copy=True)  # default is False, or
> > None ->
> > leave it to the duck array to decide
> > np.asarray(my_duck_array, copy=True)  # always copies, but, if
> > supported
> > by the duck array, defers to it for the copy
> > np.asarray(my_duck_array, copy=‘allow’)  # could take values
> > ‘allow’,
> > ‘force’, ’no’, True(=‘force’), False(=’no’)
> > np.asarray(my_duck_array, force_copy=False, allow_copy=True)  #
> > separate
> > concepts, but unclear what force_copy=True, allow_copy=False means!
> > np.asarray(my_duck_array, force=True)
> > 
> > Juan.
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Eric Wieser
Perhaps worth mentioning that we've discussed this sort of API before, in
https://github.com/numpy/numpy/pull/11897.

Under that proposal, the api would be something like:

* `copy=True` - always copy, like it is today
* `copy=False` - copy if needed, like it is today
* `copy=np.never_copy` - never copy, throw an exception if not possible

I think the discussion stalled on the precise spelling of the third option.

`__array__` was not discussed there, but it seems like adding the `copy`
argument to `__array__` would be a perfectly reasonable extension.

Eric

On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias  wrote:

> Hi everyone,
>
> One bit of expressivity we would miss is “copy if necessary, but otherwise
>> don’t bother”, but there are workarounds to this.
>>
>
> After a side discussion with Stéfan van der Walt, we came up with
> `allow_copy=True`, which would express to the downstream library that we
> don’t mind waiting, but that zero-copy would also be ok.
>
> This sounds like the sort of thing that is use case driven. If enough
> projects want to use it, then I have no objections to adding the keyword.
> OTOH, we need to be careful about adding too many interoperability tricks
> as they complicate the code and makes it hard for folks to determine the
> best solution. Interoperability is a hot topic and we need to be careful
> not put too leave behind too many experiments in the NumPy code.  Do you
> have any other ideas of how to achieve the same effect?
>
>
> Personally, I don’t have any other ideas, but would be happy to hear some!
>
> My view regarding API/experiment creep is that `__array__` is the oldest
> and most basic of all the interop tricks and that this can be safely
> maintained for future generations. Currently it only takes `dtype=` as a
> keyword argument, so it is a very lean API. I think this particular use
> case is very natural and I’ve encountered the reluctance to implicitly copy
> twice, so I expect it is reasonably common.
>
> Regarding difficulty in determining the best solution, I would be happy to
> contribute to the dispatch basics guide together with the new kwarg. I
> agree that the protocols are getting quite numerous and I couldn’t find a
> single place that gathers all the best practices together. But, to
> reiterate my point: `__array__` is the simplest of these and I think this
> keyword is pretty safe to add.
>
> For ease of discussion, here are the API options discussed so far, as well
> as a few extra that I don’t like but might trigger other ideas:
>
> np.asarray(my_duck_array, allow_copy=True)  # default is False, or None ->
> leave it to the duck array to decide
> np.asarray(my_duck_array, copy=True)  # always copies, but, if supported
> by the duck array, defers to it for the copy
> np.asarray(my_duck_array, copy=‘allow’)  # could take values ‘allow’,
> ‘force’, ’no’, True(=‘force’), False(=’no’)
> np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate
> concepts, but unclear what force_copy=True, allow_copy=False means!
> np.asarray(my_duck_array, force=True)
>
> Juan.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-23 Thread Juan Nunez-Iglesias
Hi everyone,

> One bit of expressivity we would miss is “copy if necessary, but otherwise 
> don’t bother”, but there are workarounds to this.

After a side discussion with Stéfan van der Walt, we came up with 
`allow_copy=True`, which would express to the downstream library that we don’t 
mind waiting, but that zero-copy would also be ok.

> This sounds like the sort of thing that is use case driven. If enough 
> projects want to use it, then I have no objections to adding the keyword. 
> OTOH, we need to be careful about adding too many interoperability tricks as 
> they complicate the code and makes it hard for folks to determine the best 
> solution. Interoperability is a hot topic and we need to be careful not put 
> too leave behind too many experiments in the NumPy code.  Do you have any 
> other ideas of how to achieve the same effect?

Personally, I don’t have any other ideas, but would be happy to hear some!

My view regarding API/experiment creep is that `__array__` is the oldest and 
most basic of all the interop tricks and that this can be safely maintained for 
future generations. Currently it only takes `dtype=` as a keyword argument, so 
it is a very lean API. I think this particular use case is very natural and 
I’ve encountered the reluctance to implicitly copy twice, so I expect it is 
reasonably common.

Regarding difficulty in determining the best solution, I would be happy to 
contribute to the dispatch basics guide together with the new kwarg. I agree 
that the protocols are getting quite numerous and I couldn’t find a single 
place that gathers all the best practices together. But, to reiterate my point: 
`__array__` is the simplest of these and I think this keyword is pretty safe to 
add.

For ease of discussion, here are the API options discussed so far, as well as a 
few extra that I don’t like but might trigger other ideas:

np.asarray(my_duck_array, allow_copy=True)  # default is False, or None -> 
leave it to the duck array to decide
np.asarray(my_duck_array, copy=True)  # always copies, but, if supported by the 
duck array, defers to it for the copy
np.asarray(my_duck_array, copy=‘allow’)  # could take values ‘allow’, ‘force’, 
’no’, True(=‘force’), False(=’no’)
np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate 
concepts, but unclear what force_copy=True, allow_copy=False means!
np.asarray(my_duck_array, force=True)

Juan.___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-21 Thread Charles R Harris
On Tue, Apr 21, 2020 at 1:07 AM Juan Nunez-Iglesias 
wrote:

> Hello NumPy-ers!
>
> The __array__ method is a great little tool to allow interoperability with
> NumPy. Briefly, calling `np.array()` or `np.asarray()` on an object with an
> `__array__` method, one can get a NumPy representation of that object,
> which may or may not involve data copying (this is up to the object’s
> implementation of `__array__`). Some references:
>
> https://numpy.org/devdocs/user/basics.dispatch.html
>
>
> https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__
> https://numpy.org/devdocs/reference/generated/numpy.array.html
> https://numpy.org/devdocs/reference/generated/numpy.asarray.html
>
>
> (I couldn’t find an authoritative guide on good and bad practices with
> `__array__`, btw.)
>
> For people writing e.g. visualisation libraries, this is a wonderful
> thing, because if we know how to visualise NumPy arrays, we can suddenly
> visualise anything with an `__array__` method. As an example, napari, while
> not being aware of dask, can visualise large dask arrays out of the box,
> which allows us to view 100GB out-of-core datasets easily.
>
> However, in many cases, instantiating a NumPy array is an expensive
> operation, for example copying an array from GPU to CPU memory, or involves
> substantial loss of information. Some library authors are reluctant to
> allow implicit execution of such an operation, such as PyOpenCL [1],
> PyTorch [2], or even scipy.sparse.
>
> My proposal is to add an optional argument to `__array__` that would
> signal to the downstream library that we *really* want a NumPy array and
> are willing to wait for it. In the PyTorch issue I proposed `force=True`,
> and they are somewhat receptive of this, but, reading more about the
> existing NumPy APIs, I think `copy=True` would be a nice alternative:
>
> - np.array already has a copy= keyword argument. Under this proposal, it
> would attempt to pass it to the downstream library, and, if that failed, it
> would try again without it and run its own copy.
> - np.asarray could get a new copy= keyword argument that would match
> np.array’s.
> - It would neatly express the idea that the array is going to e.g. get
> passed around between devices.
>
> Or, we could just go with `force=`.
>
> One bit of expressivity we would miss is “copy if necessary, but otherwise
> don’t bother”, but there are workarounds to this.
>
> What do people think? I would be happy to write a PR and/or NEP for this
> if there is general consensus that this would be useful.
>
>
This sounds like the sort of thing that is use case driven. If enough
projects want to use it, then I have no objections to adding the keyword.
OTOH, we need to be careful about adding too many interoperability tricks
as they complicate the code and makes it hard for folks to determine the
best solution. Interoperability is a hot topic and we need to be careful
not put too leave behind too many experiments in the NumPy code.  Do you
have any other ideas of how to achieve the same effect?

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion