Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-29 Thread Sebastian Berg
On Wed, 2020-04-29 at 05:26 -0500, Juan Nunez-Iglesias wrote:
> Hi everyone, and thank you Ralf for carrying the flag in my absence.
> =D
> 
> Sebastian, the *primary* motivation behind avoiding detach() in
> PyTorch is listed in the original post of the PyTorch issue:
> 
> > People not very familiar with `requires_grad` and cpu/gpu Tensors
> > might go back and forth with numpy. For example doing pytorch ->
> > numpy -> pytorch and backward on the last Tensor. This will
> > backward without issue but not all the way to the first part of the
> > code and won’t raise any error.
> 
> The PyTorch team are concerned that they will be overwhelmed with
> help requests if np.array() silently succeeds on a tensor with
> gradients. I definitely get that.

Sorry for playing devil's advocate...

I guess it is simply that, before we wrap this up, it would be nice to
have a short list of projects:

* Napari, matplotlib on the "user" side
* PyTorch, ...? on the "provider" side

And maybe what their expectations for `force=True` are, to make sure
they roughly align.

At this time, the best definition for when to use `force=True` seems to
be "end-point" usage (such as visualization, or maybe writing to disk?).

I still think performance can be just as valid an issue there. For
example, it may be better to convert to a numpy array earlier in the
computation.  Or someone could be surprised that saving their gpu array
to an hdf5 file is by far the slowest part of the computation.

I have the feeling the definition we actually want is:

   There is definitely no way to do this computation faster or better
   than by converting it to a NumPy array.

Since currently the main reason to reject it seems to be something like:

   Wait, are you sure there is not a much better way than using NumPy
   arrays? Be careful!

And while that distinction is clear for PyTorch + visualization, I am
not quite sure yet that it is clear for the various other combinations
of `force=True` users and array providers.
Maybe CuPy does not want h5py to use `force=True`, because CuPy has its
own super fast "stream-to-file" functionality... but it does want napari
to use it.

- Sebastian


> 
> Avoiding .gpu() is more straightforwardly about avoiding implicit
> expensive computation.
> 
> > while others do not choose to teach about it. There seems to be very
> > little, or even no, "promise" attached to either `force=True` or
> > `force=False`.
> 
> NumPy can set a precedent through policy. The *only* reason client
> libraries would implement `__array__` is to play well with NumPy, so
> if NumPy documents that `force=True` should *always* succeed, we can
> expect client libraries to follow suit. At least the PyTorch devs
> have indicated that they would be open to this.
> 
> > E.g. Napari wants to use it, but do the array-providers want Napari
> > to use it?
> 
> As Ralf pointed out, the PyTorch devs have already agreed to it.
> 
> From the napari perspective, we'd be ok with leaving the decision on
> warnings to client libraries. We may or may not suppress them
> depending on user requests. ;) But the point is to have a way of
> saying "give me a NumPy array DAMMIT" without having to know about
> all the possible array libraries. Which are numerous and getting
> numerouser.
> 
> Ralf, you said you don't want warnings — even for sparse arrays? That
> was an area of concern for you on the PyTorch discussion side.
> 
> > And if the conversion still gives warnings for some array-objects,
> > have we actually gained much?
> 
> Yes.
> 
> Hameer,
> 
> > I would advocate for a `force=` kwarg; personally I don't think
> > it's explicit enough, but it is probably as explicit as can be given
> > NumPy's API.
> 
> Yeah, I agree that force is kind of vague, which is why I was looking
> for things like `allow_copy`. But it is hard to be general enough
> here: sparse requires an expensive instantiation, cupy requires
> copying from gpu to cpu, dask requires arbitrary computation, xarray
> requires information loss... I'm inclined to agree with Ralf that
> force= is the only generic-enough term, but I'm happy to entertain
> other options!
> 
> Juan.




[Numpy-discussion] 1.19.x branch

2020-04-29 Thread Charles R Harris
Hi All,

I'm thinking of making the 1.19.x branch in about two weeks. If there are PRs
that you feel need to be in that release, please let me know.

Chuck


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-29 Thread Ralf Gommers
On Wed, Apr 29, 2020 at 12:27 PM Juan Nunez-Iglesias wrote:

>
>
> Ralf, you said you don't want warnings — even for sparse arrays? That was
> an area of concern for you on the PyTorch discussion side.
>

Providing a boolean keyword argument and then raising a warning every time
anyone uses that keyword makes very little sense.

This should simply be handled by clear documentation: use only in code like
visualization where the array arrives at its "end station", never in
functions that other code builds on top of.
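
To make that concrete, here is a minimal sketch of the two kinds of call
sites. Note that `force=` is the proposed keyword, not an existing NumPy
argument, and `to_numpy` is a hypothetical shim standing in for it:

    import numpy as np

    def to_numpy(obj, force=False):
        # Hypothetical shim for the proposed np.asarray(obj, force=True):
        # forward the flag to __array__ when the provider supports it.
        try:
            return np.asarray(obj.__array__(force=force))
        except (AttributeError, TypeError):
            return np.asarray(obj)

    def show(image):
        # End-point code (a viewer): the array has reached its "end
        # station", so an expensive one-off conversion is acceptable.
        data = to_numpy(image, force=True)
        print(data.min(), data.max())

    def normalize(x):
        # Library code: keep plain np.asarray, so providers that opt out
        # of implicit conversion still fail loudly instead of being
        # silently converted on every call.
        arr = np.asarray(x)
        return (arr - arr.mean()) / arr.std()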

Cheers,
Ralf


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-29 Thread Juan Nunez-Iglesias
Hi everyone, and thank you Ralf for carrying the flag in my absence. =D

Sebastian, the *primary* motivation behind avoiding detach() in PyTorch is 
listed in the original post of the PyTorch issue:

> People not very familiar with `requires_grad` and cpu/gpu Tensors might go 
> back and forth with numpy. For example doing pytorch -> numpy -> pytorch and 
> backward on the last Tensor. This will backward without issue but not all the 
> way to the first part of the code and won’t raise any error.


The PyTorch team are concerned that they will be overwhelmed with help requests 
if np.array() silently succeeds on a tensor with gradients. I definitely get 
that.
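
For anyone who has not hit this, a minimal sketch of that failure mode,
assuming PyTorch's behaviour at the time of writing:

    import numpy as np
    import torch

    x = torch.ones(3, requires_grad=True)
    y = x * 2

    # np.asarray(y) raises today: "Can't call numpy() on Tensor that
    # requires grad. Use tensor.detach().numpy() instead."

    # The explicit escape hatch works, but silently severs the autograd
    # graph at the NumPy round trip:
    z = torch.as_tensor(np.asarray(y.detach()))

    w = torch.ones(3, requires_grad=True)
    (z * w).sum().backward()  # backward() runs "without issue"...
    print(w.grad)             # tensor([2., 2., 2.])
    print(x.grad)             # None: no error, and no gradient for x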

Avoiding .gpu() is more straightforwardly about avoiding implicit expensive 
computation.

> while others do not choose to teach about it. There seems to be very little,
> or even no, "promise" attached to either `force=True` or `force=False`.

NumPy can set a precedent through policy. The *only* reason client libraries 
would implement `__array__` is to play well with NumPy, so if NumPy documents 
that `force=True` should *always* succeed, we can expect client libraries to 
follow suit. At least the PyTorch devs have indicated that they would be open 
to this.
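
As a sketch of what that contract could look like on the provider side
(the `force=` signature is illustrative only; the exact spelling is what
this thread is deciding, and `DeviceArray` is made up):

    import numpy as np

    class DeviceArray:
        # Hypothetical duck array whose data nominally lives off-host.
        def __init__(self, data):
            self._data = np.asarray(data)  # stand-in for device memory

        def __array__(self, dtype=None, force=False):
            # Proposed contract: with force=True the conversion must
            # always succeed; without it, the provider may refuse.
            if not force:
                raise TypeError(
                    "implicit conversion to a NumPy array is disabled; "
                    "use force=True to copy off-device")
            arr = self._data  # a real library would copy device->host here
            return arr.astype(dtype) if dtype is not None else arr

    a = DeviceArray([1.0, 2.0])
    print(a.__array__(force=True))  # [1. 2.]
    # np.asarray(a) raises TypeError; under the proposal,
    # np.asarray(a, force=True) would pass the flag through.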

> E.g. Napari wants to use it, but do the array-providers want Napari to use it?

As Ralf pointed out, the PyTorch devs have already agreed to it.

From the napari perspective, we'd be ok with leaving the decision on warnings
to client libraries. We may or may not suppress them depending on user
requests. ;) But the point is to have a way of saying "give me a NumPy array
DAMMIT" without having to know about all the possible array libraries. Which
are numerous and getting numerouser.

Ralf, you said you don't want warnings — even for sparse arrays? That was an 
area of concern for you on the PyTorch discussion side.

> And if the conversion still gives warnings for some array-objects, have we 
> actually gained much?

Yes.

Hameer,

> I would advocate for a `force=` kwarg; personally I don't think it's
> explicit enough, but it is probably as explicit as can be given NumPy's API.

Yeah, I agree that force is kind of vague, which is why I was looking for 
things like `allow_copy`. But it is hard to be general enough here: sparse 
requires an expensive instantiation, cupy requires copying from gpu to cpu, 
dask requires arbitrary computation, xarray requires information loss... I'm 
inclined to agree with Ralf that force= is the only generic-enough term, but 
I'm happy to entertain other options!
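
For context, the per-library spellings that a single keyword would unify
look roughly like this today (current public APIs as I understand them;
exact calls may vary by version, and each library must be installed):

    import numpy as np
    import dask.array as da
    import cupy, sparse, torch, xarray as xr

    t = torch.ones(3, requires_grad=True)
    c = cupy.arange(3)
    s = sparse.COO(np.eye(3))
    d = da.arange(6, chunks=3)
    x = xr.DataArray(np.arange(3), dims="i")

    np_t = t.detach().cpu().numpy()  # PyTorch: drop grad, leave the GPU
    np_c = c.get()                   # CuPy: explicit device -> host copy
    np_s = s.todense()               # sparse: expensive densification
    np_d = d.compute()               # dask: executes the task graph
    np_x = x.values                  # xarray: drops coords and metadata
    # One np.asarray(obj, force=True) would spell all of these uniformly.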

Juan.


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-29 Thread Ralf Gommers
On Tue, Apr 28, 2020 at 5:03 PM Sebastian Berg wrote:

> On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:
> 
> > > So arguably, there is no type-safety concern due to `.detach()`.
> >
> > I'm not sure what the question is here; no one mentioned type-safety.
> > The PyTorch maintainers have already said they're fine with adding a
> > force keyword.
>
> But type-safety is the reason to distinguish between:
>
> * np.asarray(tensor)
> * np.asarray(tensor, force=True)
>

No it's not; the rationale given by library authors is expensive conversion
/ memory copies / side effects. `np.asarray(x)` is used all over the place,
and can/will continue to be used by library authors. `force=True` is for
cases where things like expensive conversion don't matter, like
visualization: if you need a picture of an array then it helps, and the
downside of writing inefficient/unreliable numerical code isn't present.


> Similar to:
>
> * operator.index(obj)
> * int(obj)   # converts less type-safely (also accepts strings, floats)!
>
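
To make the quoted analogy concrete, from the standard library:

    import operator

    operator.index(3)      # 3: accepts only true integer types
    int(3.9)               # 3: silently truncates a float
    int("7")               # 7: also coerces strings
    # operator.index(3.9)  # raises TypeError: 'float' object cannot be
    #                      # interpreted as an integer
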
> I actually mentioned 3 reasons in my email:
>
> 1. Teach and Inform users (about the next two mainly)
> 2. Type-safety
> 3. Expensive conversion
>
> And only type-safety is related to `.detach()`, mentioning that there
> may not be a clear story about the usage in that case.
>
> (continued below)
>
> >
> 
> > >
> > >
> > > I am very much in favor of adding such things, but I still lack a
> > > bit of clarity as to whom we would be helping?
> > >
> >
> > See Juan's first email. I personally am ambivalent on this proposal,
> > but if Juan and the Napari devs really want it, that's good enough
> > for me.
>
> Of course I read it, twice, but it is only good enough for me if we
> actually *solve the issue*, and for that I want to know which issue we
> are solving :). It seems obvious, but I am not so sure...
>
> That brings us to the other two reasons:
>
> Teaching and Informing users:
>
> If Napari uses `force=True` indiscriminately, it is not very clear to
> the user whether or not an operation is expensive.  I.e. the user can
> learn that it is expensive when using `np.asarray(sparse_arr)` with
> other libraries, but they are not notified that
> `napari.vis_func(sparse_arr)` might kill their computer.
>
> So the "Teaching" part can still partially work, but it no longer
> informs the user well on whether or not a function will blow up memory.
>
> Expensive Conversion:
>
> If the main reason is expensive conversions, however, then, as a
> library I would probably just use it for half my API, since copying
> from GPU to CPU will still be much faster than my own function.
>
>
> Generally:
>
> I want to help Napari, but it seems like there may be more to this, and
> it may be good to finish these thoughts before making a call.
>
> E.g. Napari wants to use it, but do the array-providers want Napari to
> use it?
>
> For sparse, Hameer just mentioned that he still would want big warnings
> both during the operation and in the `np.asarray` documentation.
> If we put such big warnings there, we should have an idea of who we
> expect to ignore that warning? (Napari yes, sklearn sometimes, ...?)
>

There clearly should not be warnings. And sklearn is irrelevant; it cannot
use `force=True`.

Ralf



>   -> Is "whatever the library feels is right" good enough?
>
> And if the conversion still gives warnings for some array-objects, have
> we actually gained much?
>
>   -> Maybe we do; end-users may be happy to ignore those warnings...
>
>
> The one clear use-case for `force=True` is the end-user. Just like
> `int(obj)`: no library uses it, but end-users can use it very nicely.
> I am happy to help the end-user in this case, but if that is the target
> audience, we may want to _discourage_ Napari from using `force=True` and
> encourage sparse not to put any RuntimeWarnings on it!
>
> - Sebastian
>
>
> > Cheers,
> > Ralf
> >
> >
> >
> > > If end-users will actually use `np.asarray(..., force=True)` over
> > > special methods, then great! But I am currently not sure the type-
> > > safety argument is all that big a point.  And the performance or
> > > memory-blowup argument remains true even for visualization libraries
> > > (where the array is purely input and never output as such).
> > >
> > >
> > > But yes, "never copy" is a somewhat different extension to `__array__`
> > > and `np.asarray`. It guarantees high speed and in-place behaviour,
> > > which is useful in different settings.
> > >
> > > - Sebastian
> > >
> > >
> > > > > Cheers,
> > > > > Ralf
> > > > >
> > > > >
> > > > > > I think the discussion stalled on the precise spelling of the
> > > > > > third option.
> > > > > >
> > > > > > `__array__` was not discussed there, but it seems like adding
> > > > > > the `copy` argument to `__array__` would be a perfectly
> > > > > > reasonable extension.
> > > > > > Eric
> > > > > >
> > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > >