Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-26 Thread Sebastian Berg
On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers wrote:
> 
> > 
> > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser <
> > wieser.eric+nu...@gmail.com>
> > wrote:
> > 
> > > Perhaps worth mentioning that we've discussed this sort of API
> > > before, in
> > > https://github.com/numpy/numpy/pull/11897.
> > > 
> > > Under that proposal, the api would be something like:
> > > 
> > > * `copy=True` - always copy, like it is today
> > > * `copy=False` - copy if needed, like it is today
> > > * `copy=np.never_copy` - never copy, throw an exception if not
> > > possible
> > > 
> > 
> > There's a couple of issues I see with using `copy` for __array__:
> > - copy is already weird (False doesn't mean no), and a [bool,
> > some_obj_or_str] keyword isn't making that better
> > - the behavior we're talking about can do more than copying, e.g.
> > for
> > PyTorch it would modify the autograd graph by adding detach(), and
> > for
> > sparse it's not just "make a copy" (which implies doubling memory
> > use) but
> > it densifies which can massively blow up the memory.
> > - I'm -1 on adding things to the main namespace (never_copy) for
> > something
> > that can be handled differently (like a string, or a new keyword)
> > 
> > tl;dr a new `force` keyword would be better
> > 
> 
> I agree, “copy” is not a good description of this desired coercion
> behavior.
> 
> A new keyword argument like “force” would be much clearer.
> 

That seems fine and practical. But in the end, it seems to me that the
`force=` keyword just means that some projects want to teach their
users that:

1. `np.asarray()` can be expensive (and may always copy)
2. `np.asarray()` always loses type properties

while others do not choose to teach about it.  There seems very little
or even no "promise" attached to either `force=True` or `force=False`.
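
To make the "promise" question concrete, here is a minimal sketch of
what a `force=` keyword on `__array__` might look like for a sparse-like
container. Everything in it is an assumption for illustration: the
`TinySparse` class, the `as_dense` helper standing in for a hypothetical
`np.asarray(obj, force=...)`, and the choice to raise `TypeError` when
`force` is not given. A plain list stands in for the returned array so
the sketch runs without NumPy.

```python
class TinySparse:
    """Toy sparse vector that stores only its nonzero entries."""

    def __init__(self, length, nonzeros):
        self.length = length
        self.nonzeros = dict(nonzeros)  # maps index -> value

    def __array__(self, dtype=None, force=False):
        # Hypothetical `force` keyword, as discussed in this thread.
        # A real implementation would return an ndarray; a list stands
        # in here so the sketch has no dependencies.
        if not force:
            raise TypeError(
                "densifying may be expensive; pass force=True to allow it"
            )
        dense = [0.0] * self.length
        for index, value in self.nonzeros.items():
            dense[index] = value
        return dense


def as_dense(obj, force=False):
    """Stand-in for a hypothetical np.asarray(obj, force=...)."""
    return obj.__array__(force=force)


vec = TinySparse(5, {1: 2.0, 4: 3.0})

try:
    as_dense(vec)  # refused: the caller did not opt in
    refused = ""
except TypeError as exc:
    refused = str(exc)

dense = as_dense(vec, force=True)  # explicit opt-in densifies
```

The only thing this encodes is that the container's author wants users
to opt in explicitly. Nothing about `force=True` makes the conversion
cheaper or safer.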


In the end, the question is whether sparse will actually want to
implement `force=True` if the main reason we add it is for library use.
There is no difference between a visualization library and numpy. In
both cases the user's memory will blow up just the same.

As for PyTorch, is `.detach()` even a good reason?  Maybe I am missing
things, but:

>>> torch.ones(10, requires_grad=True) + np.arange(10)
# RuntimeError: Can't call numpy() on Variable that requires grad. Use
var.detach().numpy() instead.

So arguably, there is no type-safety concern due to `.detach()`. There
is an (obvious) general loss of type information that always occurs
with an `np.asarray` call.
But I do not see that creating any openings for bugs here, due to the
wisdom of not allowing the above operation.
In fact, it actually seems much worse for xarray, or pandas. They
do support the above operation and will potentially mess up if the
arange was previously an xarray with a matching index, but different
order.


I am very much in favor of adding such things, but I still lack a bit
of clarity as to whom we would be helping?

If end-users will actually use `np.asarray(..., force=True)` over
special methods, then great! But I am currently not sure the type-
safety argument is all that big of a point.  And the performance or
memory-blowup argument remains true even for visualization libraries
(where the array is purely input and never output as such).


But yes, "never copy" is a somewhat different extension to `__array__`
and `np.asarray`. It guarantees high speed and in-place behaviour which
is useful for different settings.
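
As a toy illustration of the three-way `copy` semantics quoted above
(always copy, copy if needed, never copy), here is a sketch on plain
Python lists. `NEVER_COPY` and `as_list` are hypothetical names; the
sentinel merely stands in for the proposed `np.never_copy`, and this is
not how NumPy implements anything.

```python
NEVER_COPY = object()  # hypothetical sentinel, standing in for np.never_copy


def as_list(obj, copy=False):
    """Toy coercion to a list with three-way `copy` semantics."""
    if copy is True:
        return list(obj)  # always copy, even if obj is already a list
    if isinstance(obj, list):
        return obj  # zero-copy path: hand back the very same object
    if copy is NEVER_COPY:
        # A copy would be required, so refuse instead of silently copying.
        raise ValueError("cannot convert without copying")
    return list(obj)  # copy=False: copy only because it is needed


data = [1, 2, 3]
same = as_list(data)                       # no copy needed
fresh = as_list(data, copy=True)           # forced copy
in_place = as_list(data, copy=NEVER_COPY)  # guaranteed to alias `data`
```

The "never copy" case is the only one that gives the caller a guarantee
(the result aliases the input or the call fails), which is what makes it
a different kind of extension from `force=`.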

- Sebastian


> 
> > Cheers,
> > Ralf
> > 
> > 
> > > I think the discussion stalled on the precise spelling of the
> > > third
> > > option.
> > > 
> > > `__array__` was not discussed there, but it seems like adding the
> > > `copy`
> > > argument to `__array__` would be a perfectly reasonable
> > > extension.
> > > 
> > > Eric
> > > 
> > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > j...@fastmail.com>
> > > wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > One bit of expressivity we would miss is “copy if necessary,
> > > > but
> > > > > otherwise don’t bother”, but there are workarounds to this.
> > > > > 
> > > > 
> > > > After a side discussion with Stéfan van der Walt, we came up
> > > > with
> > > > `allow_copy=True`, which would express to the downstream
> > > > library that we
> > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > 
> > > > This sounds like the sort of thing that is use case driven. If
> > > > enough
> > > > projects want to use it, then I have no objections to adding
> > > > the keyword.
> > > > OTOH, we need to be careful about adding too many
> > > > interoperability tricks
> > > > > as they complicate the code and make it hard for folks to
> > > > > determine the
> > > > > best solution. Interoperability is a hot topic and we need to
> > > > > be careful
> > > > > not to leave behind too many experiments in the NumPy
> > > > > code.  Do you
> > > > have any other ideas of how to achieve the same effect?
> > > 

Re: [Numpy-discussion] Feelings about type aliases in NumPy

2020-04-26 Thread Joshua Wilson
To try and add some more data points to the conversation:

> Maybe we can go for a bit more distant name like "numpy.annotations" or 
> whatever.

Interestingly this was proposed independently here:

https://github.com/numpy/numpy-stubs/pull/66#issuecomment-619131274

Related to that, Ralf was opposed to numpy.typing because it would
shadow a stdlib module name:

https://github.com/numpy/numpy-stubs/pull/66#issuecomment-619123629

But, types is _also_ a stdlib module name. Maybe the above points give
some extra weight to "numpy.annotations"?

> Unless we anticipate adding a long list of type aliases (more than the three 
> suggested so far)

While working on some types in SciPy here:

https://github.com/scipy/scipy/pull/11936#discussion_r415280894

we ran into the issue of typing things that are "integer types" or
"floating types". For the time being we just inlined a definition like
Union[float, np.floating], but conceivably we would want to unify
those definitions somewhere instead of redefining them in every
project. (Note that existing types like SupportsInt etc. were not what
we wanted.) This perhaps suggests that the ultimate number of type
aliases might be larger than we initially thought.
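
For concreteness, a shared definition of the kind described above might
look roughly like this. The alias names `FloatLike` and `IntLike` and
the `scale` function are made up for illustration; they are not part of
NumPy or SciPy. The sketch assumes NumPy is installed.

```python
from typing import Union

import numpy as np

# Hypothetical shared aliases, defined once instead of inlined per project.
FloatLike = Union[float, np.floating]
IntLike = Union[int, np.integer]


def scale(x: FloatLike, factor: FloatLike) -> float:
    """Accept Python floats and NumPy floating scalars alike."""
    return float(x) * float(factor)


result = scale(np.float32(1.5), 2.0)
```

If several such pairs turn out to be needed (integers, floats, and
possibly complex numbers), that would be one argument for a dedicated
module rather than the top-level namespace.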

On Sun, Apr 26, 2020 at 6:25 AM Ilhan Polat wrote:
>
> I agree that parking all these in a secondary namespace sounds a better 
> option, can't say that I feel for the word "typing" though. There are already 
> too many type, dtype, ctypeslib etc. Maybe we can go for a bit more distant 
> name like "numpy.annotations" or whatever.
>
> On Sat, Apr 25, 2020 at 8:51 AM Kevin Sheppard wrote:
>>
>> Typing is for library developers more than end users. I would also worry 
>> that putting it into the top level might discourage other typing classes 
>> since it is more difficult to add to the top level than to a lower level 
>> module. np.typing seems very clear to me.
>>
>> On Sat, Apr 25, 2020, 07:41 Stephan Hoyer wrote:
>>>
>>>
>>>
>>> On Fri, Apr 24, 2020 at 11:31 AM Sebastian Berg wrote:
>>>
>>>> On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote:
>>>> > On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
>>>> > > But, Stephan pointed out that it might be confusing to users for
>>>> > > objects to only exist at typing time, so we came around to the
>>>> > > question of whether people are open to the idea of including the
>>>> > > type
>>>> > > aliases in NumPy itself. Ralf's concrete proposal was to make a
>>>> > > module
>>>> > > numpy.types (or maybe numpy.typing) to hold the aliases so that
>>>> > > they
>>>> > > don't pollute the top-level namespace. The module would initially
>>>> > > contain the types
>>>> >
>>>> > That sounds very sensible.  Having types available with NumPy should
>>>> > also encourage their use, especially if we can add some documentation
>>>> > around it.
>>>>
>>>> I agree, I might have a small tendency for `numpy.types` if we ever
>>>> find any usage other than direct typing that may be the better name?
>>>
>>>
>>> Unless we anticipate adding a long list of type aliases (more than the 
>>> three suggested so far), I would lean towards adding ArrayLike to the top 
>>> level NumPy namespace as np.ArrayLike.
>>>
>>> Type annotations are becoming an increasingly core part of modern Python 
>>> code. We should make it easy to appropriately type check functions that act 
>>> on NumPy arrays, and a top level np.ArrayLike is definitely more convenient 
>>> than np.types.ArrayLike.
>>>
>>>> Out of curiosity, I guess `ArrayLike` would be an ABC that a
>>>> downstream project can register with?
>>>
>>>
>>> ArrayLike will be a typing Protocol, automatically recognizing attributes 
>>> like __array__ to indicate that something can be cast to an array.
>>>


>>>> - Sebastian
>>>>
>>>> >
>>>> > Stéfan
>>>> > ___
>>>> > NumPy-Discussion mailing list
>>>> > NumPy-Discussion@python.org
>>>> > https://mail.python.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] Feelings about type aliases in NumPy

2020-04-26 Thread Ilhan Polat
I agree that parking all these in a secondary namespace sounds like a
better option, though I can't say that I feel for the word "typing".
There are already too many: type, dtype, ctypeslib, etc. Maybe we can go
for a bit more distant name like "numpy.annotations" or whatever.
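
As a rough picture of what would live in such a namespace, here is a
sketch of the structural-typing idea from the quoted discussion below
(ArrayLike as a typing Protocol that recognizes `__array__`). The names
`SupportsArray` and `Wrapper` are hypothetical, the real `ArrayLike` may
well be defined differently, and a list stands in for an ndarray so the
sketch needs only the standard library (Python 3.8+).

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsArray(Protocol):
    """Hypothetical structural type: anything with an `__array__` method."""

    def __array__(self) -> Any: ...


class Wrapper:
    """Downstream class; note that it never registers with anything."""

    def __init__(self, values):
        self.values = list(values)

    def __array__(self):
        # A real implementation would return an ndarray.
        return self.values


def first_item(obj: SupportsArray) -> Any:
    """Accept any object that can be coerced via `__array__`."""
    return obj.__array__()[0]


is_array_like = isinstance(Wrapper([1, 2]), SupportsArray)
int_is_array_like = isinstance(3, SupportsArray)
head = first_item(Wrapper([7, 8, 9]))
```

Unlike an ABC, nothing has to be registered here: having the method is
enough for both the static checker and the runtime `isinstance` check.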

On Sat, Apr 25, 2020 at 8:51 AM Kevin Sheppard wrote:

> Typing is for library developers more than end users. I would also worry
> that putting it into the top level might discourage other typing classes
> since it is more difficult to add to the top level than to a lower level
> module. np.typing seems very clear to me.
>
> On Sat, Apr 25, 2020, 07:41 Stephan Hoyer wrote:
>
>>
>>
>> On Fri, Apr 24, 2020 at 11:31 AM Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>
>>> On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote:
>>> > On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
>>> > > But, Stephan pointed out that it might be confusing to users for
>>> > > objects to only exist at typing time, so we came around to the
>>> > > question of whether people are open to the idea of including the
>>> > > type
>>> > > aliases in NumPy itself. Ralf's concrete proposal was to make a
>>> > > module
>>> > > numpy.types (or maybe numpy.typing) to hold the aliases so that
>>> > > they
>>> > > don't pollute the top-level namespace. The module would initially
>>> > > contain the types
>>> >
>>> > That sounds very sensible.  Having types available with NumPy should
>>> > also encourage their use, especially if we can add some documentation
>>> > around it.
>>>
>>> I agree, I might have a small tendency for `numpy.types` if we ever
>>> find any usage other than direct typing that may be the better name?
>>
>>
>> Unless we anticipate adding a long list of type aliases (more than the
>> three suggested so far), I would lean towards adding ArrayLike to the top
>> level NumPy namespace as np.ArrayLike.
>>
>> Type annotations are becoming an increasingly core part of modern Python
>> code. We should make it easy to appropriately type check functions that act
>> on NumPy arrays, and a top level np.ArrayLike is definitely more convenient
>> than np.types.ArrayLike.
>>
>>> Out of curiosity, I guess `ArrayLike` would be an ABC that a
>>> downstream project can register with?
>>
>>
>> ArrayLike will be a typing Protocol, automatically recognizing attributes
>> like __array__ to indicate that something can be cast to an array.
>>
>>
>>>
>>> - Sebastian
>>>
>>>
>>> >
>>> > Stéfan