[Numpy-discussion] beginner introduction to group

2020-04-24 Thread Tina Oberoi
Hi Everyone,
I am new to contributing to numpy. I have read the contributors guide and
am done with the set-up. I hope to make some good contributions and also to
connect with all you great people in the numpy community.
Any suggestions and tips are always welcome.

Thanks and Regards
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Feelings about type aliases in NumPy

2020-04-24 Thread Sebastian Berg
On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote:
> On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
> > But, Stephan pointed out that it might be confusing to users for
> > objects to only exist at typing time, so we came around to the
> > question of whether people are open to the idea of including the
> > type
> > aliases in NumPy itself. Ralf's concrete proposal was to make a
> > module
> > numpy.types (or maybe numpy.typing) to hold the aliases so that
> > they
> > don't pollute the top-level namespace. The module would initially
> > contain the types
> 
> That sounds very sensible.  Having types available with NumPy should
> also encourage their use, especially if we can add some documentation
> around it.

I agree. I have a slight preference for `numpy.types`: if we ever
find any usage other than direct typing, that may be the better name?

Out of curiosity, I guess `ArrayLike` would be an ABC that a
downstream project can register with?
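
For concreteness, here is a minimal sketch of what "an ABC that a downstream
project can register with" could look like, using only the standard `abc`
module. `ArrayLikeABC` and `DownstreamArray` are hypothetical names made up
for illustration; nothing here is an existing or proposed NumPy class.

```python
from abc import ABC


class ArrayLikeABC(ABC):
    """Hypothetical marker ABC (assumption: not part of NumPy)."""


class DownstreamArray:
    """Stand-in for a container defined in a downstream project."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)


# The downstream project opts in at runtime, without inheriting from the ABC.
ArrayLikeABC.register(DownstreamArray)

registered = isinstance(DownstreamArray([1, 2, 3]), ArrayLikeABC)
unregistered = isinstance([1, 2, 3], ArrayLikeABC)
```

Note that `register()` only affects runtime `isinstance`/`issubclass` checks;
as far as I know, static type checkers generally do not follow it, so this
would be a different mechanism from a typing-time alias.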

- Sebastian


> 
> Stéfan


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Sebastian Berg
On Fri, 2020-04-24 at 10:12 -0700, Stephan Hoyer wrote:
> On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg <
> sebast...@sipsolutions.net>
> wrote:
> 
> > One thing to note is that `__array__` is actually asked to return a
> > copy AFAIK.
> 
> The documentation on __array__ seems to be quite limited, unfortunately.
> The
> most I can find are a few sentences here:
> https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__
> 
> I don't see anything about returning copies. My interpretation has
> always
> been that __array__ can return either a copy or a view, like the
> np.asarray() constructor.
> 

Hmmm, right, I am not quite sure why I thought this was the case.

The more important part is behaviour. And the fact is that if you do
`np.array(array_like)` with an array-like that implements `__array__`,
then we ensure a copy is made by default (`copy=True` by default), even
though `__array__()` may already return a copy.
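
A small sketch of that behaviour, assuming a toy wrapper class (`Wrapper` is
a made-up name, not a real library type). The `copy` parameter on
`__array__` is accepted only so the sketch also behaves sensibly on NumPy
releases that pass it; it was not part of the protocol at the time of
writing.

```python
import numpy as np


class Wrapper:
    """Toy array-like that wraps an ndarray (illustrative only)."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Hand out the wrapped array itself unless a copy is requested.
        arr = np.asarray(self._data, dtype=dtype)
        return arr.copy() if copy else arr


w = Wrapper([1.0, 2.0, 3.0])

as_view = np.asarray(w)  # "copy if necessary": no copy is needed here
as_copy = np.array(w)    # copy=True by default, so the result is independent

view_shares = np.shares_memory(as_view, w._data)
copy_shares = np.shares_memory(as_copy, w._data)
```

Here `view_shares` should come out true and `copy_shares` false, which is
the asymmetry between `np.asarray` and `np.array` described above.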

In any case, the current default for `np.asarray`, i.e. `copy=False`, is
"copy if necessary". So if PyTorch uses a new parameter to opt in to
copying, the default behaviour will depend on the object. The
definition would then be:

Copy if necessary but error if a copy is necessary and the
object doesn't want to be copied silently.

To be honest, that seems not totally terrible to me... The old
statement remains true with the small caveat that it will sometimes
cause a loud error explaining things. The only problem is that some
users may want the explicit `np.copy_if_necessary` to get PyTorch to
do what most already do on `copy=False`.

I guess the new behaviour would then be:

    if copy is np.never_copy:  # or however we signal it
        try:
            arr = obj.__array__(copy=np.never_copy)
        except TypeError as e:
            raise TypeError("no copy appears unsupported by ...!") from e
    elif copy is np.copy_if_necessary:
        # Some users may want to tell PyTorch not to error, but
        # tell pandas that a view is OK:
        try:
            arr = obj.__array__(copy=np.copy_if_necessary)
        except TypeError:
            arr = obj.__array__()
    elif not copy:
        # Behaviour here may depend on the array-like!
        # Current array-likes may or may not return a copy;
        # new ones may choose to raise an error when a view
        # is not possible.
        arr = obj.__array__()
    else:
        try:
            arr = obj.__array__(copy=True)
        except TypeError:
            # Legacy __array__ without a `copy` argument.
            arr = obj.__array__()
            arr = arr.copy()  # make sure it's a copy

PyTorch can then implement `copy`, but raise an error if `copy=False`
(which must be the default). Current objects will error for
`np.never_copy` but otherwise be fine. And they can implement `copy` to
avoid an unnecessary double copy if they wish.
We could add `np.copy_if_necessary` as an explicit replacement
for the current `copy=False`. This will be necessary, or nicer, unless
everyone is happy to copy by default.
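
To make the branches easier to poke at, here is a runnable toy version of
the dispatch, using plain Python lists instead of arrays so that it needs
nothing beyond the standard library. Everything in it is an assumption for
illustration: `never_copy` and `copy_if_necessary` are stand-in sentinels
(no such names exist in NumPy), `to_array` is a hypothetical helper, and the
two duck classes only mimic a legacy array-like and a PyTorch-like object
that refuses to copy silently.

```python
# Hypothetical sentinels; the thread had not settled on names or spelling.
never_copy = object()
copy_if_necessary = object()


class LegacyDuck:
    """Array-like whose __array__ predates the proposed `copy` keyword."""

    def __init__(self, data):
        self.data = list(data)

    def __array__(self):
        return self.data  # hands out its own storage, i.e. a "view"


class StrictDuck:
    """Array-like that can only convert by copying and says so loudly."""

    def __init__(self, data):
        self.data = tuple(data)

    def __array__(self, copy=False):
        if copy is never_copy or copy is False:
            raise ValueError("conversion needs a copy; ask for one explicitly")
        return list(self.data)  # copy=True or copy_if_necessary


def to_array(obj, copy=True):
    """Toy stand-in for the conversion logic discussed above."""
    if copy is never_copy:
        try:
            return obj.__array__(copy=never_copy)
        except TypeError as e:
            raise TypeError("object does not support never_copy") from e
    if copy is copy_if_necessary:
        try:
            return obj.__array__(copy=copy_if_necessary)
        except TypeError:
            return obj.__array__()  # legacy object: take whatever it gives
    if not copy:
        return obj.__array__()  # behaviour depends on the array-like
    try:
        return obj.__array__(copy=True)
    except TypeError:
        return list(obj.__array__())  # legacy object: copy on its behalf


legacy = LegacyDuck([1, 2])
strict = StrictDuck([1, 2])
```

With these definitions, `to_array(legacy, copy=False)` returns the legacy
object's own storage, while `to_array(strict, copy=False)` raises, which is
the "default behaviour will depend on the object" point. Catching
`TypeError` to detect a missing keyword is crude (it also swallows unrelated
`TypeError`s), so treat that part as a placeholder.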

Another side note: calls such as `np.array([arr1, arr2])` probably must
always fail if `copy=np.never_copy` since a view is not guaranteed.

- Sebastian


> 
> > I doubt it always does, but if it does not I assume the
> > object should and could provide `__array_interface__`.
> > 
> 
> Objects like xarray.DataArray and pandas.Series sometimes directly
> wrap
> NumPy arrays and sometimes don't.
> 
> They both implement __array__ but not __array_interface__. It's very
> obvious
> how to implement a "forwarding" __array__ method (just call
> `np.asarray()`
> on an argument that might implement it). I guess something similar
> could be
> done for __array_interface__, but it's not clear to me that it's
> right to
> implement __array_interface__ when doing so might require a copy.
> 

Yes, I do not think you should implement __array_interface__ then,
unless "simplifying the array" is for some reason beneficial for
yourself. I suppose you could raise an AttributeError, but it is
questionable whether that's good.


> 
> > Under that assumption, it would be an opt-out right now since NumPy
> > allows copies by default here.
> > Defining things along copy does seem sensible, though I do not know
> > how
> > it would play with some of the current array-likes choosing to
> > refuse
> > `__array__`.
> > 
> > - Sebastian
> > 
> > 
> > 
> > > Eric
> > > 
> > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > j...@fastmail.com>
> > > wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > One bit of expressivity we would miss is “copy if necessary,
> > > > but
> > > > otherwise
> > > > > don’t bother”, but there are workarounds to this.
> > > > > 
> > > > 
> > > > After a side discussion with Stéfan van der Walt, we came up
> > > > with
> > > > `allow_copy=True`, which would express to the downstream
> > > > library
> > > > that we
> > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > 
> > > > This sounds like the sort of thing that is use case driven. If
> > > > enough
> > > > projects want to use it, then I have no objections to adding
> > > > the
> 

Re: [Numpy-discussion] Feelings about type aliases in NumPy

2020-04-24 Thread Stefan van der Walt
On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
> But, Stephan pointed out that it might be confusing to users for
> objects to only exist at typing time, so we came around to the
> question of whether people are open to the idea of including the type
> aliases in NumPy itself. Ralf's concrete proposal was to make a module
> numpy.types (or maybe numpy.typing) to hold the aliases so that they
> don't pollute the top-level namespace. The module would initially
> contain the types

That sounds very sensible.  Having types available with NumPy should also 
encourage their use, especially if we can add some documentation around it.

Stéfan


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Stephan Hoyer
On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg 
wrote:

> One thing to note is that `__array__` is actually asked to return a
> copy AFAIK.


The documentation on __array__ seems to be quite limited, unfortunately. The
most I can find are a few sentences here:
https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__

I don't see anything about returning copies. My interpretation has always
been that __array__ can return either a copy or a view, like the
np.asarray() constructor.


> I doubt it always does, but if it does not I assume the
> object should and could provide `__array_interface__`.
>

Objects like xarray.DataArray and pandas.Series sometimes directly wrap
NumPy arrays and sometimes don't.

They both implement __array__ but not __array_interface__. It's very obvious
how to implement a "forwarding" __array__ method (just call `np.asarray()`
on an argument that might implement it). I guess something similar could be
done for __array_interface__, but it's not clear to me that it's right to
implement __array_interface__ when doing so might require a copy.
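
As a rough sketch of what I mean by a "forwarding" `__array__`, assuming a
toy container (`Labeled` is a made-up class, not the actual xarray or pandas
implementation). The `copy` parameter is included only so the sketch also
runs cleanly on NumPy releases that pass it.

```python
import numpy as np


class Labeled:
    """Toy container that may or may not wrap a NumPy array directly."""

    def __init__(self, values, name):
        self.values = values  # ndarray, list, or another array-like
        self.name = name

    def __array__(self, dtype=None, copy=None):
        # Forward to whatever is wrapped; np.asarray copies only if it must.
        arr = np.asarray(self.values, dtype=dtype)
        return arr.copy() if copy else arr


backing = np.arange(3)
direct = Labeled(backing, "wraps an ndarray")
nested = Labeled(direct, "wraps another array-like")
from_list = Labeled([1, 2, 3], "wraps a list")

direct_shares = np.shares_memory(np.asarray(direct), backing)
nested_shares = np.shares_memory(np.asarray(nested), backing)
list_result = np.asarray(from_list)
```

The first two conversions can be zero-copy because an ndarray sits at the
bottom; the list-backed one necessarily allocates, which is the case where
exposing `__array_interface__` looks questionable to me.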


> Under that assumption, it would be an opt-out right now since NumPy
> allows copies by default here.
> Defining things along copy does seem sensible, though I do not know how
> it would play with some of the current array-likes choosing to refuse
> `__array__`.
>
> - Sebastian
>
>
>
> > Eric
> >
> > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > One bit of expressivity we would miss is “copy if necessary, but
> > > otherwise
> > > > don’t bother”, but there are workarounds to this.
> > > >
> > >
> > > After a side discussion with Stéfan van der Walt, we came up with
> > > `allow_copy=True`, which would express to the downstream library
> > > that we
> > > don’t mind waiting, but that zero-copy would also be ok.
> > >
> > > This sounds like the sort of thing that is use case driven. If
> > > enough
> > > projects want to use it, then I have no objections to adding the
> > > keyword.
> > > OTOH, we need to be careful about adding too many interoperability
> > > tricks
> > > > as they complicate the code and make it hard for folks to
> > > > determine the
> > > > best solution. Interoperability is a hot topic and we need to be
> > > > careful
> > > > not to leave behind too many experiments in the NumPy
> > > code.  Do you
> > > have any other ideas of how to achieve the same effect?
> > >
> > >
> > > Personally, I don’t have any other ideas, but would be happy to
> > > hear some!
> > >
> > > My view regarding API/experiment creep is that `__array__` is the
> > > oldest
> > > and most basic of all the interop tricks and that this can be
> > > safely
> > > maintained for future generations. Currently it only takes `dtype=`
> > > as a
> > > keyword argument, so it is a very lean API. I think this particular
> > > use
> > > case is very natural and I’ve encountered the reluctance to
> > > implicitly copy
> > > twice, so I expect it is reasonably common.
> > >
> > > Regarding difficulty in determining the best solution, I would be
> > > happy to
> > > contribute to the dispatch basics guide together with the new
> > > kwarg. I
> > > agree that the protocols are getting quite numerous and I couldn’t
> > > find a
> > > single place that gathers all the best practices together. But, to
> > > reiterate my point: `__array__` is the simplest of these and I
> > > think this
> > > keyword is pretty safe to add.
> > >
> > > For ease of discussion, here are the API options discussed so far,
> > > as well
> > > as a few extra that I don’t like but might trigger other ideas:
> > >
> > > > np.asarray(my_duck_array, allow_copy=True)
> > > >     # default is False, or None -> leave it to the duck array to decide
> > > > np.asarray(my_duck_array, copy=True)
> > > >     # always copies, but, if supported by the duck array, defers to it
> > > >     # for the copy
> > > > np.asarray(my_duck_array, copy='allow')
> > > >     # could take values 'allow', 'force', 'no', True(='force'), False(='no')
> > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)
> > > >     # separate concepts, but unclear what force_copy=True,
> > > >     # allow_copy=False means!
> > > > np.asarray(my_duck_array, force=True)
> > >
> > > Juan.


[Numpy-discussion] Feelings about type aliases in NumPy

2020-04-24 Thread Joshua Wilson
Hey everyone,

Over in numpy-stubs we've been working on typing "array like":

https://github.com/numpy/numpy-stubs/pull/66

It would be nice if the type were public so that downstream projects
could use it (e.g. it would be very helpful in SciPy). Originally the
plan was to only make it publicly available at typing time and not
runtime, which would mean that no changes to NumPy are necessary; see

https://github.com/numpy/numpy-stubs/pull/66#issuecomment-618784833

for more information on how that works.

But, Stephan pointed out that it might be confusing to users for
objects to only exist at typing time, so we came around to the
question of whether people are open to the idea of including the type
aliases in NumPy itself. Ralf's concrete proposal was to make a module
numpy.types (or maybe numpy.typing) to hold the aliases so that they
don't pollute the top-level namespace. The module would initially
contain the types

- ArrayLike
- DtypeLike
- (maybe) ShapeLike
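
To give a feel for what such aliases might look like at runtime, here is a
deliberately simplified sketch using only the standard `typing` module. The
definitions below are assumptions for illustration, not the ones in the
numpy-stubs PR, and `normalize_shape` is a made-up helper.

```python
from typing import Sequence, Tuple, Union

# Hypothetical, simplified aliases (the real ones would cover more cases).
ShapeLike = Union[int, Sequence[int]]
DtypeLike = Union[str, type]


def normalize_shape(shape: ShapeLike) -> Tuple[int, ...]:
    """Turn an int or a sequence of ints into a shape tuple."""
    if isinstance(shape, int):
        return (shape,)
    return tuple(shape)


scalar_shape = normalize_shape(3)
tuple_shape = normalize_shape([2, 3])
```

Because `ShapeLike` here is an ordinary module-level object, downstream code
can import it and use it in annotations at runtime, which is the difference
from aliases that exist only in stub files.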

Note that we would not need to make changes to NumPy right away;
instead it would probably be done when numpy-stubs is merged into
NumPy itself.

What do people think?

- Josh


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Sebastian Berg
On Fri, 2020-04-24 at 11:34 +0100, Eric Wieser wrote:
> Perhaps worth mentioning that we've discussed this sort of API
> before, in
> https://github.com/numpy/numpy/pull/11897.
> 
> Under that proposal, the api would be something like:
> 
> * `copy=True` - always copy, like it is today
> * `copy=False` - copy if needed, like it is today
> * `copy=np.never_copy` - never copy, throw an exception if not
> possible
> 
> I think the discussion stalled on the precise spelling of the third
> option.
> 
> `__array__` was not discussed there, but it seems like adding the
> `copy`
> argument to `__array__` would be a perfectly reasonable extension.
> 

One thing to note is that `__array__` is actually asked to return a
copy AFAIK. I doubt it always does, but if it does not I assume the
object should and could provide `__array_interface__`.
Under that assumption, it would be an opt-out right now since NumPy
allows copies by default here.
Defining things along copy does seem sensible, though I do not know how
it would play with some of the current array-likes choosing to refuse
`__array__`.

- Sebastian



> Eric
> 
> On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias 
> wrote:
> 
> > Hi everyone,
> > 
> > One bit of expressivity we would miss is “copy if necessary, but
> > otherwise
> > > don’t bother”, but there are workarounds to this.
> > > 
> > 
> > After a side discussion with Stéfan van der Walt, we came up with
> > `allow_copy=True`, which would express to the downstream library
> > that we
> > don’t mind waiting, but that zero-copy would also be ok.
> > 
> > This sounds like the sort of thing that is use case driven. If
> > enough
> > projects want to use it, then I have no objections to adding the
> > keyword.
> > OTOH, we need to be careful about adding too many interoperability
> > tricks
> > > as they complicate the code and make it hard for folks to
> > > determine the
> > > best solution. Interoperability is a hot topic and we need to be
> > > careful
> > > not to leave behind too many experiments in the NumPy
> > code.  Do you
> > have any other ideas of how to achieve the same effect?
> > 
> > 
> > Personally, I don’t have any other ideas, but would be happy to
> > hear some!
> > 
> > My view regarding API/experiment creep is that `__array__` is the
> > oldest
> > and most basic of all the interop tricks and that this can be
> > safely
> > maintained for future generations. Currently it only takes `dtype=`
> > as a
> > keyword argument, so it is a very lean API. I think this particular
> > use
> > case is very natural and I’ve encountered the reluctance to
> > implicitly copy
> > twice, so I expect it is reasonably common.
> > 
> > Regarding difficulty in determining the best solution, I would be
> > happy to
> > contribute to the dispatch basics guide together with the new
> > kwarg. I
> > agree that the protocols are getting quite numerous and I couldn’t
> > find a
> > single place that gathers all the best practices together. But, to
> > reiterate my point: `__array__` is the simplest of these and I
> > think this
> > keyword is pretty safe to add.
> > 
> > For ease of discussion, here are the API options discussed so far,
> > as well
> > as a few extra that I don’t like but might trigger other ideas:
> > 
> > > np.asarray(my_duck_array, allow_copy=True)
> > >     # default is False, or None -> leave it to the duck array to decide
> > > np.asarray(my_duck_array, copy=True)
> > >     # always copies, but, if supported by the duck array, defers to it
> > >     # for the copy
> > > np.asarray(my_duck_array, copy='allow')
> > >     # could take values 'allow', 'force', 'no', True(='force'), False(='no')
> > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)
> > >     # separate concepts, but unclear what force_copy=True,
> > >     # allow_copy=False means!
> > > np.asarray(my_duck_array, force=True)
> > 
> > Juan.


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Eric Wieser
Perhaps worth mentioning that we've discussed this sort of API before, in
https://github.com/numpy/numpy/pull/11897.

Under that proposal, the api would be something like:

* `copy=True` - always copy, like it is today
* `copy=False` - copy if needed, like it is today
* `copy=np.never_copy` - never copy, throw an exception if not possible
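
A minimal sketch of those three modes, using plain Python lists in place of
arrays so that it runs without NumPy. `never_copy` and `as_list` are
hypothetical names for illustration only; the spelling of the third option
is exactly what was left open.

```python
class _NeverCopyType:
    """Dedicated sentinel type, so True/False keep their current meaning."""

    def __repr__(self):
        return "never_copy"


never_copy = _NeverCopyType()


def as_list(obj, copy=True):
    """Toy converter whose 'native' type is list."""
    is_native = isinstance(obj, list)
    if copy is never_copy:
        if not is_native:
            raise ValueError("cannot convert without copying")
        return obj           # never copy
    if copy or not is_native:
        return list(obj)     # always copy, or copy because it is needed
    return obj               # copy=False and no copy needed


original = [1, 2, 3]
always = as_list(original, copy=True)
if_needed = as_list(original, copy=False)
never = as_list(original, copy=never_copy)
```

The sentinel has to be checked before the truthiness test, since an ordinary
object is truthy and would otherwise be treated like `copy=True`.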

I think the discussion stalled on the precise spelling of the third option.

`__array__` was not discussed there, but it seems like adding the `copy`
argument to `__array__` would be a perfectly reasonable extension.

Eric

On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias  wrote:

> Hi everyone,
>
> One bit of expressivity we would miss is “copy if necessary, but otherwise
>> don’t bother”, but there are workarounds to this.
>>
>
> After a side discussion with Stéfan van der Walt, we came up with
> `allow_copy=True`, which would express to the downstream library that we
> don’t mind waiting, but that zero-copy would also be ok.
>
> This sounds like the sort of thing that is use case driven. If enough
> projects want to use it, then I have no objections to adding the keyword.
> OTOH, we need to be careful about adding too many interoperability tricks
> as they complicate the code and make it hard for folks to determine the
> best solution. Interoperability is a hot topic and we need to be careful
> not to leave behind too many experiments in the NumPy code.  Do you
> have any other ideas of how to achieve the same effect?
>
>
> Personally, I don’t have any other ideas, but would be happy to hear some!
>
> My view regarding API/experiment creep is that `__array__` is the oldest
> and most basic of all the interop tricks and that this can be safely
> maintained for future generations. Currently it only takes `dtype=` as a
> keyword argument, so it is a very lean API. I think this particular use
> case is very natural and I’ve encountered the reluctance to implicitly copy
> twice, so I expect it is reasonably common.
>
> Regarding difficulty in determining the best solution, I would be happy to
> contribute to the dispatch basics guide together with the new kwarg. I
> agree that the protocols are getting quite numerous and I couldn’t find a
> single place that gathers all the best practices together. But, to
> reiterate my point: `__array__` is the simplest of these and I think this
> keyword is pretty safe to add.
>
> For ease of discussion, here are the API options discussed so far, as well
> as a few extra that I don’t like but might trigger other ideas:
>
> np.asarray(my_duck_array, allow_copy=True)
>     # default is False, or None -> leave it to the duck array to decide
> np.asarray(my_duck_array, copy=True)
>     # always copies, but, if supported by the duck array, defers to it
>     # for the copy
> np.asarray(my_duck_array, copy='allow')
>     # could take values 'allow', 'force', 'no', True(='force'), False(='no')
> np.asarray(my_duck_array, force_copy=False, allow_copy=True)
>     # separate concepts, but unclear what force_copy=True,
>     # allow_copy=False means!
> np.asarray(my_duck_array, force=True)
>
> Juan.