[Numpy-discussion] beginner introduction to group
Hi Everyone, I am new to contributing to NumPy. I have read the contributor guide and finished the setup. I hope to make some good contributions and to connect with all of you great people in the NumPy community. Any suggestions and tips are welcome. Thanks and Regards ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Feelings about type aliases in NumPy
On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote: > On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote: > > But, Stephan pointed out that it might be confusing to users for > > objects to only exist at typing time, so we came around to the > > question of whether people are open to the idea of including the > > type > > aliases in NumPy itself. Ralf's concrete proposal was to make a > > module > > numpy.types (or maybe numpy.typing) to hold the aliases so that > > they > > don't pollute the top-level namespace. The module would initially > > contain the types > > That sounds very sensible. Having types available with NumPy should > also encourage their use, especially if we can add some documentation > around it. I agree. I have a slight preference for `numpy.types`: if we ever find a use other than direct typing, that may be the better name. Out of curiosity, I guess `ArrayLike` would be an ABC that a downstream project can register with? - Sebastian > > Stéfan
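Sebastian's question can be made concrete. In Josh's proposal `ArrayLike` is a type alias, so nothing can register with it; if it were an abstract base class instead, a downstream project could opt in with `register()`. The sketch below assumes a hypothetical `ArrayLikeABC` (not part of NumPy or numpy-stubs) that also recognizes any class defining `__array__` through `__subclasshook__`, plus a hypothetical `DownstreamTensor` type:

```python
import abc


class ArrayLikeABC(abc.ABC):
    """Hypothetical ABC; in the numpy-stubs proposal ArrayLike is a typing alias instead."""

    @classmethod
    def __subclasshook__(cls, other):
        # Treat any class that defines __array__ somewhere in its MRO as array-like.
        if cls is ArrayLikeABC:
            if any("__array__" in vars(klass) for klass in other.__mro__):
                return True
        # Otherwise fall back to normal subclassing and the register() registry.
        return NotImplemented


class DownstreamTensor:
    """Hypothetical downstream type that has no __array__ method."""


# Explicit opt-in by the downstream project.
ArrayLikeABC.register(DownstreamTensor)

print(isinstance(DownstreamTensor(), ArrayLikeABC))  # True
print(isinstance(object(), ArrayLikeABC))            # False
```

One trade-off: an ABC only helps runtime `isinstance` checks. Static type checkers such as mypy generally do not understand `register()` calls, which is one reason a typing alias may suit the stubs better.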
Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
On Fri, 2020-04-24 at 10:12 -0700, Stephan Hoyer wrote: > On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg < > sebast...@sipsolutions.net> > wrote: > > > One thing to note is that `__array__` is actually asked to return a > > copy AFAIK. > > The documentation on __array__ seems to be quite limited, unfortunately. > The > most I can find are a few sentences here: > https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__ > > I don't see anything about returning copies. My interpretation has > always > been that __array__ can return either a copy or a view, like the > np.asarray() constructor. > Hmmm, right, I am not quite sure why I thought this was the case. The more important part is the behaviour: if you call `np.array(array_like)` on an array-like that implements `__array__`, we ensure a copy is made by default (`copy=True` is the default), even though `__array__()` may already return a copy. In any case, the current default for `np.asarray`, i.e. `copy=False`, means "copy if necessary". So if PyTorch uses a new parameter to opt in to copying, the default behaviour will depend on the object. The definition would then be: copy if necessary, but error if a copy is necessary and the object doesn't want to be copied silently. To be honest, that does not seem too terrible to me... The old statement remains true, with the small caveat that it will sometimes raise a loud error explaining things. The only problem is that some users may want an explicit `np.copy_if_necessary` to get PyTorch to do what most array-likes already do on `copy=False`. 
I guess the new behaviour would then be:

    if copy is np.never_copy:  # or however we signal it
        try:
            arr = obj.__array__(copy=np.never_copy)
        except TypeError as e:
            raise TypeError("no-copy mode appears unsupported by ...!") from e
    elif copy is np.copy_if_necessary:
        # Some users may want to tell PyTorch not to error, but
        # tell pandas that a view is OK:
        try:
            arr = obj.__array__(copy=np.copy_if_necessary)
        except TypeError:
            arr = obj.__array__()
    elif not copy:
        # Behaviour here may depend on the array-like!
        # Current array-likes may or may not return a copy;
        # new ones may choose to raise an error when a view
        # is not possible.
        arr = obj.__array__()
    else:
        try:
            arr = obj.__array__(copy=True)
        except TypeError:
            arr = obj.__array__()
            arr = arr.copy()  # make sure it's a copy

PyTorch can then implement `copy`, but raise an error if `copy=False` (which must be the default). Current objects will error for `np.never_copy` but otherwise be fine. And they can implement `copy` to avoid an unnecessary double copy if they wish. We could add `np.copy_if_necessary` as an explicit replacement for the current `copy=False`. This will be necessary, or at least nicer, unless everyone is happy to copy by default. Another side note: calls such as `np.array([arr1, arr2])` probably must always fail if `copy=np.never_copy`, since a view is not guaranteed. - Sebastian

> > > I doubt it always does, but if it does not I assume the > > object should and could provide `__array_interface__`. > > > > Objects like xarray.DataArray and pandas.Series sometimes directly > wrap > NumPy arrays and sometimes don't. > > They both implement __array__ but not __array_interface__. It's very > obvious > how to implement a "forwarding" __array__ method (just call > `np.asarray()` > on an argument that might implement it). I guess something similar > could be > done for __array_interface__, but it's not clear to me that it's > right to > implement __array_interface__ when doing so might require a copy. 
> Yes, I do not think you should implement __array_interface__ then, unless "simplifying the array" is for some reason beneficial for yourself. I suppose you could raise an AttributeError, but it is questionable whether that's good. > > > Under that assumption, it would be an opt-out right now since NumPy > > allows copies by default here. > > Defining things along copy does seem sensible, though I do not know > > how > > it would play with some of the current array-likes choosing to > > refuse > > `__array__`. > > > > - Sebastian > > > > > > > > > Eric > > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias < > > > j...@fastmail.com> > > > wrote: > > > > > > > Hi everyone, > > > > > > > > One bit of expressivity we would miss is “copy if necessary, > > > > but > > > > otherwise > > > > > don’t bother”, but there are workarounds to this. > > > > > > > > > > > > > After a side discussion with Stéfan van der Walt, we came up > > > > with > > > > `allow_copy=True`, which would express to the downstream > > > > library > > > > that we > > > > don’t mind waiting, but that zero-copy would also be ok. > > > > > > > > This sounds like the sort of thing that is use case driven. If > > > > enough > > > > projects want to use it, then I have no objections to adding > > > > the >
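The copy dispatch Sebastian sketches in pseudocode can be tried out without NumPy. The toy below rests on stated assumptions: `NEVER_COPY` and `COPY_IF_NEEDED` are hypothetical sentinels standing in for `np.never_copy` and `np.copy_if_necessary` (neither exists in NumPy), Python lists stand in for arrays, and both duck-array classes are invented. `LegacyDuck` has today's `__array__()` signature, while `StrictDuck` plays the role the thread imagines for PyTorch and refuses to copy silently.

```python
import copy as copy_module


class _Sentinel:
    """A named singleton, used to spell the extra copy modes."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


NEVER_COPY = _Sentinel("NEVER_COPY")          # hypothetical stand-in for np.never_copy
COPY_IF_NEEDED = _Sentinel("COPY_IF_NEEDED")  # hypothetical stand-in for np.copy_if_necessary


def convert(obj, copy=True):
    """Toy version of the proposed dispatch on `copy` (lists play the role of arrays)."""
    if copy is NEVER_COPY:
        try:
            return obj.__array__(copy=NEVER_COPY)
        except TypeError as e:
            raise TypeError(f"{type(obj).__name__} does not support copy=NEVER_COPY") from e
    if copy is COPY_IF_NEEDED:
        try:
            return obj.__array__(copy=COPY_IF_NEEDED)
        except TypeError:
            return obj.__array__()  # legacy objects: copy or view, their choice
    if not copy:
        # Behaviour depends on the object: it may return a view, a copy, or raise.
        return obj.__array__()
    try:
        return obj.__array__(copy=True)  # let the object make the (single) copy
    except TypeError:
        return copy_module.copy(obj.__array__())  # legacy objects: copy ourselves


class LegacyDuck:
    """Current-style array-like: __array__ takes no copy keyword."""

    def __init__(self, data):
        self._data = data

    def __array__(self):
        return self._data  # hands out its own buffer (a "view")


class StrictDuck:
    """New-style array-like that refuses to copy silently."""

    def __init__(self, data, viewable=True):
        self._data = data
        self._viewable = viewable  # False models e.g. data that lives on a GPU

    def __array__(self, copy=False):
        if copy is True or (copy is COPY_IF_NEEDED and not self._viewable):
            return list(self._data)  # the object performs the requested/allowed copy
        if self._viewable:
            return self._data  # zero-copy "view"
        if copy is NEVER_COPY:
            raise ValueError("a view of this data is not possible")
        raise RuntimeError("refusing to copy silently; pass copy=True")


data = [1, 2, 3]
print(convert(LegacyDuck(data), copy=False) is data)  # True
print(convert(StrictDuck([4, 5], viewable=False), copy=COPY_IF_NEEDED))  # [4, 5]
```

Detecting an old `__array__` signature by catching `TypeError` is the same shortcut the pseudocode takes; it has the downside of also swallowing unrelated `TypeError`s raised inside an implementation.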
Re: [Numpy-discussion] Feelings about type aliases in NumPy
On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote: > But, Stephan pointed out that it might be confusing to users for > objects to only exist at typing time, so we came around to the > question of whether people are open to the idea of including the type > aliases in NumPy itself. Ralf's concrete proposal was to make a module > numpy.types (or maybe numpy.typing) to hold the aliases so that they > don't pollute the top-level namespace. The module would initially > contain the types That sounds very sensible. Having types available with NumPy should also encourage their use, especially if we can add some documentation around it. Stéfan
Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
On Fri, Apr 24, 2020 at 6:31 AM Sebastian Berg wrote: > One thing to note is that `__array__` is actually asked to return a > copy AFAIK. The documentation on __array__ seems to be quite limited, unfortunately. The most I can find are a few sentences here: https://numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array__ I don't see anything about returning copies. My interpretation has always been that __array__ can return either a copy or a view, like the np.asarray() constructor. > I doubt it always does, but if it does not I assume the > object should and could provide `__array_interface__`. > Objects like xarray.DataArray and pandas.Series sometimes directly wrap NumPy arrays and sometimes don't. They both implement __array__ but not __array_interface__. It's very obvious how to implement a "forwarding" __array__ method (just call `np.asarray()` on an argument that might implement it). I guess something similar could be done for __array_interface__, but it's not clear to me that it's right to implement __array_interface__ when doing so might require a copy. > Under that assumption, it would be an opt-out right now since NumPy > allows copies by default here. > Defining things along copy does seem sensible, though I do not know how > it would play with some of the current array-likes choosing to refuse > `__array__`. > > - Sebastian > > > > > Eric > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias > > wrote: > > > > > Hi everyone, > > > > > > One bit of expressivity we would miss is “copy if necessary, but > > > otherwise > > > > don’t bother”, but there are workarounds to this. > > > > > > > > > > After a side discussion with Stéfan van der Walt, we came up with > > > `allow_copy=True`, which would express to the downstream library > > > that we > > > don’t mind waiting, but that zero-copy would also be ok. > > > > > > This sounds like the sort of thing that is use case driven. 
If > > > enough > > > projects want to use it, then I have no objections to adding the > > > keyword. > > > OTOH, we need to be careful about adding too many interoperability > > > tricks > > > as they complicate the code and make it hard for folks to > > > determine the > > > best solution. Interoperability is a hot topic and we need to be > > > careful > > > not to leave behind too many experiments in the NumPy > > > code. Do you > > > have any other ideas of how to achieve the same effect? > > > > > > > > > Personally, I don’t have any other ideas, but would be happy to > > > hear some! > > > > > > My view regarding API/experiment creep is that `__array__` is the > > > oldest > > > and most basic of all the interop tricks and that this can be > > > safely > > > maintained for future generations. Currently it only takes `dtype=` > > > as a > > > keyword argument, so it is a very lean API. I think this particular > > > use > > > case is very natural and I’ve encountered the reluctance to > > > implicitly copy > > > twice, so I expect it is reasonably common. > > > > > > Regarding difficulty in determining the best solution, I would be > > > happy to > > > contribute to the dispatch basics guide together with the new > > > kwarg. I > > > agree that the protocols are getting quite numerous and I couldn’t > > > find a > > > single place that gathers all the best practices together. But, to > > > reiterate my point: `__array__` is the simplest of these and I > > > think this > > > keyword is pretty safe to add. 
> > > > > > For ease of discussion, here are the API options discussed so far, > > > as well > > > as a few extra that I don’t like but might trigger other ideas: > > > > > > np.asarray(my_duck_array, allow_copy=True) # default is False, or > > > None -> > > > leave it to the duck array to decide > > > np.asarray(my_duck_array, copy=True) # always copies, but, if > > > supported > > > by the duck array, defers to it for the copy > > > np.asarray(my_duck_array, copy='allow') # could take values > > > 'allow', > > > 'force', 'no', True(='force'), False(='no') > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # > > > separate > > > concepts, but unclear what force_copy=True, allow_copy=False means! > > > np.asarray(my_duck_array, force=True) > > > > > > Juan.
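Stephan's doubt about exposing `__array_interface__` when that might need a copy has one possible escape hatch, also floated elsewhere in the thread: a property that raises `AttributeError` whenever exporting would require a copy. Python's `hasattr()` treats that exception as "attribute absent"; whether NumPy's own lookup behaves the same way, and whether the pattern is advisable at all, are assumptions worth checking. A minimal sketch with a hypothetical `MaybeWrapsArray` class (the interface dict is an incomplete placeholder and is never handed to NumPy here):

```python
class MaybeWrapsArray:
    """Hypothetical container (think xarray/pandas) that only sometimes wraps an array buffer."""

    def __init__(self, data, interface=None):
        self._data = data
        self._interface = interface  # dict describing a wrapped buffer, or None

    @property
    def __array_interface__(self):
        if self._interface is None:
            # Exporting would require a copy, so behave as if the attribute did not exist.
            raise AttributeError("__array_interface__")
        return self._interface


plain = MaybeWrapsArray([1.0, 2.0])
# Placeholder interface dict: incomplete (no "data" entry), for illustration only.
wrapped = MaybeWrapsArray(None, interface={"shape": (2,), "typestr": "<f8", "version": 3})
print(hasattr(plain, "__array_interface__"))    # False
print(hasattr(wrapped, "__array_interface__"))  # True
```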
[Numpy-discussion] Feelings about type aliases in NumPy
Hey everyone, Over in numpy-stubs we've been working on typing "array like": https://github.com/numpy/numpy-stubs/pull/66 It would be nice if the type were public so that downstream projects could use it (e.g. it would be very helpful in SciPy). Originally the plan was to only make it publicly available at typing time and not runtime, which would mean that no changes to NumPy are necessary; see https://github.com/numpy/numpy-stubs/pull/66#issuecomment-618784833 for more information on how that works. But, Stephan pointed out that it might be confusing to users for objects to only exist at typing time, so we came around to the question of whether people are open to the idea of including the type aliases in NumPy itself. Ralf's concrete proposal was to make a module numpy.types (or maybe numpy.typing) to hold the aliases so that they don't pollute the top-level namespace. The module would initially contain the types - ArrayLike - DtypeLike - (maybe) ShapeLike Note that we would not need to make changes to NumPy right away; instead it would probably be done when numpy-stubs is merged into NumPy itself. What do people think? - Josh
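To make the proposal concrete, here is a minimal sketch of what such a module and a downstream use of it could look like. The alias definitions are deliberately loose approximations invented for illustration, not the numpy-stubs definitions (the real `ArrayLike` in particular also has to cover objects implementing `__array__`), `normalize_shape` is a hypothetical downstream helper, and only the standard-library `typing` module is used:

```python
from typing import Any, Sequence, Tuple, Union

# --- contents of a hypothetical numpy.types / numpy.typing module ---------
ShapeLike = Union[int, Sequence[int]]
DtypeLike = Union[type, str, None]  # e.g. float, "float64", or None for the default
ArrayLike = Union[int, float, complex, Sequence[Any]]  # loose approximation only


# --- downstream code (e.g. in SciPy) using the public aliases --------------
def normalize_shape(shape: ShapeLike) -> Tuple[int, ...]:
    """Accept 3, (2, 3) or [2, 3], as NumPy constructors do, and return a tuple."""
    if isinstance(shape, int):
        return (shape,)
    return tuple(int(n) for n in shape)


print(normalize_shape(3))       # (3,)
print(normalize_shape([2, 3]))  # (2, 3)
```

Because the aliases are ordinary module-level objects, they also exist at runtime (`print(ShapeLike)` works), which is the property Stephan's concern about typing-time-only names is about.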
Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
On Fri, 2020-04-24 at 11:34 +0100, Eric Wieser wrote: > Perhaps worth mentioning that we've discussed this sort of API > before, in > https://github.com/numpy/numpy/pull/11897. > > Under that proposal, the API would be something like: > > * `copy=True` - always copy, like it is today > * `copy=False` - copy if needed, like it is today > * `copy=np.never_copy` - never copy, throw an exception if not > possible > > I think the discussion stalled on the precise spelling of the third > option. > > `__array__` was not discussed there, but it seems like adding the > `copy` > argument to `__array__` would be a perfectly reasonable extension. > One thing to note is that `__array__` is actually asked to return a copy AFAIK. I doubt it always does, but if it does not, I assume the object should and could provide `__array_interface__`. Under that assumption, it would be an opt-out right now, since NumPy allows copies by default here. Defining things along copy does seem sensible, though I do not know how it would play with some of the current array-likes choosing to refuse `__array__`. - Sebastian > Eric > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias > wrote: > > > Hi everyone, > > > > One bit of expressivity we would miss is “copy if necessary, but > > otherwise > > > don’t bother”, but there are workarounds to this. > > > > > > > After a side discussion with Stéfan van der Walt, we came up with > > `allow_copy=True`, which would express to the downstream library > > that we > > don’t mind waiting, but that zero-copy would also be ok. > > > > This sounds like the sort of thing that is use case driven. If > > enough > > projects want to use it, then I have no objections to adding the > > keyword. > > OTOH, we need to be careful about adding too many interoperability > > tricks > > as they complicate the code and make it hard for folks to > > determine the > > best solution. 
Interoperability is a hot topic and we need to be > > careful > > not to leave behind too many experiments in the NumPy > > code. Do you > > have any other ideas of how to achieve the same effect? > > > > > > Personally, I don’t have any other ideas, but would be happy to > > hear some! > > > > My view regarding API/experiment creep is that `__array__` is the > > oldest > > and most basic of all the interop tricks and that this can be > > safely > > maintained for future generations. Currently it only takes `dtype=` > > as a > > keyword argument, so it is a very lean API. I think this particular > > use > > case is very natural and I’ve encountered the reluctance to > > implicitly copy > > twice, so I expect it is reasonably common. > > > > Regarding difficulty in determining the best solution, I would be > > happy to > > contribute to the dispatch basics guide together with the new > > kwarg. I > > agree that the protocols are getting quite numerous and I couldn’t > > find a > > single place that gathers all the best practices together. But, to > > reiterate my point: `__array__` is the simplest of these and I > > think this > > keyword is pretty safe to add. > > > > For ease of discussion, here are the API options discussed so far, > > as well > > as a few extra that I don’t like but might trigger other ideas: > > > > np.asarray(my_duck_array, allow_copy=True) # default is False, or > > None -> > > leave it to the duck array to decide > > np.asarray(my_duck_array, copy=True) # always copies, but, if > > supported > > by the duck array, defers to it for the copy > > np.asarray(my_duck_array, copy='allow') # could take values > > 'allow', > > 'force', 'no', True(='force'), False(='no') > > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # > > separate > > concepts, but unclear what force_copy=True, allow_copy=False means! > > np.asarray(my_duck_array, force=True) > > > > Juan.
Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
Perhaps worth mentioning that we've discussed this sort of API before, in https://github.com/numpy/numpy/pull/11897. Under that proposal, the API would be something like: * `copy=True` - always copy, like it is today * `copy=False` - copy if needed, like it is today * `copy=np.never_copy` - never copy, throw an exception if not possible. I think the discussion stalled on the precise spelling of the third option. `__array__` was not discussed there, but it seems like adding the `copy` argument to `__array__` would be a perfectly reasonable extension. Eric On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias wrote: > Hi everyone, > > One bit of expressivity we would miss is “copy if necessary, but otherwise >> don’t bother”, but there are workarounds to this. >> > > After a side discussion with Stéfan van der Walt, we came up with > `allow_copy=True`, which would express to the downstream library that we > don’t mind waiting, but that zero-copy would also be ok. > > This sounds like the sort of thing that is use case driven. If enough > projects want to use it, then I have no objections to adding the keyword. > OTOH, we need to be careful about adding too many interoperability tricks > as they complicate the code and make it hard for folks to determine the > best solution. Interoperability is a hot topic and we need to be careful > not to leave behind too many experiments in the NumPy code. Do you > have any other ideas of how to achieve the same effect? > > > Personally, I don’t have any other ideas, but would be happy to hear some! > > My view regarding API/experiment creep is that `__array__` is the oldest > and most basic of all the interop tricks and that this can be safely > maintained for future generations. Currently it only takes `dtype=` as a > keyword argument, so it is a very lean API. I think this particular use > case is very natural and I’ve encountered the reluctance to implicitly copy > twice, so I expect it is reasonably common.
> > Regarding difficulty in determining the best solution, I would be happy to > contribute to the dispatch basics guide together with the new kwarg. I > agree that the protocols are getting quite numerous and I couldn’t find a > single place that gathers all the best practices together. But, to > reiterate my point: `__array__` is the simplest of these and I think this > keyword is pretty safe to add. > > For ease of discussion, here are the API options discussed so far, as well > as a few extra that I don’t like but might trigger other ideas: > > np.asarray(my_duck_array, allow_copy=True) # default is False, or None -> > leave it to the duck array to decide > np.asarray(my_duck_array, copy=True) # always copies, but, if supported > by the duck array, defers to it for the copy > np.asarray(my_duck_array, copy='allow') # could take values 'allow', > 'force', 'no', True(='force'), False(='no') > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # separate > concepts, but unclear what force_copy=True, allow_copy=False means! > np.asarray(my_duck_array, force=True) > > Juan.
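Since the discussion in PR 11897 apparently stalled on how to spell the third option, here is one possible spelling: an enum whose members map onto `copy=True`, `copy=False` and `np.never_copy`. This is a sketch under assumptions; `CopyMode` and `normalize_copy` are hypothetical names, not NumPy API. The `__bool__` method is a design choice that keeps legacy `if copy:` checks working for the two old meanings while refusing to guess for the new one:

```python
import enum


class CopyMode(enum.Enum):
    """Hypothetical spelling of the three copy behaviours from PR 11897."""

    ALWAYS = "always"        # copy=True: always copy
    IF_NEEDED = "if_needed"  # copy=False: copy only if needed
    NEVER = "never"          # np.never_copy: never copy, raise if impossible

    def __bool__(self):
        # Existing code that does `if copy:` keeps its old meaning for the two
        # legacy values, but NEVER cannot be silently mistaken for either.
        if self is CopyMode.ALWAYS:
            return True
        if self is CopyMode.IF_NEEDED:
            return False
        raise ValueError(f"{self} is neither True nor False")


def normalize_copy(copy):
    """Map the user-facing `copy` argument onto a CopyMode member."""
    if isinstance(copy, CopyMode):
        return copy
    if copy is True:
        return CopyMode.ALWAYS
    if copy is False:
        return CopyMode.IF_NEEDED
    raise ValueError(f"invalid value for copy: {copy!r}")


print(normalize_copy(False))  # CopyMode.IF_NEEDED
```

The same `__bool__` trick means that code paths which were never updated fail loudly on `CopyMode.NEVER` instead of quietly copying.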