Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers wrote:
>
> > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser <
> > wieser.eric+nu...@gmail.com> wrote:
> >
> > > Perhaps worth mentioning that we've discussed this sort of API
> > > before, in https://github.com/numpy/numpy/pull/11897.
> > >
> > > Under that proposal, the api would be something like:
> > >
> > > * `copy=True` - always copy, like it is today
> > > * `copy=False` - copy if needed, like it is today
> > > * `copy=np.never_copy` - never copy, throw an exception if not
> > >   possible
> >
> > There's a couple of issues I see with using `copy` for __array__:
> > - copy is already weird (False doesn't mean no), and a [bool,
> >   some_obj_or_str] keyword isn't making that better
> > - the behavior we're talking about can do more than copying, e.g. for
> >   PyTorch it would modify the autograd graph by adding detach(), and
> >   for sparse it's not just "make a copy" (which implies doubling
> >   memory use) but it densifies which can massively blow up the memory.
> > - I'm -1 on adding things to the main namespace (never_copy) for
> >   something that can be handled differently (like a string, or a new
> >   keyword)
> >
> > tl;dr a new `force` keyword would be better
>
> I agree, “copy” is not a good description of this desired coercion
> behavior.
>
> A new keyword argument like “force” would be much clearer.

That seems fine and practical. But, in the end it seems to me that the
`force=` keyword just means that some projects want to teach their
users that:

1. `np.asarray()` can be expensive (and may always copy)
2. `np.asarray()` always loses type properties

while others do not choose to teach about it. There seems to be very
little, or even no, "promise" attached to either `force=True` or
`force=False`.

In the end, the question is whether sparse will actually want to
implement `force=True` if the main reason we add it is for library use.
There is no difference between a visualization library and numpy. In
both cases the user's memory will blow up just the same.

As for PyTorch, is `.detach()` even a good reason? Maybe I am missing
things, but:

    >>> torch.ones(10, requires_grad=True) + np.arange(10)
    # RuntimeError: Can't call numpy() on Variable that requires grad.
    # Use var.detach().numpy() instead.

So arguably, there is no type-safety concern due to `.detach()`. There
is an (obvious) general loss of type information that always occurs
with an `np.asarray` call. But I do not see that creating any openings
for bugs here, due to the wisdom of not allowing the above operation.
In fact, it actually seems much worse for xarray or pandas. They do
support the above operation and will potentially mess up if the arange
was previously an xarray with a matching index, but different order.

I am very much in favor of adding such things, but I still lack a bit
of clarity as to whom we would be helping?

If end-users will actually use `np.asarray(..., force=True)` over
special methods, then great! But I am currently not sure the
type-safety argument is all that big of a point. And the performance or
memory-blowup argument remains true even for visualization libraries
(where the array is purely input and never output as such).

But yes, "never copy" is a somewhat different extension to `__array__`
and `np.asarray`. It guarantees high speed and in-place behaviour which
is useful for different settings.

- Sebastian

> > Cheers,
> > Ralf
> >
> > > I think the discussion stalled on the precise spelling of the
> > > third option.
> > >
> > > `__array__` was not discussed there, but it seems like adding the
> > > `copy` argument to `__array__` would be a perfectly reasonable
> > > extension.
> > >
> > > Eric
> > >
> > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > j...@fastmail.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > One bit of expressivity we would miss is “copy if necessary,
> > > > but otherwise don’t bother”, but there are workarounds to this.
> > > >
> > > > After a side discussion with Stéfan van der Walt, we came up
> > > > with `allow_copy=True`, which would express to the downstream
> > > > library that we don’t mind waiting, but that zero-copy would
> > > > also be ok.
> > > >
> > > > This sounds like the sort of thing that is use case driven. If
> > > > enough projects want to use it, then I have no objections to
> > > > adding the keyword. OTOH, we need to be careful about adding too
> > > > many interoperability tricks as they complicate the code and
> > > > make it hard for folks to determine the best solution.
> > > > Interoperability is a hot topic and we need to be careful not
> > > > to leave behind too many experiments in the NumPy code. Do you
> > > > have any other ideas of how to achieve the same effect?
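
For readers following the thread, here is a minimal sketch of the behaviours being discussed. The first part only uses existing NumPy calls (`np.asarray`, `np.array`, `np.shares_memory`) to show when memory is shared. The second part is purely illustrative: the class `ExpensiveToDensify`, its method `to_ndarray`, and the `force` argument are hypothetical names made up here to mimic the proposal, not an existing NumPy or library API.

```python
import numpy as np

a = np.arange(5)

# np.asarray hands back the very same ndarray when no conversion is needed.
no_copy = np.asarray(a)
# np.array copies by default.
always_copy = np.array(a)
# A dtype change cannot reuse the integer buffer, so new memory is allocated.
converted = np.asarray(a, dtype=np.float64)


class ExpensiveToDensify:
    """Hypothetical container whose conversion to ndarray is costly."""

    def __init__(self, shape):
        self.shape = shape

    def to_ndarray(self, force=False):
        # `force` only imitates the keyword proposed in the thread; it is an
        # assumption for illustration, not a real NumPy argument.
        if not force:
            raise TypeError("conversion is expensive; pass force=True")
        return np.zeros(self.shape)


if __name__ == "__main__":
    print(no_copy is a)
    print(np.shares_memory(always_copy, a))
    print(np.shares_memory(converted, a))
    print(ExpensiveToDensify((2, 3)).to_ndarray(force=True).shape)
```

The opt-in flag makes the cost visible at the call site, which is the "teaching" effect described above; it does not by itself promise anything about copying.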
Re: [Numpy-discussion] Feelings about type aliases in NumPy
To try and add some more data points to the conversation:

> Maybe we can go for a bit more distant name like "numpy.annotations" or
> whatever.

Interestingly this was proposed independently here:

https://github.com/numpy/numpy-stubs/pull/66#issuecomment-619131274

Related to that, Ralf was opposed to numpy.typing because it would
shadow a stdlib module name:

https://github.com/numpy/numpy-stubs/pull/66#issuecomment-619123629

But, types is _also_ a stdlib module name. Maybe the above points give
some extra weight to "numpy.annotations"?

> Unless we anticipate adding a long list of type aliases (more than the three
> suggested so far)

While working on some types in SciPy here:

https://github.com/scipy/scipy/pull/11936#discussion_r415280894

we ran into the issue of typing things that are "integer types" or
"floating types". For the time being we just inlined a definition like
Union[float, np.floating], but conceivably we would want to unify
those definitions somewhere instead of redefining them in every
project. (Note that existing types like SupportsInt etc. were not what
we wanted.) This perhaps suggests that the ultimate number of type
aliases might be larger than we initially thought.

On Sun, Apr 26, 2020 at 6:25 AM Ilhan Polat wrote:
>
> I agree that parking all these in a secondary namespace sounds a better
> option, can't say that I feel for the word "typing" though. There are already
> too many type, dtype, ctypeslib etc. Maybe we can go for a bit more distant
> name like "numpy.annotations" or whatever.
>
> On Sat, Apr 25, 2020 at 8:51 AM Kevin Sheppard wrote:
>>
>> Typing is for library developers more than end users. I would also worry
>> that putting it into the top level might discourage other typing classes
>> since it is more difficult to add to the top level than to a lower level
>> module. np.typing seems very clear to me.
>>
>> On Sat, Apr 25, 2020, 07:41 Stephan Hoyer wrote:
>>>
>>> On Fri, Apr 24, 2020 at 11:31 AM Sebastian Berg wrote:
>>>>
>>>> On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote:
>>>> > On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
>>>> > > But, Stephan pointed out that it might be confusing to users for
>>>> > > objects to only exist at typing time, so we came around to the
>>>> > > question of whether people are open to the idea of including the
>>>> > > type aliases in NumPy itself. Ralf's concrete proposal was to make
>>>> > > a module numpy.types (or maybe numpy.typing) to hold the aliases
>>>> > > so that they don't pollute the top-level namespace. The module
>>>> > > would initially contain the types
>>>> >
>>>> > That sounds very sensible. Having types available with NumPy should
>>>> > also encourage their use, especially if we can add some documentation
>>>> > around it.
>>>>
>>>> I agree, I might have a small tendency for `numpy.types` if we ever
>>>> find any usage other than direct typing that may be the better name?
>>>
>>> Unless we anticipate adding a long list of type aliases (more than the
>>> three suggested so far), I would lean towards adding ArrayLike to the top
>>> level NumPy namespace as np.ArrayLike.
>>>
>>> Type annotations are becoming an increasingly core part of modern Python
>>> code. We should make it easy to appropriately type check functions that act
>>> on NumPy arrays, and a top level np.ArrayLike is definitely more convenient
>>> than np.types.ArrayLike.
>>>
>>>> Out of curiosity, I guess `ArrayLike` would be an ABC that a
>>>> downstream project can register with?
>>>
>>> ArrayLike will be a typing Protocol, automatically recognizing attributes
>>> like __array__ to indicate that something can be cast to an array.
>>>
>>>> - Sebastian
>>>>
>>>> > Stéfan

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
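
As a rough illustration of the "integer types" / "floating types" aliases mentioned above, the inlined `Union[float, np.floating]` definition could be shared like this. The names `FloatLike`, `IntLike`, `is_float_like`, `is_int_like` and `scale` are hypothetical and chosen only for this sketch; they are not taken from NumPy or SciPy.

```python
from typing import Union

import numpy as np

# Hypothetical shared aliases; each project currently inlines its own copy.
FloatLike = Union[float, np.floating]
IntLike = Union[int, np.integer]


def is_float_like(x: object) -> bool:
    """Runtime counterpart of the FloatLike alias."""
    return isinstance(x, (float, np.floating))


def is_int_like(x: object) -> bool:
    """Runtime counterpart of the IntLike alias."""
    return isinstance(x, (int, np.integer))


def scale(x: FloatLike, factor: FloatLike) -> float:
    """Multiply two float-like scalars and return a builtin float."""
    return float(x) * float(factor)


if __name__ == "__main__":
    print(is_float_like(np.float32(1.5)), is_float_like(3))
    print(is_int_like(np.int8(1)), is_int_like(1.0))
    print(scale(np.float32(2.0), 1.5))
```

The union is needed because scalar types such as `np.float32` do not subclass the builtin `float`, so annotating with `float` alone would reject them under a strict type checker.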
Re: [Numpy-discussion] Feelings about type aliases in NumPy
I agree that parking all these in a secondary namespace sounds like a
better option, can't say that I feel for the word "typing" though. There
are already too many type, dtype, ctypeslib etc. Maybe we can go for a bit
more distant name like "numpy.annotations" or whatever.

On Sat, Apr 25, 2020 at 8:51 AM Kevin Sheppard wrote:

> Typing is for library developers more than end users. I would also worry
> that putting it into the top level might discourage other typing classes
> since it is more difficult to add to the top level than to a lower level
> module. np.typing seems very clear to me.
>
> On Sat, Apr 25, 2020, 07:41 Stephan Hoyer wrote:
>
>> On Fri, Apr 24, 2020 at 11:31 AM Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>
>>> On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote:
>>> > On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
>>> > > But, Stephan pointed out that it might be confusing to users for
>>> > > objects to only exist at typing time, so we came around to the
>>> > > question of whether people are open to the idea of including the
>>> > > type aliases in NumPy itself. Ralf's concrete proposal was to make
>>> > > a module numpy.types (or maybe numpy.typing) to hold the aliases
>>> > > so that they don't pollute the top-level namespace. The module
>>> > > would initially contain the types
>>> >
>>> > That sounds very sensible. Having types available with NumPy should
>>> > also encourage their use, especially if we can add some documentation
>>> > around it.
>>>
>>> I agree, I might have a small tendency for `numpy.types` if we ever
>>> find any usage other than direct typing that may be the better name?
>>
>> Unless we anticipate adding a long list of type aliases (more than the
>> three suggested so far), I would lean towards adding ArrayLike to the top
>> level NumPy namespace as np.ArrayLike.
>>
>> Type annotations are becoming an increasingly core part of modern Python
>> code. We should make it easy to appropriately type check functions that act
>> on NumPy arrays, and a top level np.ArrayLike is definitely more convenient
>> than np.types.ArrayLike.
>>
>>> Out of curiosity, I guess `ArrayLike` would be an ABC that a
>>> downstream project can register with?
>>
>> ArrayLike will be a typing Protocol, automatically recognizing attributes
>> like __array__ to indicate that something can be cast to an array.
>>
>>> - Sebastian
>>>
>>> > Stéfan
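
To make the ABC-versus-Protocol distinction concrete, here is a small sketch of a structural type that recognizes anything with an `__array__` method, with no registration step. `SupportsArray`, `Celsius` and `to_array` are hypothetical names invented for this example, and the sketch is an assumption about the general idea only; it is not the actual `ArrayLike` definition under discussion.

```python
from typing import Any, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class SupportsArray(Protocol):
    """Hypothetical protocol: matches any object defining __array__."""

    def __array__(self, dtype: Any = None) -> np.ndarray: ...


class Celsius:
    """Downstream class; it neither inherits from nor registers with anything."""

    def __init__(self, values):
        self.values = list(values)

    def __array__(self, dtype=None, copy=None):
        # Builds a fresh ndarray on every call, so `copy` needs no handling.
        return np.array(self.values, dtype=dtype)


def to_array(obj: SupportsArray) -> np.ndarray:
    """Accept anything satisfying the protocol and coerce it."""
    return np.asarray(obj)


if __name__ == "__main__":
    print(isinstance(Celsius([20.5, 21.0]), SupportsArray))
    print(isinstance([1, 2, 3], SupportsArray))
    print(to_array(Celsius([20.5, 21.0])))
```

Note that a runtime `isinstance` check against a protocol only looks for the presence of the method, while a static type checker also compares signatures. A plain list fails the check even though `np.asarray` accepts it, which is presumably why a full array-like alias would need to be a union of several cases.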