Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Well, to get the ball rolling a bit, the key thing that matplotlib needs to know is whether `shape`, `reshape`, `size`, broadcasting, and logical indexing are respected. So, I see three possible ABCs here: one for attribute access (things like `shape` and `size`) and another for shape manipulations (broadcasting, reshape, and assignment to `.shape`). And then a third ABC for indexing support, although I am not sure how that could be implemented...

Cheers!
Ben Root

On Mon, Nov 6, 2017 at 7:28 PM, Stephan Hoyer wrote:

> On Mon, Nov 6, 2017 at 2:29 PM Ryan May wrote:
>
>> On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker wrote:
>>
>>> Klunky, and maybe we could come up with a standard way to do it and
>>> include that in numpy, but I'm not sure that ABCs are the way to do it.
>>
>> ABCs are *absolutely* the way to go about it. It's the only way baked
>> into the Python language itself that allows you to register a class for
>> purposes of `isinstance` without needing to subclass--i.e. duck-typing.
>>
>> What's needed, though, is not just a single ABC. Some thought and design
>> needs to go into segmenting the ndarray API to declare certain behaviors,
>> just like was done for collections:
>>
>> https://docs.python.org/3/library/collections.abc.html
>>
>> You don't just have a single ABC declaring a collection, but rather "I am
>> a mapping" or "I am a mutable sequence". It's more of a pain for developers
>> to properly specify things, but this is not a bad thing to actually give
>> code some thought.
>
> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
> possible. Although, there are quite a few options, so I don't know how
> doable this is. Any interest in opening up an issue on GitHub to discuss?
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
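To make the registration mechanism discussed in this message concrete, here is a minimal sketch using only the standard library. The ABC names `ShapedArray` and `ReshapableArray` and the `Grid` class are hypothetical illustrations; nothing like them exists in NumPy or matplotlib.

```python
from abc import ABC, abstractmethod


class ShapedArray(ABC):
    """Hypothetical ABC: objects exposing `shape` and `size`."""

    @property
    @abstractmethod
    def shape(self): ...

    @property
    @abstractmethod
    def size(self): ...


class ReshapableArray(ShapedArray):
    """Hypothetical ABC: additionally promises a `reshape` method."""

    @abstractmethod
    def reshape(self, *shape): ...


class Grid:
    """A duck array that inherits from neither ABC."""

    def __init__(self, rows, cols):
        self._shape = (rows, cols)

    @property
    def shape(self):
        return self._shape

    @property
    def size(self):
        return self._shape[0] * self._shape[1]


# Registration makes isinstance() succeed without subclassing.
ShapedArray.register(Grid)

grid = Grid(2, 3)
print(isinstance(grid, ShapedArray))      # Grid was registered here
print(isinstance(grid, ReshapableArray))  # Grid was never registered here
```

Note that `register` takes the class's word for it: nothing verifies that `Grid` really provides `shape` and `size`, so the declaration is a promise made by the duck's author.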
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Hi Benjamin,

For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC for astropy, which may be a useful starting point, as it also provides a way to implement the methods ndarray uses to reshape and get elements: see
https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863

All the best,

Marten
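As a rough illustration of what such a mixin can look like (this is a sketch of the general idea, not astropy's code; the names `ShapeMixin`, `_apply` and `Coordinates` are assumptions made for the example), a class supplies `shape` plus one hook that applies an ndarray method to its internal arrays, and the mixin derives the rest:

```python
import math
from abc import ABC, abstractmethod

import numpy as np


class ShapeMixin(ABC):
    """Hypothetical mixin: derive ndarray-like shape methods from two hooks."""

    @property
    @abstractmethod
    def shape(self):
        """Shape of the instance."""

    @abstractmethod
    def _apply(self, method, *args, **kwargs):
        """Return a new instance with `method` applied to the internal arrays."""

    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        return math.prod(self.shape)

    def reshape(self, *args, **kwargs):
        return self._apply("reshape", *args, **kwargs)

    def __getitem__(self, item):
        return self._apply("__getitem__", item)


class Coordinates(ShapeMixin):
    """Example container holding two arrays of identical shape."""

    def __init__(self, x, y):
        self.x = np.asarray(x)
        self.y = np.asarray(y)

    @property
    def shape(self):
        return self.x.shape

    def _apply(self, method, *args, **kwargs):
        # Apply the same ndarray method to every internal array.
        return Coordinates(
            getattr(self.x, method)(*args, **kwargs),
            getattr(self.y, method)(*args, **kwargs),
        )


coords = Coordinates(np.arange(6), 2 * np.arange(6))
print(coords.reshape(2, 3).shape)
print(coords[1:3].y)
```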
Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris wrote:

>>> Also -- if py2.7 continues to see the use I expect it will well past when
>>> python.org officially drops it, I wouldn't be surprised if a Python2.7
>>> Windows build based on a newer compiler would come along -- perhaps by
>>> Anaconda or conda-forge, or ???
>>
>> I suspect that this will indeed happen. I am aware of multiple companies
>> following this path already (building python + numpy themselves with a
>> newer MS compiler).
>
> I think Anaconda is talking about distributing a compiler, but what that
> will be on windows is anyone's guess. When we drop 2.7, there is a lot of
> compatibility crud that it would be nice to get rid of, and if we do that
> then NumPy will no longer compile against 2.7. I suspect some companies
> have just been putting off the task of upgrading to Python 3, which should
> be pretty straightforward these days apart from system code that needs to
> do a lot of work with bytes.

I agree, and if there is a compelling reason to upgrade, folks WILL do it. But I've been amazed over the years at folks' desire to stick with what they have! And I'm guilty too: anything new I start with py3, but older, larger codebases are still py2. I just can't find the energy to spend the week or so it would probably take to update everything...

But in the original post, the Windows compiler issue was mentioned, so there seem to be two reasons to drop py2:

A) wanting to use py3-only features.

B) wanting to use newer C (C++?) compiler features.

I suggest we be clear about which of these is driving the decisions, and explicit about the goals. That is, if (A) is critical, we don't even have to talk about (B).

But we could choose to do (B) without doing (A) -- I suspect there will be a user base for that.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer wrote:

>> What's needed, though, is not just a single ABC. Some thought and design
>> needs to go into segmenting the ndarray API to declare certain behaviors,
>> just like was done for collections:
>>
>> https://docs.python.org/3/library/collections.abc.html
>>
>> You don't just have a single ABC declaring a collection, but rather "I am
>> a mapping" or "I am a mutable sequence". It's more of a pain for developers
>> to properly specify things, but this is not a bad thing to actually give
>> code some thought.
>
> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
> possible. Although, there are quite a few options, so I don't know how
> doable this is.

Exactly -- there is an exponential number of options...

> Well, to get the ball rolling a bit, the key thing that matplotlib needs
> to know is if `shape`, `reshape`, `size`, broadcasting, and logical
> indexing is respected. So, I see three possible abc's here: one for
> attribute access (things like `shape` and `size`) and another for shape
> manipulations (broadcasting and reshape, and assignment to .shape).

I think we're going to get into a string of ABCs:

ArrayLikeForMPL_ABC

etc, etc.

> And then a third abc for indexing support, although, I am not sure how
> that could get implemented...

This is the really tricky one -- all ABCs really check is the existence of methods -- making sure they behave the same way is up to the developer of the duck type.

Which is OK, but will require discipline.

But indexing, specifically fancy indexing, is another matter -- I'm not sure if there is even a way with an ABC to check what types of indexing are supported, and we'd still have the problem of whether the semantics are the same!

For example, I work with netcdf variable objects, which are partly duck-typed as ndarrays, but I think n-dimensional fancy indexing works differently... how in the world do you detect that with an ABC???
> For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC
> for astropy, which may be a useful starting point as it also provides
> a way to implement the methods ndarray uses to reshape and get
> elements: see
> https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863

Sounds like a good starting point for discussion.

-CHB
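The point about ABCs checking only for the existence of methods can be shown with a small standard-library sketch. The `Indexable` ABC and the `Reversed1D` class are hypothetical names invented for this example; the check is structural, so a class with the right names but different indexing semantics still passes.

```python
from abc import ABC


class Indexable(ABC):
    """Hypothetical ABC accepting any class that defines shape and __getitem__."""

    @classmethod
    def __subclasshook__(cls, candidate):
        if cls is Indexable:
            required = ("shape", "__getitem__")
            # Look for each name anywhere in the candidate's class hierarchy.
            if all(
                any(name in vars(base) for base in candidate.__mro__)
                for name in required
            ):
                return True
        return NotImplemented


class Reversed1D:
    """Has the required names, but integer indexing counts from the end."""

    def __init__(self, data):
        self._data = list(data)

    @property
    def shape(self):
        return (len(self._data),)

    def __getitem__(self, index):
        return self._data[-1 - index]


duck = Reversed1D([10, 20, 30])
print(isinstance(duck, Indexable))  # the structural check passes
print(duck[0])                      # but index 0 returns the last element
```

Nothing in the `isinstance` check can see that `duck[0]` disagrees with what a list or ndarray would return, which is why documentation or tests have to carry the semantic part of the contract.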
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Tue, Nov 7, 2017 at 1:20 PM, Chris Barker wrote:

> On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer wrote:
>
>>> What's needed, though, is not just a single ABC. Some thought and design
>>> needs to go into segmenting the ndarray API to declare certain behaviors,
>>> just like was done for collections:
>>>
>>> https://docs.python.org/3/library/collections.abc.html
>>>
>>> You don't just have a single ABC declaring a collection, but rather "I
>>> am a mapping" or "I am a mutable sequence". It's more of a pain for
>>> developers to properly specify things, but this is not a bad thing to
>>> actually give code some thought.
>>
>> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
>> possible. Although, there are quite a few options, so I don't know how
>> doable this is.
>
> Exactly -- there is an exponential number of options...
>
>> Well, to get the ball rolling a bit, the key thing that matplotlib needs
>> to know is if `shape`, `reshape`, `size`, broadcasting, and logical
>> indexing is respected. So, I see three possible abc's here: one for
>> attribute access (things like `shape` and `size`) and another for shape
>> manipulations (broadcasting and reshape, and assignment to .shape).
>
> I think we're going to get into a string of ABCs:
>
> ArrayLikeForMPL_ABC
>
> etc, etc.

Only if you try to provide perfectly-sized options for every occasion--but that's not how we do things in (sane) software development. You provide a few options that optimize the common use cases, and you don't try to cover everything--let client code figure out the right combination from the primitives you provide. One can always just inherit/register *all* the ABCs if need be.

The status quo is that we have 1 interface that covers everything from multiple dims and shape to math and broadcasting to the entire __array__ interface. Even breaking that up into the 3 "obvious" chunks would be a massive improvement.
I just don't want to see this effort bog down into "this is so hard". Getting it perfect is hard; getting it useful is much easier.

It's important to note that we can always break up/combine existing ABCs into other ones later.

>> And then a third abc for indexing support, although, I am not sure how
>> that could get implemented...
>
> This is the really tricky one -- all ABCs really check is the existence of
> methods -- making sure they behave the same way is up to the developer of
> the duck type.
>
> Which is OK, but will require discipline.
>
> But indexing, specifically fancy indexing, is another matter -- I'm not
> sure if there is even a way with an ABC to check what types of indexing
> are supported, and we'd still have the problem of whether the semantics
> are the same!
>
> For example, I work with netcdf variable objects, which are partly
> duck-typed as ndarrays, but I think n-dimensional fancy indexing works
> differently... how in the world do you detect that with an ABC???

Even documenting expected behavior as part of these ABCs would go a long way towards helping standardize behavior.

Another idea would be to put together a conformance test suite as part of this effort, in lieu of some kind of run-time checking of behavior (which would be terrible). That would help developers of other "ducks" check that they're doing the right things. I'd imagine the existing NumPy test suite would largely cover this.

Ryan

--
Ryan May
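A conformance suite of the kind suggested here could, in its simplest form, compare a duck array against ndarray on a handful of behaviours. The sketch below is only an illustration of that idea: `check_basic_conformance` and `WrongShape` are hypothetical names, and the four checks are an arbitrary small sample rather than a proposed specification.

```python
import numpy as np


def check_basic_conformance(make_array):
    """Return the names of the checks on which a duck array disagrees with ndarray.

    `make_array` is any callable that builds the duck array from nested lists.
    """
    data = [[1, 2, 3], [4, 5, 6]]
    reference = np.array(data)
    duck = make_array(data)

    failures = []
    if tuple(duck.shape) != reference.shape:
        failures.append("shape")
    if duck.size != reference.size:
        failures.append("size")
    if tuple(duck.reshape(3, 2).shape) != reference.reshape(3, 2).shape:
        failures.append("reshape")
    if duck[1, 2] != reference[1, 2]:
        failures.append("integer indexing")
    return failures


class WrongShape:
    """A deliberately broken duck that reports its shape reversed."""

    def __init__(self, data):
        self._array = np.array(data)

    @property
    def shape(self):
        return self._array.shape[::-1]

    @property
    def size(self):
        return self._array.size

    def reshape(self, *shape):
        return self._array.reshape(*shape)

    def __getitem__(self, key):
        return self._array[key]


print(check_basic_conformance(np.array))    # ndarray agrees with itself
print(check_basic_conformance(WrongShape))  # reports the "shape" check
```

Because the checks run in a test suite rather than at call time, they cost nothing at run time, which matches the concern about run-time behaviour checking raised above.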
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Tue, Nov 7, 2017 at 12:23 PM Chris Barker wrote:

>> And then a third abc for indexing support, although, I am not sure how
>> that could get implemented...
>
> This is the really tricky one -- all ABCs really check is the existence of
> methods -- making sure they behave the same way is up to the developer of
> the duck type.
>
> Which is OK, but will require discipline.
>
> But indexing, specifically fancy indexing, is another matter -- I'm not
> sure if there is even a way with an ABC to check what types of indexing
> are supported, and we'd still have the problem of whether the semantics
> are the same!
>
> For example, I work with netcdf variable objects, which are partly
> duck-typed as ndarrays, but I think n-dimensional fancy indexing works
> differently... how in the world do you detect that with an ABC???

We recently worked out a hierarchy of indexing types for xarray. To a crude approximation, we have:

- "Basic" indexing support for slices and integers. Nearly every array type satisfies this.
- "Outer" or "orthogonal" indexing with slices, integers and 1D arrays. This is what netCDF4-Python and Fortran/MATLAB support.
- "Vectorized" indexing with broadcasting and multi-dimensional indexers. NumPy supports a generalization of this, but I would not wish the edge cases involving mixed slices/arrays upon anyone.
- "Logical" indexing by a boolean array with the same shape.
- "Exactly like NumPy" for subclasses or wrappers around NumPy arrays.

There are some ambiguities in this, but that's what specs are for. For most applications, we probably don't need most of these: ABCs for "Basic", "Logical" and "Exactly like NumPy" would go a long way.
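For readers less familiar with the distinction, the first four indexing types in the list above can be reproduced on a plain NumPy array. This is only an illustration using standard NumPy calls (`np.ix_` is used to emulate outer indexing); the variable names are chosen for the example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# "Basic": integers and slices only.
basic = a[1, 1:3]

# "Outer"/"orthogonal": every combination of the selected rows and columns.
outer = a[np.ix_(rows, cols)]

# "Vectorized": the indexers are broadcast against each other and paired up.
vectorized = a[rows, cols]

# "Logical": a boolean mask with the same shape as the array.
logical = a[a % 5 == 0]

print(basic)       # [5 6]
print(outer)       # [[ 1  3]
                   #  [ 9 11]]
print(vectorized)  # [ 1 11]
print(logical)     # [ 0  5 10]
```

The same pair of index arrays gives a 2x2 block under outer indexing and two individual elements under vectorized indexing, which is exactly the kind of semantic difference that a method-existence check cannot detect.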
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
On Nov 6, 2017 4:19 PM, "Chris Barker" wrote:

> On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk <m.h.vankerkw...@gmail.com> wrote:
>
>> You just summarized excellently why I'm on a quest to change `asarray`
>> to `asanyarray` within numpy
>
> +1 -- we should all be using asanyarray() most of the time.

The problem is that if you use `asanyarray`, then you're claiming that your code works correctly for:

- regular ndarrays
- np.matrix
- np.ma masked arrays
- and every third party subclass, regardless of their semantics, regardless of whether you've heard of them or not

If subclasses followed the Liskov substitution principle, and had different internal implementations but the same public ("duck") API, then this would be fine. But in practice, numpy limitations mean that ndarray subclasses have to have the same internal implementation, so the only reason to make an ndarray subclass is if you want to make something with a different public API. Basically the whole system is designed for subclasses to be incompatible.

The end result is that if you use asanyarray, your code is definitely wrong, because there's no way you're actually doing the right thing for arbitrary ndarray subclasses. But if you don't use asanyarray, then yeah, that's also wrong, because it won't work on mostly-compatible subclasses like astropy's. Given this, different projects reasonably make different choices -- it's not just legacy code that uses asarray.

In the long run we obviously need to come up with new options that don't have these tradeoffs (that's why we want to move units to dtypes, implement methods like __array_ufunc__ to enable duck arrays, etc.), but in the meantime let's try to be sympathetic to other projects that are doing their best :-).

-n
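The tradeoff is easy to see with one of the subclasses named above. In this small sketch (the helper names `mean_strict` and `mean_passthrough` are made up for the example), the same masked array gives two different answers depending on which coercion the function uses:

```python
import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 100.0], mask=[False, False, True])


def mean_strict(values):
    """Coerce to a base ndarray: subclass behaviour (here, the mask) is dropped."""
    return np.asarray(values).mean()


def mean_passthrough(values):
    """Let subclasses through: MaskedArray.mean skips the masked entry."""
    return np.asanyarray(values).mean()


print(type(np.asarray(masked)).__name__)     # ndarray
print(type(np.asanyarray(masked)).__name__)  # MaskedArray
print(mean_strict(masked))       # averages 1, 2 and the masked 100
print(mean_passthrough(masked))  # averages only 1 and 2
```

Neither result is wrong in itself; which one is intended depends on whether the function's author knew about, and tested against, the subclass being passed in.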
Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support
On Nov 7, 2017 2:15 PM, "Chris Barker" wrote:

> On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris wrote:
>
>>>> Also -- if py2.7 continues to see the use I expect it will well past when
>>>> python.org officially drops it, I wouldn't be surprised if a Python2.7
>>>> Windows build based on a newer compiler would come along -- perhaps by
>>>> Anaconda or conda-forge, or ???
>>>
>>> I suspect that this will indeed happen. I am aware of multiple companies
>>> following this path already (building python + numpy themselves with a
>>> newer MS compiler).
>>
>> I think Anaconda is talking about distributing a compiler, but what that
>> will be on windows is anyone's guess. When we drop 2.7, there is a lot of
>> compatibility crud that it would be nice to get rid of, and if we do that
>> then NumPy will no longer compile against 2.7. I suspect some companies
>> have just been putting off the task of upgrading to Python 3, which should
>> be pretty straightforward these days apart from system code that needs to
>> do a lot of work with bytes.
>
> I agree, and if there is a compelling reason to upgrade, folks WILL do it.
> But I've been amazed over the years at folks' desire to stick with what
> they have! And I'm guilty too: anything new I start with py3, but older,
> larger codebases are still py2. I just can't find the energy to spend the
> week or so it would probably take to update everything...
>
> But in the original post, the Windows compiler issue was mentioned, so
> there seem to be two reasons to drop py2:
>
> A) wanting to use py3-only features.
>
> B) wanting to use newer C (C++?) compiler features.
>
> I suggest we be clear about which of these is driving the decisions, and
> explicit about the goals. That is, if (A) is critical, we don't even have
> to talk about (B).
>
> But we could choose to do (B) without doing (A) -- I suspect there will be
> a user base for that.

The problem is it's hard to predict the future.
Right now neither PyPI nor conda provides any way to distribute binaries for py27-but-with-a-newer-ABI, and maybe they never will; or maybe they will eventually, but not enough people use them to justify keeping py2 support given the other overheads; or... who knows, really.

Right now, the decision in front of us is what to tell people who ask about numpy's py2 support plans, so that they can make their own plans. Given what we know right now, I don't think we should promise to keep support past 2018. If we get there and the situation's changed, and there's both desire and means to extend support, we can revisit that. But it's better to under-promise and possibly over-deliver, instead of promising to support py2 until after it becomes a millstone around our necks and then realizing we haven't warned anyone and are stuck supporting it another year beyond that...

-n
Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?
Hi Nathaniel,

You're right, I shouldn't be righteous. Though I do think the advantage of `asanyarray` inside numpy remains that it is easy for a user to add `asarray` to their input to a numpy function, and not easy for a happily compatible subclass to avoid an `asarray` inside a numpy function! I.e., coerce as little as you can get away with...

All the best,

Marten