Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Benjamin Root
Well, to get the ball rolling a bit, the key thing that matplotlib needs to
know is whether `shape`, `reshape`, `size`, broadcasting, and logical indexing
are respected. So, I see three possible ABCs here: one for attribute access
(things like `shape` and `size`), another for shape manipulations
(broadcasting, reshape, and assignment to .shape), and a third
for indexing support, although I am not sure how that could get
implemented...
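
To make the three-way split concrete, here is a minimal sketch of what such ABCs might look like. All names (`HasShape`, `Reshapable`, `Indexable`, `FlatGrid`) are hypothetical and exist in neither NumPy nor matplotlib; broadcasting and assignment to `.shape` are left out, and the indexing ABC only declares `__getitem__` without saying which kinds of index are accepted.

```python
from abc import ABC, abstractmethod


class HasShape(ABC):
    """Attribute access: `shape` and `size` (hypothetical ABC)."""

    @property
    @abstractmethod
    def shape(self):
        ...

    @property
    def size(self):
        # Total element count, derived from `shape`.
        n = 1
        for dim in self.shape:
            n *= dim
        return n


class Reshapable(HasShape):
    """Shape manipulation; only `reshape` is sketched here."""

    @abstractmethod
    def reshape(self, *shape):
        ...


class Indexable(ABC):
    """Indexing support; says nothing about which index types work."""

    @abstractmethod
    def __getitem__(self, key):
        ...


class FlatGrid(Reshapable, Indexable):
    """Toy list-backed duck array that only supports flat indexing."""

    def __init__(self, data, shape):
        self._data = list(data)
        self._shape = tuple(shape)

    @property
    def shape(self):
        return self._shape

    def reshape(self, *shape):
        new = FlatGrid(self._data, shape)
        if new.size != self.size:
            raise ValueError("cannot reshape: element count differs")
        return new

    def __getitem__(self, key):
        return self._data[key]


grid = FlatGrid(range(6), (2, 3))
print(grid.size, grid.reshape(3, 2).shape, isinstance(grid, HasShape))
```

A consumer such as a plotting library could then test `isinstance(obj, Reshapable)` instead of probing for attributes one at a time.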

Cheers!
Ben Root


On Mon, Nov 6, 2017 at 7:28 PM, Stephan Hoyer  wrote:

> On Mon, Nov 6, 2017 at 2:29 PM Ryan May  wrote:
>
>> On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker 
>> wrote:
>>
>>> Klunky, and maybe we could come up with a standard way to do it and
>>> include that in numpy, but I'm not sure that ABCs are the way to do it.
>>>
>>
>> ABCs are *absolutely* the way to go about it. It's the only way baked
>> into the Python language itself that allows you to register a class for
>> purposes of `isinstance` without needing to subclass--i.e. duck-typing.
>>
>> What's needed, though, is not just a single ABC. Some thought and design
>> needs to go into segmenting the ndarray API to declare certain behaviors,
>> just like was done for collections:
>>
>> https://docs.python.org/3/library/collections.abc.html
>>
>> You don't just have a single ABC declaring a collection, but rather "I am
>> a mapping" or "I am a mutable sequence". It's more of a pain for developers
>> to properly specify things, but this is not a bad thing to actually give
>> code some thought.
>>
>
> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
> possible. Although, there are quite a few options, so I don't know how
> doable this is. Any interest in opening up an issue on GitHub to discuss?
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Marten van Kerkwijk
Hi Benjamin,

For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC
for astropy, which may be a useful starting point as it also provides
a way to implement the methods ndarray uses to reshape and get
elements: see 
https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863
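
As a rough illustration of the mixin idea (not astropy's actual code), the sketch below assumes a subclass supplies a `shape` property and a single `_apply` hook, and the mixin derives `reshape`, `ravel`, and element access from them. The class names `ShapedLike` and `LonLat` are made up for this example.

```python
import math
from abc import ABC, abstractmethod

import numpy as np


class ShapedLike(ABC):
    """Mixin sketch: subclasses define `shape` and `_apply` only."""

    @property
    @abstractmethod
    def shape(self):
        ...

    @abstractmethod
    def _apply(self, method, *args, **kwargs):
        """Return a new instance with `method` applied to each internal array."""

    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        return math.prod(self.shape)

    def reshape(self, *args, **kwargs):
        return self._apply("reshape", *args, **kwargs)

    def ravel(self, *args, **kwargs):
        return self._apply("ravel", *args, **kwargs)

    def __getitem__(self, item):
        return self._apply("__getitem__", item)


class LonLat(ShapedLike):
    """Hypothetical container holding two arrays of identical shape."""

    def __init__(self, lon, lat):
        self.lon = np.asarray(lon)
        self.lat = np.asarray(lat)

    @property
    def shape(self):
        return self.lon.shape

    def _apply(self, method, *args, **kwargs):
        # Forward the ndarray method to every internal array.
        return LonLat(
            getattr(self.lon, method)(*args, **kwargs),
            getattr(self.lat, method)(*args, **kwargs),
        )


points = LonLat(np.arange(6.0), np.arange(6.0) + 10)
print(points.reshape(2, 3).shape, points[1:3].lat)
```

The appeal of this design is that a container with several internal arrays implements one method and gets consistent reshaping and slicing for free.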

All the best,

Marten


Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support

2017-11-07 Thread Chris Barker
On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris 
wrote:

>>> Also -- if py2.7 continues to see the use I expect it will well past when
>>> python.org officially drops it, I wouldn't be surprised if a Python2.7
>>> Windows build based on a newer compiler would come along -- perhaps by
>>> Anaconda or conda-forge, or ???
>>>
>>
>> I suspect that this will indeed happen. I am aware of multiple companies
>> following this path already (building python + numpy themselves with a
>> newer MS compiler).
>>
>
> I think Anaconda is talking about distributing a compiler, but what that
> will be on windows is anyone's guess. When we drop 2.7, there is a lot of
> compatibility crud that it would be nice to get rid of, and if we do that
> then NumPy will no longer compile against 2.7. I suspect some companies
> have just been putting off the task of upgrading to Python 3, which should
> be pretty straightforward these days apart from system code that needs to
> do a lot of work with bytes.
>

I agree, and if there is a compelling reason to upgrade, folks WILL do it.
But I've been amazed over the years at folks' desire to stick with what
they have! And I'm guilty too: anything new I start with py3, but older
larger codebases are still py2. I just can't find the energy to spend the
week or so it would probably take to update everything...

But in the original post, the Windows compiler issue was mentioned, so
there seem to be two reasons to drop py2:

A) wanting to use py3-only features.
B) wanting to use newer C (C++?) compiler features.

I suggest we be clear about which of these is driving the decisions, and
explicit about the goals. That is, if (A) is critical, we don't even have
to talk about (B).

But we could choose to do (B) without doing (A) -- I suspect there will be
a user base for that.

-CHB




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Chris Barker
On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer  wrote:

>
>> What's needed, though, is not just a single ABC. Some thought and design
>> needs to go into segmenting the ndarray API to declare certain behaviors,
>> just like was done for collections:
>>
>> https://docs.python.org/3/library/collections.abc.html
>>
>> You don't just have a single ABC declaring a collection, but rather "I am
>> a mapping" or "I am a mutable sequence". It's more of a pain for developers
>> to properly specify things, but this is not a bad thing to actually give
>> code some thought.
>>
>
> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
> possible. Although, there are quite a few options, so I don't know how
> doable this is.
>

Exactly -- there are an exponential number of options...


> Well, to get the ball rolling a bit, the key thing that matplotlib needs
> to know is if `shape`, `reshape`, `size`, broadcasting, and logical
> indexing are respected. So, I see three possible ABCs here: one for
> attribute access (things like `shape` and `size`) and another for shape
> manipulations (broadcasting and reshape, and assignment to .shape).


I think we're going to get into a string of ABCs:

ArrayLikeForMPL_ABC

etc, etc.


> And then a third abc for indexing support, although, I am not sure how
> that could get implemented...


This is the really tricky one -- all ABCs really check is the existence of
methods -- making sure they behave the same way is up to the developer of
the duck type.

Which is OK, but will require discipline.

But indexing, specifically fancy indexing, is another matter -- I'm not
sure if there is even a way with an ABC to check for what types of indexing
are supported, but we'd still have the problem of whether the semantics are
the same!

For example, I work with netcdf variable objects, which are partly
duck-typed as ndarrays, but I think n-dimensional fancy indexing works
differently... how in the world do you detect that with an ABC???
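
The gap between "has the method" and "behaves the same" can be shown with a small sketch. `SupportsReshape`, `GoodDuck`, and `OddDuck` are hypothetical names; the ABC uses a `__subclasshook__` that looks only for a `reshape` attribute on the class, which is roughly how the structural checks in `collections.abc` work.

```python
from abc import ABC


class SupportsReshape(ABC):
    """Hypothetical ABC: matches any class that defines `reshape`."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is SupportsReshape:
            # Structural check only: is there a `reshape` anywhere in the MRO?
            if any("reshape" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented


class GoodDuck:
    def __init__(self, shape):
        self.shape = tuple(shape)

    def reshape(self, *shape):
        # ndarray-like: returns a new object, leaves self untouched.
        return GoodDuck(shape)


class OddDuck:
    def __init__(self, shape):
        self.shape = tuple(shape)

    def reshape(self, *shape):
        # Different semantics: mutates in place and returns None.
        self.shape = tuple(shape)


good, odd = GoodDuck((2, 3)), OddDuck((2, 3))
print(isinstance(good, SupportsReshape), isinstance(odd, SupportsReshape))
print(good.reshape(3, 2).shape, odd.reshape(3, 2))
```

Both classes pass the `isinstance` check, yet code written against ndarray semantics would break on `OddDuck` as soon as it used the return value of `reshape`.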

> For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC
> for astropy, which may be a useful starting point as it also provides
> a way to implement the methods ndarray uses to reshape and get
> elements: see
> https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863


Sounds like a good starting point for discussion.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Ryan May
On Tue, Nov 7, 2017 at 1:20 PM, Chris Barker  wrote:

> On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer  wrote:
>
>>
>>> What's needed, though, is not just a single ABC. Some thought and design
>>> needs to go into segmenting the ndarray API to declare certain behaviors,
>>> just like was done for collections:
>>>
>>> https://docs.python.org/3/library/collections.abc.html
>>>
>>> You don't just have a single ABC declaring a collection, but rather "I
>>> am a mapping" or "I am a mutable sequence". It's more of a pain for
>>> developers to properly specify things, but this is not a bad thing to
>>> actually give code some thought.
>>>
>>
>> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
>> possible. Although, there are quite a few options, so I don't know how
>> doable this is.
>>
>
> Exactly -- there are an exponential number of options...
>
>
>> Well, to get the ball rolling a bit, the key thing that matplotlib needs
>> to know is if `shape`, `reshape`, `size`, broadcasting, and logical
>> indexing are respected. So, I see three possible ABCs here: one for
>> attribute access (things like `shape` and `size`) and another for shape
>> manipulations (broadcasting and reshape, and assignment to .shape).
>
>
> I think we're going to get into a string of ABCs:
>
> ArrayLikeForMPL_ABC
>
> etc, etc.
>

Only if you try to provide perfectly-sized options for every occasion--but
that's not how we do things in (sane) software development. You provide a
few options that optimize the common use cases, and you don't try to cover
everything--let client code figure out the right combination from the
primitives you provide. One can always just inherit/register *all* the ABCs
if need be. The status quo is that we have 1 interface that covers
everything from multiple dims and shape to math and broadcasting to the
entire __array__ interface. Even breaking that up into the 3 "obvious"
chunks would be a massive improvement.

I just don't want to see this effort bog down into "this is so hard".
Getting it perfect is hard; getting it useful is much easier.

It's important to note that we can always break up/combine existing ABCs
into other ones later.
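
A minimal sketch of the "few primitives, register what applies" approach follows. The three ABC names are placeholders for the "obvious" chunks and are not part of NumPy; `ThirdPartyGrid` stands in for a class that cannot or does not want to inherit from them.

```python
from abc import ABC


class ShapedArray(ABC):
    """Hypothetical ABC: multiple dims and shape."""


class MathArray(ABC):
    """Hypothetical ABC: arithmetic and broadcasting."""


class IndexableArray(ABC):
    """Hypothetical ABC: indexing."""


class ThirdPartyGrid:
    """Stand-in for an existing class that does not subclass any ABC."""

    shape = (2, 3)

    def __getitem__(self, key):
        return 0


# The class author (or a client) declares which behaviors are supported.
ShapedArray.register(ThirdPartyGrid)
IndexableArray.register(ThirdPartyGrid)


def declared_behaviors(obj):
    """List the names of the ABCs that `obj` claims to satisfy."""
    kinds = (ShapedArray, MathArray, IndexableArray)
    return [kind.__name__ for kind in kinds if isinstance(obj, kind)]


print(declared_behaviors(ThirdPartyGrid()))
```

Client code picks the combination it needs, for example requiring both `ShapedArray` and `IndexableArray`, rather than the library shipping one ABC per use case.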


>> And then a third abc for indexing support, although, I am not sure how
>> that could get implemented...
>
>
> This is the really tricky one -- all ABCs really check is the existence of
> methods -- making sure they behave the same way is up to the developer of
> the duck type.
>
> Which is OK, but will require discipline.
>
> But indexing, specifically fancy indexing, is another matter -- I'm not
> sure if there is even a way with an ABC to check for what types of indexing
> are supported, but we'd still have the problem of whether the semantics are
> the same!
>
> For example, I work with netcdf variable objects, which are partly
> duck-typed as ndarrays, but I think n-dimensional fancy indexing works
> differently... how in the world do you detect that with an ABC???
>

Even documenting expected behavior as part of these ABCs would go a long
way towards helping standardize behavior.

Another idea would be to put together a conformance test suite as part of
this effort, in lieu of some kind of run-time checking of behavior (which
would be terrible). That would help developers of other "ducks" check that
they're doing the right things. I'd imagine the existing NumPy test suite
would largely cover this.
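
One possible shape for such a conformance check is sketched below, under the assumption that a duck-array author can supply a factory `make_array(shape)`. The function name, the factory convention, and the two toy classes are all invented for this illustration; a real suite would cover far more behavior.

```python
import math


class ShapeOnlyDuck:
    """Toy duck array: reshape returns a new object."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    @property
    def size(self):
        return math.prod(self.shape)

    def reshape(self, *shape):
        return ShapeOnlyDuck(shape)


class InPlaceDuck(ShapeOnlyDuck):
    """Toy duck array whose reshape mutates and returns None."""

    def reshape(self, *shape):
        self.shape = tuple(shape)


def shape_conformance_failures(make_array):
    """Return names of failed checks for a factory `make_array(shape)`."""
    failures = []
    a = make_array((2, 3))
    if tuple(a.shape) != (2, 3):
        failures.append("shape")
    if a.size != 6:
        failures.append("size")
    b = a.reshape(3, 2)
    if b is None or tuple(b.shape) != (3, 2):
        failures.append("reshape result")
    if tuple(a.shape) != (2, 3):
        failures.append("reshape mutated original")
    return failures


print(shape_conformance_failures(ShapeOnlyDuck))
print(shape_conformance_failures(InPlaceDuck))
```

Because the checks run in the duck author's own test suite, nothing is paid at run time, which matches the preference above for tests over run-time behavior checks.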

Ryan

-- 
Ryan May


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Stephan Hoyer
On Tue, Nov 7, 2017 at 12:23 PM Chris Barker  wrote:

>
>> And then a third abc for indexing support, although, I am not sure how
>> that could get implemented...
>
>
> This is the really tricky one -- all ABCs really check is the existence of
> methods -- making sure they behave the same way is up to the developer of
> the duck type.
>
> Which is OK, but will require discipline.
>
> But indexing, specifically fancy indexing, is another matter -- I'm not
> sure if there is even a way with an ABC to check for what types of indexing
> are supported, but we'd still have the problem of whether the semantics are
> the same!
>
> For example, I work with netcdf variable objects, which are partly
> duck-typed as ndarrays, but I think n-dimensional fancy indexing works
> differently... how in the world do you detect that with an ABC???
>

We recently worked out a hierarchy of indexing types for xarray. To a crude
approximation, we have:
- "Basic" indexing support for slices and integers. Nearly every array type
satisfies this.
- "Outer" or "orthogonal" indexing with slices, integers and 1D arrays.
This is what netCDF4-Python and Fortran/MATLAB support.
- "Vectorized" indexing with broadcasting and multi-dimensional indexers.
NumPy supports a generalization of this, but I would not wish the edge
cases involving mixed slices/arrays upon anyone.
- "Logical" indexing by a boolean array with the same shape.
- "Exactly like NumPy" for subclasses or wrappers around NumPy arrays.
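
The first four categories above can be reproduced with plain NumPy on a small array, which may help show why they need to be told apart. This is only an illustration: `np.ix_` is used here to emulate outer indexing, since NumPy itself treats a pair of index lists as vectorized.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# Basic: integers and slices only.
basic = x[1, 1:3]

# Outer/orthogonal: every selected row crossed with every selected column.
outer = x[np.ix_([0, 2], [1, 3])]

# Vectorized: index arrays are broadcast together and paired elementwise.
vectorized = x[[0, 2], [1, 3]]

# Logical: boolean mask with the same shape as the array.
logical = x[x > 9]

print(basic, outer.shape, vectorized, logical)
```

The same pair of index lists selects a 2x2 block under outer semantics but only two elements under vectorized semantics, so a duck array that accepts the syntax can still return a differently shaped result.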

There are some ambiguities in this, but that's what specs are for. For most
applications, we probably don't need most of these: ABCs for "Basic",
"Logical" and "Exactly like NumPy" would go a long way.


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Nathaniel Smith
On Nov 6, 2017 4:19 PM, "Chris Barker"  wrote:

> On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>>
>> You just summarized excellently why I'm on a quest to change `asarray`
>> to `asanyarray` within numpy
>
>
> +1 -- we should all be using asanyarray() most of the time.


The problem is that if you use 'asanyarray', then you're claiming that your
code works correctly for:
- regular ndarrays
- np.matrix
- np.ma masked arrays
- and every third party subclass, regardless of their semantics, regardless
of whether you've heard of them or not

If subclasses followed the Liskov substitution principle, and had different
internal implementations but the same public ("duck") API, then this would
be fine. But in practice, numpy limitations mean that ndarray subclasses
have to have the same internal implementation, so the only reason to make
an ndarray subclass is if you want to make something with a different
public API. Basically the whole system is designed for subclasses to be
incompatible.

The end result is that if you use asanyarray, your code is definitely
wrong, because there's no way you're actually doing the right thing for
arbitrary ndarray subclasses. But if you don't use asanyarray, then yeah,
that's also wrong, because it won't work on mostly-compatible subclasses
like astropy's. Given this, different projects reasonably make different
choices -- it's not just legacy code that uses asarray. In the long run we
obviously need to come up with new options that don't have these tradeoffs
(that's why we want to let units go into dtypes, implement methods like
__array_ufunc__ to enable duck arrays, etc.), but let's try to be sympathetic
to other projects that are doing their best :-).
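
The tradeoff can be seen with a masked array and a deliberately naive mean written two ways. The helper names `mean_coerced` and `mean_passthrough` are made up for this sketch; the NumPy calls themselves (`np.asarray`, `np.asanyarray`, `np.ma.array`) are standard.

```python
import numpy as np


def mean_coerced(a):
    # asarray strips the subclass: the mask is silently dropped.
    a = np.asarray(a)
    return a.sum() / a.size


def mean_passthrough(a):
    # asanyarray keeps the subclass, but this code still assumes
    # plain-ndarray semantics for `sum` and `size`.
    a = np.asanyarray(a)
    return a.sum() / a.size


m = np.ma.array([1.0, 2.0, 100.0], mask=[False, False, True])

print(type(np.asarray(m)).__name__, type(np.asanyarray(m)).__name__)
print(mean_coerced(m), mean_passthrough(m), m.mean())
```

Neither helper returns the masked mean of 1.5: the coerced version averages the masked value in, while the pass-through version sums only unmasked values but divides by the full element count. That is the sense in which either choice can be wrong for a subclass the author did not have in mind.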

-n


Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support

2017-11-07 Thread Nathaniel Smith
On Nov 7, 2017 2:15 PM, "Chris Barker"  wrote:

> On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris
> wrote:
>
>>>> Also -- if py2.7 continues to see the use I expect it will well past when
>>>> python.org officially drops it, I wouldn't be surprised if a Python2.7
>>>> Windows build based on a newer compiler would come along -- perhaps by
>>>> Anaconda or conda-forge, or ???
>>>>
>>>
>>> I suspect that this will indeed happen. I am aware of multiple companies
>>> following this path already (building python + numpy themselves with a
>>> newer MS compiler).
>>>
>>
>> I think Anaconda is talking about distributing a compiler, but what that
>> will be on windows is anyone's guess. When we drop 2.7, there is a lot of
>> compatibility crud that it would be nice to get rid of, and if we do that
>> then NumPy will no longer compile against 2.7. I suspect some companies
>> have just been putting off the task of upgrading to Python 3, which should
>> be pretty straightforward these days apart from system code that needs to
>> do a lot of work with bytes.
>>
>
> I agree, and if there is a compelling reason to upgrade, folks WILL do it.
> But I've been amazed over the years at folks' desire to stick with what
> they have! And I'm guilty too: anything new I start with py3, but older
> larger codebases are still py2. I just can't find the energy to spend the
> week or so it would probably take to update everything...
>
> But in the original post, the Windows compiler issue was mentioned, so
> there seem to be two reasons to drop py2:
>
> A) wanting to use py3-only features.
> B) wanting to use newer C (C++?) compiler features.
>
> I suggest we be clear about which of these is driving the decisions, and
> explicit about the goals. That is, if (A) is critical, we don't even have
> to talk about (B).
>
> But we could choose to do (B) without doing (A) -- I suspect there will be
> a user base for that.


The problem is it's hard to predict the future. Right now neither PyPI nor
conda provide any way to distribute binaries for py27-but-with-a-newer-ABI,
and maybe they never will; or maybe they will eventually, but not enough
people use them to justify keeping py2 support given the other overheads;
or... who knows, really.

Right now, the decision in front of us is what to tell people who ask about
numpy's py2 support plans, so that they can make their own plans. Given
what we know right now, I don't think we should promise to keep support
past 2018. If we get there and the situation's changed, and there's both
desire and means to extend support, we can revisit that. But it's better to
under-promise and possibly over-deliver, instead of promising to support
py2 until after it becomes a millstone around our necks and then realizing
we haven't warned anyone and are stuck supporting it another year beyond
that...

-n


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Marten van Kerkwijk
Hi Nathaniel,

You're right, I shouldn't be righteous. Though I do think the
advantage of `asanyarray` inside numpy remains that it is easy for
a user to add `asarray` to their input to a numpy function, and not
easy for a happily compatible subclass to avoid an `asarray` inside a
numpy function! I.e., coerce as little as you can get away with...

All the best,

Marten