Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-11 Thread Marten van Kerkwijk
Hi Eric,

Thanks very much for the detailed response; it is good to be reminded that
`MaskedArray` is used in a package that, indeed, (nearly?) all of us use!

But I do think that those of us who have been trying to change MaskedArray
are generally good at making sure the tests continue to pass, i.e., that
the behaviour does not change (the main exception in the last few years was
that views should be taken of masks too, not just the data).

I also think that between __array_ufunc__ and __array_function__, it has
become quite easy to ensure that one no longer has to rely on `np.ma`
functions, i.e., that the regular numpy functions will do the right thing.
But it will need work to actually implement that.
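
For illustration only (this is not code from the thread), here is a minimal
sketch of how `__array_function__` lets a regular numpy function do the
right thing for a masked container. The class `SkipMasked` and its
attributes are hypothetical names, and the sketch assumes a NumPy version
in which the `__array_function__` protocol is enabled by default.

```python
import numpy as np


class SkipMasked:
    """Toy container (hypothetical): data plus a boolean mask, True = missing."""

    def __init__(self, data, mask):
        self.data = np.asarray(data, dtype=float)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_function__(self, func, types, args, kwargs):
        # Only np.sum is handled in this sketch; everything else is declined,
        # which makes NumPy raise TypeError for unsupported functions.
        if func is np.sum:
            return self.data[~self.mask].sum()
        return NotImplemented


toy = SkipMasked([1.0, 2.0, 3.0], [False, True, False])
total = np.sum(toy)  # dispatched to SkipMasked.__array_function__
```

A real implementation would have to cover many more functions and their
keyword arguments, which is the work referred to above.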

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-11 Thread Stephan Hoyer
On Sat, Nov 10, 2018 at 10:45 PM Eric Firing wrote:

> On 2018/11/10 12:39 PM, Stephan Hoyer wrote:
> > On Sat, Nov 10, 2018 at 2:22 PM Hameer Abbasi wrote:
> >
> > To summarize, I think these are our options:
> >
> > 1. Change the behavior of np.anyarray() to check for an
> > __anyarray__() protocol. Change np.matrix.__anyarray__() to
> > return a base numpy array (this is a minor backwards
> > compatibility break, but probably for the best). Start issuing a
> > FutureWarning for any MaskedArray operations that violate Liskov
> > and add a skipna argument that in the future will default to
> > skipna=False.
> >
> > 2. Introduce a new coercion function, e.g., np.duckarray(). This
> > is the easiest option because we don't need to clean up NumPy's
> > existing ndarray subclasses.
> >
> >
> > My vote is still for 1. I don’t have an issue for PyData/Sparse
> > depending on recent-ish NumPy versions — It’ll need a lot of the
> > recent protocols anyway, although I could be convinced otherwise if
> > major package devs (scikits, SciPy, Dask) were to weigh in and say
> > they’ll jump on it (which seems unlikely given SciPy’s policy to
> > support old NumPy versions).
> >
> >
> > I agree that option (1) is fine for PyData/sparse. The bigger issue is
> > that this change should be conditional on making breaking changes (at
> > least raising FutureWarning for now) to np.ma.MaskedArray.
> >
> > I don't know how people who currently use MaskedArray would feel about
> > that. I would love to hear their thoughts.
>
> Thank you.  I am a user of masked arrays, and have been since pre-numpy
> days.  I introduced their extensive use in matplotlib long ago.  I have
> been a bit concerned, indeed, that all of the discussion of modifying
> masked arrays seems to be by people who don't actually use them
> explicitly (though they might be using them without knowing it via
> internal operations in matplotlib, or they might be quickly getting rid
> of them after they are yielded by netCDF4.Dataset()).
>
> I think that those of us who do use masked arrays recognize that they
> are not perfect; they have some quirks and gotchas, and one has to be
> careful to use numpy.ma functions instead of numpy functions in most
> cases.  But we use them because they have real advantages over the
> alternatives, which are using nans and/or manually tracking independent
> masks throughout calculations.  These advantages are largely because
> masked values *don't* behave like nan, *don't* propagate.  This is
> fundamental to the design, and motivated by real-life use cases.
>
> The proposal to add a skipna kwarg to MaskedArray looks to me like it is
> giving purity priority over practicality.  It will force ma users to
> insert skipna kwargs all over the place--because the default will be
> contrary to the primary purposes of using masked arrays, in most cases.
> How many people will it actually benefit?  How many people are being
> bitten, and how badly, by masked array behavior?
>
> If there were a prospect of truly integrating missing/masked value
> handling into numpy, simplifying or phasing out numpy.ma, I would be
> delighted--I think it is the biggest single fundamental improvement that
> could be made, from the user's standpoint.  I was sad to see Mark
> Wiebe's work in that direction come to grief.
>
> If there are ways of gradually improving numpy.ma and its
> interoperability with the rest of numpy and with the proliferation of
> duck arrays, I'm all in favor--so long as they don't effectively wreck
> numpy.ma for its present intended purposes.


Eric -- thank you for sharing your perspective! I guess it should not be
surprising that the semantics of MaskedArray intentionally deviate from the
semantics of base NumPy arrays.
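
As a small illustration of that deviation (a sketch added for clarity, not
taken from Eric's message), compare how a nan and a masked value behave in
a reduction. The variable names are arbitrary.

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

# Plain ndarray: the nan propagates into the result.
nan_total = data.sum()

# MaskedArray: the invalid entry is masked and skipped by the reduction.
masked = np.ma.masked_invalid(data)
masked_total = masked.sum()  # 7.0

# Elementwise operations carry the mask along instead of producing nan.
shifted = masked + 1
```

The masked sum ignores the missing entry, whereas the ndarray sum is nan;
this is the non-propagating behaviour that a skipna=False default would
reverse.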

This deviation is fortunately less severe than deviations in the
behavior of np.matrix, but it still presents some difficulties for duck
typing. We're in a position to reduce (but still not eliminate) these
differences with new protocols like __array_function__.

I think Nathaniel actually summarized these issues pretty well in NEP 16 (
http://www.numpy.org/neps/nep-0016-abstract-array.html). If we want a
coercion function that guarantees an object is a "full duck array", then it
can't pass on either np.matrix or MaskedArray in their current state.
Anything less than full compatibility provides a shaky foundation for use
in downstream projects or inside NumPy itself.
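
To make the idea concrete, here is a rough sketch of what a protocol-based
coercion function could look like. Everything named here is an assumption
for illustration: `as_duck_array`, the `__duckarray__` method and the
`MyDuck` class are hypothetical and are not part of NumPy or of the
proposal as written in this thread.

```python
import numpy as np


def as_duck_array(obj):
    """Hypothetical coercion helper (not a NumPy API).

    Objects that opt in through an assumed `__duckarray__` method are passed
    through; everything else is converted with np.asarray.
    """
    method = getattr(type(obj), "__duckarray__", None)
    if method is not None:
        return method(obj)
    return np.asarray(obj)


class MyDuck:
    """Toy duck array that opts in to the hypothetical protocol."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __duckarray__(self):
        return self


duck = MyDuck([1, 2, 3])
passed_through = as_duck_array(duck)  # returned unchanged

# MaskedArray does not opt in, so it is coerced to a base ndarray
# (the mask is dropped by np.asarray).
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
coerced = as_duck_array(masked)
```

In this sketch, subclasses that do not opt in are coerced to base arrays
rather than passed on, which matches the constraint described above.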

In theory (certainly if we were starting from scratch) it would make sense
to make asabstractarray() pass on any ndarray subclass, but this would
require willingness to make breaking changes to both np.matrix and
MaskedArray.

I would suggest adopting a variation of the proposal in NEP 16, except
using a protocol rather than an abstract base class per NEP 22, e.g.,

# names still to be determined
def 

[Numpy-discussion] Developer Meeting, Berkeley, 30 Nov / 1 Dec

2018-11-11 Thread Stefan van der Walt
Hi everyone,

On Friday 30 November & Saturday 1 December we will host a NumPy
Development Meeting at the Berkeley Institute for Data Science.  We will
discuss the work being done on dtypes, review NEPs under implementation,
and solicit feedback for updating the community roadmap.

Please get in touch if you would like to attend, so that we can tally
the numbers and work out travel support.

Best regards,
Stéfan


Re: [Numpy-discussion] Weekly status meeting 8.11 at 12:00 pacific time

2018-11-11 Thread Stefan van der Walt
On Tue, 06 Nov 2018 16:28:36 -0800, Matti Picus wrote:
> We will be holding our weekly BIDS NumPy status meeting on Thurs Nov 8 at
> noon pacific time.

The meeting notes are now posted at
https://github.com/BIDS-numpy/docs/blob/master/status_meetings/status-2018-11-08.md

Stéfan