Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-04 Thread Sebastian Berg
On Sun, 2016-09-04 at 11:20 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> I haven't given this as much thought as it deserves, but thought I
> would comment from the astropy perspective, where we both have direct
> subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and
> classes that store their data internally as ndarray (subclass)
> instances (`Time`, `SkyCoord`, ...).
> 
> One comment would be that if one were to introduce a special method,
> one should perhaps think a bit more broadly, and capture more than the
> indexing methods with it. I wonder about this because for the
> array-holding classes mentioned above, we initially just had
> `__getitem__`, which got the relevant items from the underlying
> arrays, and then constructed a new instance with those. But recently
> we realised that methods like `reshape`, `transpose`, etc., require
> essentially the same steps, and so we constructed a new
> `ShapedLikeNDArray` mixin, which provides all of those [1] as long as
> one defines a single `_apply` method. (Indeed, it turns out that the
> same technique works without any real change for some numpy functions
> such as `np.broadcast_to`.)
> 
> That said, in the actual ndarray subclasses, we have not found a need
> to override any of the reshaping methods, since those methods are all
> handled OK via `__array_finalize__`. We do override `__getitem__`
> (and `item`) as we need to take care of scalars. And we would
> obviously have to override `oindex`, etc., as well, for the same
> reason, so in that respect a common method might be useful.
> 
> However, perhaps it is worth considering that the only reason we need
> to override them in the first place, unlike for all the shape-changing
> methods, is that scalar output does not get put through
> `__array_finalize__`. Might it be an idea to have the new indexing
> methods return array scalars instead of normal ones so we can get rid
> of this?

I did not realize newer NumPy versions are special with respect to scalar
handling? Indexing (already before 1.9, I believe) always goes through
PyArray_ScalarReturn or so, which I thought was used by almost all
functions.

If you mean the attributes (oindex, etc.), they could of course behave a
bit differently, though I am not sure to what extent that actually helps,
since it would also create a disparity.
If we implement a new special method (__numpy_getitem__), they should
definitely behave slightly differently in some places. One option might
be to not even do the wrapping, but to leave it to the subclass.

However, if you have an array with arrays inside, knowing whether to
return a scalar correctly would have to rely on inspecting the index
object, which is why I suggested that the indexer provide a few extra
pieces of information (such as this one).
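A minimal sketch of what "inspecting the index object" could look like. The helper `index_gives_scalar` is hypothetical (not a NumPy API) and deliberately simplified: it calls a result scalar only when the index is one plain integer per dimension, and it ignores 0-d integer arrays and structured-array field names. The object-array case at the end shows why checking the *result's* type is not enough.

```python
import numpy as np


def index_gives_scalar(index, ndim):
    """Hypothetical helper: would arr[index] return a single element?

    Simplified rule (an assumption, not NumPy's full logic): only plain
    integers, exactly one per dimension. Slices, Ellipsis, None, booleans,
    lists/arrays (including 0-d arrays) and field names count as
    "returns an array".
    """
    if not isinstance(index, tuple):
        index = (index,)
    all_ints = all(
        isinstance(i, (int, np.integer)) and not isinstance(i, (bool, np.bool_))
        for i in index
    )
    return all_ints and len(index) == ndim


a = np.arange(6).reshape(2, 3)
print(index_gives_scalar((1, 2), a.ndim), type(a[1, 2]))  # a single element
print(index_gives_scalar(1, a.ndim), a[1].shape)           # a whole row

# An object array holding arrays: the element itself is an ndarray, so the
# result's type cannot distinguish "single element" from "sub-array".
obj = np.empty(2, dtype=object)
obj[0], obj[1] = np.zeros(3), np.ones(3)
print(index_gives_scalar(0, obj.ndim), isinstance(obj[0], np.ndarray))
```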

Of course, since the scalar return goes through a ScalarReturn
function, that function could maybe also be taught to hand the
scalar to `__array_finalize__`/`__array_wrap__` (not sure which exactly
applies).

- Sebastian


> All the best,
> 
> Marten
> 
> [1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L856
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 



Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-04 Thread Marten van Kerkwijk
Hi Sebastian,

I haven't given this as much thought as it deserves, but thought I
would comment from the astropy perspective, where we both have direct
subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and
classes that store their data internally as ndarray (subclass)
instances (`Time`, `SkyCoord`, ...).

One comment would be that if one were to introduce a special method,
one should perhaps think a bit more broadly, and capture more than the
indexing methods with it. I wonder about this because for the
array-holding classes mentioned above, we initially just had
`__getitem__`, which got the relevant items from the underlying
arrays, and then constructed a new instance with those. But recently
we realised that methods like `reshape`, `transpose`, etc., require
essentially the same steps, and so we constructed a new
`ShapedLikeNDArray` mixin, which provides all of those [1] as long as
one defines a single `_apply` method. (Indeed, it turns out that the
same technique works without any real change for some numpy functions
such as `np.broadcast_to`.)
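A minimal Python sketch of the pattern, under stated assumptions: `ShapedLikeMixin` and `Points` are hypothetical names, and this is not astropy's `ShapedLikeNDArray` (see [1] for that). It shows how `__getitem__`, `reshape`, `transpose`, and even a function like `np.broadcast_to` all reduce to "apply the same operation to every internal array and rebuild the object".

```python
import numpy as np


class ShapedLikeMixin:
    """Sketch: derive shape-changing methods from a single `_apply` hook.

    Hypothetical illustration, not astropy code. Subclasses implement
    `_apply(method, *args, **kwargs)`.
    """

    def _apply(self, method, *args, **kwargs):
        raise NotImplementedError

    def __getitem__(self, item):
        return self._apply("__getitem__", item)

    def reshape(self, *shape):
        return self._apply("reshape", *shape)

    def transpose(self, *axes):
        return self._apply("transpose", *axes)

    @property
    def T(self):
        return self.transpose()


class Points(ShapedLikeMixin):
    """Hypothetical array-holding class with two same-shaped components."""

    def __init__(self, x, y):
        self.x = np.asarray(x)
        self.y = np.asarray(y)

    @property
    def shape(self):
        return self.x.shape

    def _apply(self, method, *args, **kwargs):
        def apply(arr):
            # A callable (e.g. np.broadcast_to) gets the component as its
            # first argument; a string names an ndarray method.
            if callable(method):
                return method(arr, *args, **kwargs)
            return getattr(arr, method)(*args, **kwargs)

        return self.__class__(apply(self.x), apply(self.y))


p = Points(np.arange(6.0), 10 * np.arange(6.0))
q = p.reshape(2, 3)  # both components reshaped together
r = q.T              # transpose via the same hook
s = p[1:3]           # indexing via the same hook
b = Points(np.arange(3.0), np.zeros(3))._apply(np.broadcast_to, (2, 3))
```

The single hook keeps the per-class code small: adding another shape method to the mixin requires no change in `Points`.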

That said, in the actual ndarray subclasses, we have not found a need
to override any of the reshaping methods, since those methods are all
handled OK via `__array_finalize__`. We do override `__getitem__`
(and `item`) as we need to take care of scalars. And we would
obviously have to override `oindex`, etc., as well, for the same
reason, so in that respect a common method might be useful.
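To make the distinction concrete, here is a minimal sketch with a hypothetical subclass `Tagged` (a stand-in for something like astropy's `Quantity`, not astropy code). `__array_finalize__` carries the metadata through reshape, transpose and slicing, while integer indexing needs an explicit `__getitem__` override.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass carrying a `unit` label (illustrative)."""

    def __new__(cls, data, unit=""):
        obj = np.asarray(data).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Runs for arrays derived from `obj` (reshape, transpose, slicing,
        # ...), so those methods need no override of their own.
        self.unit = getattr(obj, "unit", "")

    def __getitem__(self, item):
        out = super().__getitem__(item)
        # Integer indexing returns a NumPy scalar, which bypasses
        # __array_finalize__; re-wrap it so the unit is not lost.
        if not isinstance(out, np.ndarray):
            out = Tagged(out, unit=self.unit)
        return out


t = Tagged(np.arange(6.0), unit="m")
print(t.reshape(2, 3).T.unit)  # metadata survives via __array_finalize__
print(t[2].unit, t[2].shape)   # scalar re-wrapped as a 0-d Tagged
```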

However, perhaps it is worth considering that the only reason we need
to override them in the first place, unlike for all the shape-changing
methods, is that scalar output does not get put through
`__array_finalize__`. Might it be an idea to have the new indexing
methods return array scalars instead of normal ones so we can get rid
of this?
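A small illustration of the two result types. It reads the suggestion as "return something that passes through `__array_finalize__`", that is, a 0-d array; that reading is an interpretation. With current NumPy, integer indexing yields a NumPy scalar that never sees `__array_finalize__`, whereas adding an `Ellipsis` to the index yields a 0-d view of the subclass. `Tagged` is a hypothetical subclass.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical subclass with a `unit` attribute and no __getitem__ override."""

    def __new__(cls, data, unit=""):
        obj = np.asarray(data).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        self.unit = getattr(obj, "unit", "")


t = Tagged([1.0, 2.0, 3.0], unit="m")

s = t[1]       # NumPy scalar (np.float64): __array_finalize__ never runs
z = t[1, ...]  # 0-d Tagged view: __array_finalize__ runs, unit survives

print(type(s).__name__, hasattr(s, "unit"))
print(type(z).__name__, z.shape, z.unit)
```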

All the best,

Marten

[1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L856


[Numpy-discussion] Pull Request regarding meshgrid

2016-09-04 Thread Paul Reiter
https://github.com/numpy/numpy/pull/7984

Hi everybody,
I created my first pull request for numpy and, as mentioned in the numpy
development workflow documentation, I hereby post a link to it and a short
description to the mailing list.
Please take a look.


I didn't find a good way to create a contour plot of data of the form:
[(x1, y1, f(x1, y1)), (x2, y2, f(x2, y2)), ..., (xn, yn, f(xn, yn))].
In order to do a contour plot, one has to bring the data into the meshgrid
format.
One possibility would be complicated sorting and reshaping of the data, but
this is not easily possible, especially if values are missing (not all
combinations of (x, y) are contained in the data).
Another way, which is used in all tutorials about contour plotting, is to
create the meshgrid beforehand and then apply the function to the meshgrid
matrices:

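import numpy as np
import matplotlib.pyplot as plt
n = 100  # grid resolution (illustrative value)
def f(x, y):  # illustrative function, stands in for the real one
    return np.exp(-(x**2 + y**2))
x = np.linspace(-3, 3, n)
y = np.linspace(-3, 3, n)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z)  # matplotlib's contour-plotting function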

But if one does not have the function but only the data, this is not an
option either.
My function essentially creates a dictionary {(x1, y1): f(x1, y1),
(x2, y2): f(x2, y2), ..., (xn, yn): f(xn, yn)} with the coordinate tuples as
keys and function values as values. Then it creates a meshgrid from all
unique x and y coordinates (X and Y). The dictionary is then used to create
the matrix Z, filling in np.nan for all missing values. This allows one to
do the following, with x and y being the coordinates and z being the
corresponding function values:

plt.contour(*meshgridify(x, y, f=z))
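A minimal sketch of the approach just described. It was written from the description above rather than taken from the pull request, so the function body, and the choice to return `X, Y, Z` so that the result can be unpacked into `plt.contour`, are assumptions.

```python
import numpy as np


def meshgridify(x, y, f):
    """Sketch: grid scattered samples (x[k], y[k], f[k]); gaps become NaN."""
    # Map each (x, y) coordinate pair to its value; a repeated pair keeps
    # the last value seen.
    lookup = {(xi, yi): fi for xi, yi, fi in zip(x, y, f)}
    # The sorted unique coordinates define the grid axes.
    xs = np.unique(x)
    ys = np.unique(y)
    X, Y = np.meshgrid(xs, ys)  # rows follow y, columns follow x
    Z = np.full(X.shape, np.nan)
    for i, yi in enumerate(ys):
        for j, xj in enumerate(xs):
            Z[i, j] = lookup.get((xj, yi), np.nan)
    return X, Y, Z


x = [0, 1, 0]
y = [0, 0, 1]
z = [1.0, 2.0, 3.0]  # no sample at (1, 1), so that grid point stays NaN
X, Y, Z = meshgridify(x, y, f=z)
```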

Maybe there is a simpler solution, but I didn't find one.
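One possibly simpler variant is offered here as a hedged suggestion, not a reviewed solution. `np.unique(..., return_inverse=True)` gives each sample's column and row position directly, so the grid can be filled with one fancy-indexing assignment instead of a dictionary. `grid_from_scatter` is a hypothetical name. If an (x, y) pair occurs more than once, which of its values ends up in the grid is not guaranteed.

```python
import numpy as np


def grid_from_scatter(x, y, z):
    """Place scattered samples (x[k], y[k], z[k]) on the grid of unique x/y.

    Grid points with no sample are filled with NaN.
    """
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    z = np.asarray(z, dtype=float).ravel()
    xs, ix = np.unique(x, return_inverse=True)  # ix[k]: column of sample k
    ys, iy = np.unique(y, return_inverse=True)  # iy[k]: row of sample k
    Z = np.full((ys.size, xs.size), np.nan)
    Z[iy, ix] = z
    X, Y = np.meshgrid(xs, ys)  # same (rows = y, columns = x) layout as Z
    return X, Y, Z


X, Y, Z = grid_from_scatter([0, 1, 0], [0, 0, 1], [1.0, 2.0, 3.0])
```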


Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-04 Thread Sebastian Berg
On Sun, 2016-09-04 at 14:10 +0200, Sebastian Berg wrote:
> On Sat, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:
> > 
> > Hi all,
> > 
> > not that I am planning to spend much time on this right now, however,
> > I did a small rebase of the stuff I had (did not push yet) on oindex
> > and remembered the old problem ;).
> > 
> > The one remaining issue I have with adding things like (except making
> > the code prettier and writing tests):
> > 
> > arr.oindex[...]  # outer/orthogonal indexing
> > arr.vindex[...]  # Picking of elements (much like current)
> > arr.lindex[...]  # current behaviour for backward compat
> > 
> > is what to do about subclasses. Now what I can do (and currently have
> > in my branch) is to tell someone on `subclass.oindex[...]`: This won't
> > work, the subclass implements `__getitem__` or `__setitem__` so I don't
> > know if the result would be correct (it's a bit annoying if you also
> > warn about using those attributes, but...).
> > 
> Hmm, I am considering exposing a new indexing helper object, so that
> subclasses could implement something like `__numpy_getitem__` and
> `__numpy_setitem__`, and if they do (and preferably nothing else) they
> would be passed a small object with some information about the
> indexing operation. The subclass would then implement:
> 
> ```
> def __numpy_setitem__(self, indexer, values):
>     indexer.method  # one of {"plain", "oindex", "vindex", "lindex"}
>     indexer.scalar  # Will the result be a scalar?
>     indexer.view  # Will the result be a view or a copy?
>     # More information might be possible (note that not all checks are
>     # done at this point, just basic checks will have happened already).
> 
>     # Run some code that prepares self or values; this could also use
>     # the indexer on another array (e.g. a mask) of the same shape.
> 
>     result = indexer(self, values)
> 
>     # Run some code to fix up the result if necessary.
>     # Should discuss whether result is first a plain ndarray or
>     # already wrapped.
> ```

Hmm, field access is a bit annoying, but I guess it can/has to be
included.

> 
> This could be implemented on the C side without much hassle, I think.
> Of course it adds some new API which we would have to support
> indefinitely. But it seems something like this would also fix the
> hassle of identifying, e.g., whether the result should be a scalar for a
> subclass (which may even be impossible in some cases).
> 
> Would be very happy about feedback from subclassers!
> 
> - Sebastian
> 
> 
> > 
> > However, with or without such an error, we need a nice way for
> > subclasses to define these attributes! This is important even within
> > numpy, at least for masked arrays (possibly also matrix and memmap).
> > 
> > They (typically) do some stuff before or after the plain indexing
> > operation, so how do we make it convenient to allow them to do the
> > same stuff for the special indexing attributes without weird code
> > duplication? I can think of things, but nothing too great yet, so
> > maybe you guys have an elegant idea.
> > 
> > - Sebastian


Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-04 Thread Sebastian Berg
On Sat, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:
> Hi all,
> 
> not that I am planning to spend much time on this right now, however,
> I did a small rebase of the stuff I had (did not push yet) on oindex
> and remembered the old problem ;).
> 
> The one remaining issue I have with adding things like (except making
> the code prettier and writing tests):
> 
> arr.oindex[...]  # outer/orthogonal indexing
> arr.vindex[...]  # Picking of elements (much like current)
> arr.lindex[...]  # current behaviour for backward compat
> 
> is what to do about subclasses. Now what I can do (and currently have
> in my branch) is to tell someone on `subclass.oindex[...]`: This won't
> work, the subclass implements `__getitem__` or `__setitem__` so I don't
> know if the result would be correct (it's a bit annoying if you also
> warn about using those attributes, but...).
> 
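For reference, here is a small sketch of the semantics behind these names using only existing NumPy; the proposed `oindex`/`vindex`/`lindex` attributes themselves are not used. Outer indexing can be emulated with `np.ix_`. Element picking is what integer-array indexing already does when all indices are arrays. The last lines show a known quirk of the current rules, which an `lindex` would keep for backward compatibility: advanced-index dimensions can jump to the front of the result.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows, cols = [0, 2], [1, 3]

# Outer/orthogonal indexing (the idea behind `oindex`): every listed row
# combined with every listed column, emulated here with np.ix_.
outer = a[np.ix_(rows, cols)]
print(outer.tolist())   # [[1, 3], [9, 11]]

# Element picking (the idea behind `vindex`): indices are paired up,
# selecting a[0, 1] and a[2, 3]; plain fancy indexing already does this.
picked = a[rows, cols]
print(picked.tolist())  # [1, 11]

# Current-rule quirk: the integer and the list are both "advanced" indices
# and are separated by a slice, so their broadcast dimension goes first.
x = np.zeros((5, 6, 7))
print(x[0, :, [0, 1]].shape)  # (2, 6), not (6, 2)
```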

Hmm, I am considering exposing a new indexing helper object, so that
subclasses could implement something like `__numpy_getitem__` and
`__numpy_setitem__`, and if they do (and preferably nothing else) they
would be passed a small object with some information about the
indexing operation. The subclass would then implement:

```
def __numpy_setitem__(self, indexer, values):
    indexer.method  # one of {"plain", "oindex", "vindex", "lindex"}
    indexer.scalar  # Will the result be a scalar?
    indexer.view  # Will the result be a view or a copy?
    # More information might be possible (note that not all checks are
    # done at this point, just basic checks will have happened already).

    # Run some code that prepares self or values; this could also use
    # the indexer on another array (e.g. a mask) of the same shape.

    result = indexer(self, values)

    # Run some code to fix up the result if necessary.
    # Should discuss whether result is first a plain ndarray or
    # already wrapped.
```

This could be implemented on the C side without much hassle, I think.
Of course it adds some new API which we would have to support
indefinitely. But it seems something like this would also fix the
hassle of identifying, e.g., whether the result should be a scalar for a
subclass (which may even be impossible in some cases).
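To make the proposal concrete, here is a pure-Python emulation under heavy assumptions. `Indexer`, the `__numpy_getitem__` hook and the `oindex` helper below are hypothetical stand-ins, not NumPy API. Only the "plain" and (integer-list) "oindex" methods are modelled, and the `scalar`/`view` flags are left out. The sketch shows the point of the design: a subclass that must index a second, same-shaped array (here a mask) writes that logic once, and every indexing flavour reuses it.

```python
import numpy as np


class Indexer:
    """Hypothetical stand-in for the proposed indexing helper object."""

    def __init__(self, method, key):
        self.method = method  # "plain" or "oindex" in this sketch
        self.key = key

    def __call__(self, arr):
        base = np.asarray(arr)  # plain ndarray: no recursion into subclasses
        if self.method == "oindex":
            # Outer indexing, emulated with np.ix_ (integer lists only).
            return base[np.ix_(*self.key)]
        return base[self.key]


class _OIndex:
    """Makes `arr.oindex[...]` route through the same hook as `arr[...]`."""

    def __init__(self, owner):
        self.owner = owner

    def __getitem__(self, key):
        key = key if isinstance(key, tuple) else (key,)
        return self.owner.__numpy_getitem__(Indexer("oindex", key))


class MaskedLike(np.ndarray):
    """Hypothetical subclass whose mask must be indexed exactly like its data."""

    def __new__(cls, data, mask=None):
        obj = np.asarray(data).view(cls)
        obj.mask = np.zeros(obj.shape, bool) if mask is None else np.asarray(mask, bool)
        return obj

    def __array_finalize__(self, obj):
        # Only inherit a mask whose shape still matches (sketch-level care).
        mask = getattr(obj, "mask", None)
        self.mask = mask if mask is not None and mask.shape == self.shape else None

    def __numpy_getitem__(self, indexer):
        # The one place where the subclass-specific indexing logic lives.
        data = indexer(self)
        mask = None if self.mask is None else indexer(self.mask)
        return MaskedLike(data, mask)

    def __getitem__(self, key):
        return self.__numpy_getitem__(Indexer("plain", key))

    @property
    def oindex(self):
        return _OIndex(self)


m = MaskedLike(np.arange(12.0).reshape(3, 4),
               mask=np.arange(12).reshape(3, 4) % 5 == 0)
r = m[1:, ::2]                # plain indexing
o = m.oindex[[0, 2], [0, 3]]  # outer indexing through the same hook
s = m[2, 2]                   # scalar element, wrapped as 0-d by the hook
```

Whether the hook should receive a plain ndarray result or an already-wrapped one, as the message asks, is exactly the kind of choice this sketch makes arbitrarily (it works on plain arrays and wraps at the end).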

Would be very happy about feedback from subclassers!

- Sebastian


> However, with or without such an error, we need a nice way for
> subclasses to define these attributes! This is important even within
> numpy, at least for masked arrays (possibly also matrix and memmap).
> 
> They (typically) do some stuff before or after the plain indexing
> operation, so how do we make it convenient to allow them to do the
> same stuff for the special indexing attributes without weird code
> duplication? I can think of things, but nothing too great yet, so
> maybe you guys have an elegant idea.
> 
> - Sebastian