Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-12 Thread Marten van Kerkwijk
Agreed that indexing functions should return bare `ndarray`. Note that in
Jaime's PR one can override it anyway by defining __nonzero__.  -- Marten

On Sat, May 9, 2015 at 9:53 PM, Stephan Hoyer sho...@gmail.com wrote:

> With regards to np.where -- shouldn't where be a ufunc, so subclasses or
> other array-likes can control its behavior with __numpy_ufunc__?
>
> As for the other indexing functions, I don't have a strong opinion about
> how they should handle subclasses. But it is certainly tricky to attempt to
> handle arbitrary subclasses. I would agree that the least error-prone
> thing to do is usually to return base ndarrays. Better to force
> subclasses to override methods explicitly.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Nathaniel Smith
On May 9, 2015 12:54 PM, Benjamin Root ben.r...@ou.edu wrote:

> Absolutely, it should be writable. As for subclassing, that might be
> messy. Consider the following:
>
> inds = np.where(data > 5)
>
> In that case, I'd expect a normal, bog-standard ndarray because that is
> what you use for indexing (although pandas might have a good argument for
> having it return one of their special indexing types if data was a pandas
> array...).

Pandas doesn't subclass ndarray (anymore), so they're irrelevant to this
particular discussion :-). Of course they're an argument for having a
cleaner more general way of allowing non-ndarray array-like objects, but
the legacy subclassing system will never be that.

> Next:
>
> foobar = np.where(data > 5, 1, 2)
>
> Again, I'd expect a normal, bog-standard ndarray because the scalar
> elements are very simple. This question gets very complicated when
> considering array arguments. Consider:
>
> merged_data = np.where(data > 5, data, data2)
>
> So, what should merged_data be? If both data and data2 are the same
> types, then it would be reasonable to return the same type, if possible.
> But what if they aren't the same? Maybe use array_priority to determine the
> return type? Or, perhaps it does make sense to say sod it all and always
> return an ndarray?

Not sure what this has to do with Jaime's post about nonzero? There is
indeed a potential question about what 3-argument where() should do with
subclasses, but that's effectively a different operation entirely and to
discuss it we'd need to know things like what it historically has done and
why that was causing problems.
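
A minimal sketch of why the one-argument and three-argument forms are different operations. The array values are made up for illustration, and the `>` comparison is an assumption, since the operator did not survive in the archived text. Only plain ndarrays are used, so this says nothing about what either form does with subclasses.

```python
import numpy as np

# Hypothetical sample data; only the names come from the thread.
data = np.array([2, 8, 4, 9])
data2 = np.array([20, 80, 40, 90])

# One argument: a tuple of index arrays, one per dimension of the condition.
inds = np.where(data > 5)

# Three arguments: elementwise selection, same shape as the inputs.
merged_data = np.where(data > 5, data, data2)

print(type(inds), inds[0])
print(merged_data)
```

The first result is something you index with; the second is data in its own right, which is why the return-type question looks different for each.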

-n


Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Nathaniel Smith
On Sat, May 9, 2015 at 1:27 PM, Benjamin Root ben.r...@ou.edu wrote:

> On Sat, May 9, 2015 at 4:03 PM, Nathaniel Smith n...@pobox.com wrote:
>
>> Not sure what this has to do with Jaime's post about nonzero? There is
>> indeed a potential question about what 3-argument where() should do with
>> subclasses, but that's effectively a different operation entirely and to
>> discuss it we'd need to know things like what it historically has done and
>> why that was causing problems.
>
> Because my train of thought started at np.nonzero(), which I have always
> just mentally mapped to np.where(), and then... squirrel!
>
> Indeed, np.where() has no bearing here.

Ah, gotcha :-).

There is an argument that we should try to reduce this confusion by
nudging people to use np.nonzero() consistently instead of np.where(),
via the documentation and/or a warning message...
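
For reference, a small sketch of the overlap that causes the confusion: with a single argument, np.where(cond) returns the same index arrays as np.nonzero(cond). The sample array is hypothetical.

```python
import numpy as np

# Hypothetical sample data.
data = np.array([[1, 7, 3],
                 [9, 2, 8]])
cond = data > 5

via_where = np.where(cond)      # one-argument form
via_nonzero = np.nonzero(cond)  # the explicit spelling

# Both calls produce one index array per dimension, with equal contents.
same = all(np.array_equal(w, n) for w, n in zip(via_where, via_nonzero))
selected = data[via_nonzero]

print(same, selected)
```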

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Stephan Hoyer
With regards to np.where -- shouldn't where be a ufunc, so subclasses or other
array-likes can control its behavior with __numpy_ufunc__?

As for the other indexing functions, I don't have a strong opinion about how
they should handle subclasses. But it is certainly tricky to attempt to handle
arbitrary subclasses. I would agree that the least error-prone thing to do is
usually to return base ndarrays. Better to force subclasses to override
methods explicitly.
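
As a rough illustration of "override methods explicitly": a subclass that really wants subclass-typed indices can say so itself. `Tagged` is a hypothetical class invented for this sketch, not something from NumPy or from the PR under discussion.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical subclass that opts in to subclass-typed index arrays."""

    def nonzero(self):
        # Compute the indices on a plain ndarray view of the same data,
        # then re-wrap each index array in the subclass on purpose.
        plain = self.view(np.ndarray).nonzero()
        return tuple(ix.view(type(self)) for ix in plain)

t = np.array([0, 4, 0, 5]).view(Tagged)
idx = t.nonzero()

print(type(idx[0]).__name__, np.asarray(idx[0]))
```

The point of the design is that the base library stays predictable, and any subclass-specific behavior is visible in the subclass's own code.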


Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Benjamin Root
On Sat, May 9, 2015 at 4:03 PM, Nathaniel Smith n...@pobox.com wrote:

> Not sure what this has to do with Jaime's post about nonzero? There is
> indeed a potential question about what 3-argument where() should do with
> subclasses, but that's effectively a different operation entirely and to
> discuss it we'd need to know things like what it historically has done and
> why that was causing problems.

Because my train of thought started at np.nonzero(), which I have always
just mentally mapped to np.where(), and then... squirrel!

Indeed, np.where() has no bearing here.

Ben Root


[Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Jaime Fernández del Río
There is a reported bug (issue #5837,
https://github.com/numpy/numpy/issues/5837) regarding different returns
from np.nonzero with 1-D vs higher dimensional arrays. A full summary of
the differences can be seen from the following output:

>>> class C(np.ndarray): pass
...
>>> a = np.arange(6).view(C)
>>> b = np.arange(6).reshape(2, 3).view(C)
>>> anz = a.nonzero()
>>> bnz = b.nonzero()

>>> type(anz[0])
<type 'numpy.ndarray'>
>>> anz[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> anz[0].base

>>> type(bnz[0])
<class '__main__.C'>
>>> bnz[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
>>> bnz[0].base
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

The original bug report was only concerned with the non-writeability of
higher dimensional array returns, but there are more differences: 1-D
always returns an ndarray that owns its memory and is writeable, but higher
dimensional arrays return views, of the type of the original array, that
are non-writeable.
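
For anyone who wants to check their own installation, here is a small script version of the comparison. The `describe` helper is hypothetical, and what it reports depends on the NumPy version in use, so no expected output is given.

```python
import numpy as np

class C(np.ndarray):
    """Trivial ndarray subclass, as in the session above."""

def describe(index_arrays):
    """Report (type name, OWNDATA, WRITEABLE) for each index array."""
    return [
        (type(ix).__name__, ix.flags.owndata, ix.flags.writeable)
        for ix in index_arrays
    ]

a = np.arange(6).view(C)                # 1-D case
b = np.arange(6).reshape(2, 3).view(C)  # 2-D case

# The values printed here vary between NumPy releases.
print("1-D:", describe(a.nonzero()))
print("2-D:", describe(b.nonzero()))
```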

I have a branch that attempts to fix this by making both 1-D and n-D arrays:

   1. return a view, never the base array,
   2. return an ndarray, never a subclass, and
   3. return a writeable view.

I guess the most controversial choice is #2, and in fact making that change
breaks a few tests. I nevertheless think that all of the index returning
functions (nonzero, argsort, argmin, argmax, argpartition) should always
return a bare ndarray, not a subclass. I'd be happy to be corrected, but I
can't think of any situation in which preserving the subclass would be
needed for these functions.
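
A sketch of what a bare index array looks like from the caller's side, whatever the library itself returns. `as_plain_index` is a hypothetical helper written for this example, not part of NumPy or of the branch mentioned above.

```python
import numpy as np

class C(np.ndarray):
    """Trivial ndarray subclass."""

def as_plain_index(ix):
    """Hypothetical helper: copy into a bare, writeable ndarray."""
    return np.array(ix, copy=True, subok=False)

data = np.array([3, 1, 2]).view(C)
order = as_plain_index(data.argsort())

# The indices are plain, but indexing the subclass still yields the subclass.
sorted_data = data[order]

print(type(order).__name__, order.flags.writeable, type(sorted_data).__name__)
```

The subclass information lives in the data being indexed, so nothing is lost when the indices themselves are plain ndarrays.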

Since we are changing the returns of a few other functions in 1.10
(diagonal, diag, ravel), it may be a good moment to revisit the behavior
for these other functions. Any thoughts?

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Nathaniel Smith
On May 9, 2015 10:48 AM, Jaime Fernández del Río jaime.f...@gmail.com
wrote:

> There is a reported bug (issue #5837) regarding different returns from
> np.nonzero with 1-D vs higher dimensional arrays. A full summary of the
> differences can be seen from the following output:
>
> >>> class C(np.ndarray): pass
> ...
> >>> a = np.arange(6).view(C)
> >>> b = np.arange(6).reshape(2, 3).view(C)
> >>> anz = a.nonzero()
> >>> bnz = b.nonzero()
>
> >>> type(anz[0])
> <type 'numpy.ndarray'>
> >>> anz[0].flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> >>> anz[0].base
>
> >>> type(bnz[0])
> <class '__main__.C'>
> >>> bnz[0].flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : False
>   OWNDATA : False
>   WRITEABLE : False
>   ALIGNED : True
>   UPDATEIFCOPY : False
> >>> bnz[0].base
> array([[0, 1],
>        [0, 2],
>        [1, 0],
>        [1, 1],
>        [1, 2]])
>
> The original bug report was only concerned with the non-writeability of
> higher dimensional array returns, but there are more differences: 1-D
> always returns an ndarray that owns its memory and is writeable, but higher
> dimensional arrays return views, of the type of the original array, that
> are non-writeable.
>
> I have a branch that attempts to fix this by making both 1-D and n-D
> arrays:
>
>    1. return a view, never the base array,

This doesn't matter, does it? View isn't a thing, only view of is
meaningful. And in this case, none of the returned arrays share any memory
with any other arrays that the user has access to... so whether they were
created as a view or not should be an implementation detail that's
transparent to the user?
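
One way to check the memory-sharing claim on a given installation, sketched with np.shares_memory. The sample array is hypothetical; whether `.base` is set on the results is left unasserted because it is exactly the implementation detail in question.

```python
import numpy as np

# Hypothetical sample input.
b = np.arange(6).reshape(2, 3)
rows, cols = b.nonzero()

# Neither index array overlaps the input's buffer.
overlaps_input = (np.shares_memory(rows, b), np.shares_memory(cols, b))

# .base only records how the result was built internally.
has_base = rows.base is not None

print(overlaps_input, has_base)
```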

>    2. return an ndarray, never a subclass, and
>    3. return a writeable view.
>
> I guess the most controversial choice is #2, and in fact making that
> change breaks a few tests. I nevertheless think that all of the index
> returning functions (nonzero, argsort, argmin, argmax, argpartition) should
> always return a bare ndarray, not a subclass. I'd be happy to be corrected,
> but I can't think of any situation in which preserving the subclass would
> be needed for these functions.

I also can't see any logical reason why the return type of these functions
has anything to do with the type of the inputs. You can index me with my
phone number but my phone number is not a person. OTOH logic and ndarray
subclassing don't have much to do with each other; the practical effect is
probably more important. Looking at the subclasses I know about (masked
arrays, np.matrix, and astropy quantities), though, I also can't see much
benefit in copying the subclass of the input, and the fact that we were
never consistent about this suggests that people probably aren't depending
on it too much.

So in summary my feeling is: +1 to making them writable, no objection to
the view thing (though I don't see how it matters), and provisional +1 to
consistently returning ndarray (to be revised if the people who use the
subclassing functionality disagree).

-n


Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

2015-05-09 Thread Benjamin Root
Absolutely, it should be writable. As for subclassing, that might be messy.
Consider the following:

inds = np.where(data > 5)

In that case, I'd expect a normal, bog-standard ndarray because that is
what you use for indexing (although pandas might have a good argument for
having it return one of their special indexing types if data was a pandas
array...). Next:

foobar = np.where(data > 5, 1, 2)

Again, I'd expect a normal, bog-standard ndarray because the scalar
elements are very simple. This question gets very complicated when
considering array arguments. Consider:

merged_data = np.where(data > 5, data, data2)

So, what should merged_data be? If both data and data2 are the same
types, then it would be reasonable to return the same type, if possible.
But what if they aren't the same? Maybe use array_priority to determine the
return type? Or, perhaps it does make sense to say sod it all and always
return an ndarray?

I don't know the answer. I do find it interesting that the result from a
multi-dimensional array is not writable. I don't know why I have never
encountered that.


Ben Root

