Re: [Numpy-discussion] np.nonzero behavior with multidimensional arrays

2015-02-23 Thread Jaime Fernández del Río
On Mon, Feb 23, 2015 at 12:12 PM, Julian Taylor
<jtaylor.deb...@googlemail.com> wrote:

> On 23.02.2015 08:52, Jaime Fernández del Río wrote:
> > This was raised in SO today:
> >
> > http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-only-for-multi-dimensional-arrays/28664009
> >
> > np.nonzero (and np.where for boolean arrays) behave differently for 1-D
> > and higher dimensional arrays:
> >
> > In the first case, a tuple with a single well-behaved base ndarray is
> > returned:
> >
> > In the second, a tuple with as many arrays as dimensions in the passed
> > array is returned, but the arrays are not base ndarrays, but of the same
> > subtype as was passed to the function. These arrays are also set as
> > non-writeable:
>
> The non-writeable looks like a bug to me, it should probably just use
> PyArray_FLAGS(self) instead of 0. We had a similar one with the new
> indexing, it's easy to forget this.
>
> Concerning subtypes, I don't think there is a good reason to preserve
> them here and it should just return an ndarray.
> np.where with one argument returns a new object that indexes the input
> object, so it is not really related anymore to what it indexes and there
> is no information that numpy could reasonably propagate.



That was my thinking when I sent that message last night: add the
PyArray_FLAGS argument, and pass the type of the return array rather than
the input array when creating the views.
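
A minimal Python sketch of that idea, for illustration only. It assumes the N-D case builds one (count, ndim) index buffer and hands back one view per dimension, which is what the flags in the original report suggest; the helper name nonzero_base_views is hypothetical and this is not the C code in the PR.

```python
import numpy as np

def nonzero_base_views(arr):
    """Emulate N-D nonzero with base-ndarray, writeable per-axis views."""
    data = np.asarray(arr)          # drop any ndarray subclass up front
    coords = np.argwhere(data)      # one (count, ndim) buffer of indices
    # One column view per dimension; each view is a plain ndarray that
    # inherits the writeable flag of the buffer it points into.
    return tuple(coords[:, axis] for axis in range(data.ndim))

a = np.arange(6).reshape(2, 3)
rows, cols = nonzero_base_views(a > 3)
print(rows, cols)                   # [1 1] [1 2]
print(type(rows).__name__, rows.flags.writeable)
```

The views still do not own their data, but they are base ndarrays and can be written to, which matches the 1-D behavior more closely.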

I tried to put that in a PR, but it fails a number of tests: the return
value of np.nonzero is specifically checked to be of the same subtype as the
passed-in array, both in matrixlib and in core/test_regression.py, in
relation to Trac #791:

https://github.com/numpy/numpy/issues/1389

So it seems that 7 years ago they had a different view on this. Perhaps
Chuck remembers what the rationale was, but this seems like a weird
requirement for index-returning functions: nonzero, argmin/max, argsort,
argpartition and the like.
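
One way to see what these functions currently return for a subclass is a quick survey like the one below. The Tagged class is a hypothetical do-nothing subclass made up for this check, and the printed types may differ between NumPy versions, so no output is shown.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical do-nothing ndarray subclass, used only for this check."""

t = np.array([3, 0, 2, 5]).view(Tagged)

# Call each index-returning function on the subclass instance.
results = {
    "nonzero": np.nonzero(t)[0],
    "argsort": np.argsort(t),
    "argpartition": np.argpartition(t, 1),
    "argmax": np.argmax(t),
}

# Report which type each function hands back for this NumPy version.
for name, value in results.items():
    print(name, type(value).__name__)
```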

Jaime

-- 
(\__/)
( O.o)
(  ) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.nonzero behavior with multidimensional arrays

2015-02-23 Thread Julian Taylor
On 23.02.2015 08:52, Jaime Fernández del Río wrote:
> This was raised in SO today:
>
> http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-only-for-multi-dimensional-arrays/28664009
>
> np.nonzero (and np.where for boolean arrays) behave differently for 1-D
> and higher dimensional arrays:
>
> In the first case, a tuple with a single well-behaved base ndarray is returned:
>
> In the second, a tuple with as many arrays as dimensions in the passed
> array is returned, but the arrays are not base ndarrays, but of the same
> subtype as was passed to the function. These arrays are also set as
> non-writeable:
 


The non-writeable looks like a bug to me, it should probably just use
PyArray_FLAGS(self) instead of 0. We had a similar one with the new
indexing, it's easy to forget this.

Concerning subtypes, I don't think there is a good reason to preserve
them here and it should just return an ndarray.
np.where with one argument returns a new object that indexes the input
object, so it is not really related anymore to what it indexes and there
is no information that numpy could reasonably propagate.

(np.where with three arguments makes sense with subtypes and fixing that is
on my todo list)
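
Until the flags and subtype are fixed, a user-side workaround is to copy each index array into a fresh base ndarray. This is only a sketch of one option, not something proposed in the thread; it relies on np.array copying and dropping subclasses by default (subok=False).

```python
import numpy as np

a = np.ma.array(range(6)).reshape(2, 3)

# np.array copies its input and returns a base ndarray by default,
# so each result owns its data and is writeable.
idx = tuple(np.array(ix) for ix in np.where(a > 3))

idx[0][0] = 0   # in-place edits work on the copies
print([type(ix).__name__ for ix in idx])
```

The cost is one extra copy per dimension, which is usually small next to the nonzero scan itself.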


Re: [Numpy-discussion] np.nonzero behavior with multidimensional arrays

2015-02-23 Thread Charles R Harris
On Mon, Feb 23, 2015 at 2:29 PM, Jaime Fernández del Río
<jaime.f...@gmail.com> wrote:

> On Mon, Feb 23, 2015 at 12:12 PM, Julian Taylor
> <jtaylor.deb...@googlemail.com> wrote:
>
> > On 23.02.2015 08:52, Jaime Fernández del Río wrote:
> > > This was raised in SO today:
> > >
> > > http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-only-for-multi-dimensional-arrays/28664009
> > >
> > > np.nonzero (and np.where for boolean arrays) behave differently for 1-D
> > > and higher dimensional arrays:
> > >
> > > In the first case, a tuple with a single well-behaved base ndarray is
> > > returned:
> > >
> > > In the second, a tuple with as many arrays as dimensions in the passed
> > > array is returned, but the arrays are not base ndarrays, but of the same
> > > subtype as was passed to the function. These arrays are also set as
> > > non-writeable:
> >
> > The non-writeable looks like a bug to me, it should probably just use
> > PyArray_FLAGS(self) instead of 0. We had a similar one with the new
> > indexing, it's easy to forget this.
> >
> > Concerning subtypes, I don't think there is a good reason to preserve
> > them here and it should just return an ndarray.
> > np.where with one argument returns a new object that indexes the input
> > object, so it is not really related anymore to what it indexes and there
> > is no information that numpy could reasonably propagate.
>
> That was my thinking when I sent that message last night: add the
> PyArray_FLAGS argument, and pass the type of the return array rather than
> the input array when creating the views.
>
> I tried to put that in a PR, but it fails a number of tests: the return
> value of np.nonzero is specifically checked to be of the same subtype as the
> passed-in array, both in matrixlib and in core/test_regression.py, in
> relation to Trac #791:
>
> https://github.com/numpy/numpy/issues/1389
>
> So it seems that 7 years ago they had a different view on this. Perhaps
> Chuck remembers what the rationale was, but this seems like a weird
> requirement for index-returning functions: nonzero, argmin/max, argsort,
> argpartition and the like.


That would be, what, 2008? That was way long ago, back around 1.1, and
before I was much involved. I don't know what the rationale was at that
time, but it may have been inherited from Numeric or Numarray, or just
seemed like the right thing to do.

Chuck


[Numpy-discussion] np.nonzero behavior with multidimensional arrays

2015-02-22 Thread Jaime Fernández del Río
This was raised in SO today:

http://stackoverflow.com/questions/28663142/why-is-np-wheres-result-read-only-for-multi-dimensional-arrays/28664009

np.nonzero (and np.where for boolean arrays) behave differently for 1-D and
higher dimensional arrays:

In the first case, a tuple with a single well-behaved base ndarray is returned:

>>> a = np.ma.array(range(6))
>>> np.where(a > 3)
(array([4, 5]),)
>>> np.where(a > 3)[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In the second, a tuple with as many arrays as dimensions in the passed
array is returned, but the arrays are not base ndarrays, but of the same
subtype as was passed to the function. These arrays are also set as
non-writeable:

>>> np.where(a.reshape(2, 3) > 3)
(masked_array(data = [1 1],
 mask = False,
   fill_value = 99)
, masked_array(data = [1 2],
 mask = False,
   fill_value = 99)
)
>>> np.where(a.reshape(2, 3) > 3)[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False

I can't think of any reason that justifies this difference, and I believe
the two cases should be made to return similar results. My feeling is that
the proper behavior is the 1-D one, and that the behavior for
multidimensional arrays should match it. Can anyone think of any reason that
justifies the current behavior?
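
The proposal can also be phrased as a small consistency check. This is a sketch, with the helper name summarize being hypothetical: it compares the type and writeable flag of the index arrays in the 1-D and 2-D cases, and on a NumPy version with the behavior shown above the final line would print False.

```python
import numpy as np

def summarize(index_arrays):
    """Return (type name, writeable flag) for each array in a nonzero-style tuple."""
    return [(type(ix).__name__, bool(ix.flags.writeable)) for ix in index_arrays]

a = np.ma.array(range(6))
one_d = summarize(np.where(a > 3))
two_d = summarize(np.where(a.reshape(2, 3) > 3))

print("1-D:", one_d)
print("2-D:", two_d)
# The proposal amounts to every entry matching across the two cases.
print("consistent:", set(one_d) == set(two_d))
```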

Jaime
