Re: [Numpy-discussion] read-only or immutable masked array

2013-07-15 Thread Gregorio Bastardo
Hi Pierre,

 I'm a bit surprised, though. Here's what I tried

 np.version.version
  1.7.0
 x = np.ma.array([1,2,3], mask=[0,1,0])
 x.flags.writeable=False
 x[0]=-1
  ValueError: assignment destination is read-only

Thanks, it works perfectly =) Sorry, I probably overlooked this simple
solution and tried to set x.data and x.mask directly instead. I noticed
that this only protects the data, so the mask also has to be set to
read-only or be hardened to avoid accidental (un)masking.

Gregorio


Re: [Numpy-discussion] read-only or immutable masked array

2013-07-15 Thread Pierre Gerard-Marchant

On Jul 15, 2013, at 10:04 , Gregorio Bastardo gregorio.basta...@gmail.com 
wrote:

 Hi Pierre,
 
 I'm a bit surprised, though. Here's what I tried
 
 np.version.version
  1.7.0
 x = np.ma.array([1,2,3], mask=[0,1,0])
 x.flags.writeable=False
 x[0]=-1
  ValueError: assignment destination is read-only
 
 Thanks, it works perfectly =) Sorry, I probably overlooked this simple
 solution and tried to set x.data and x.mask directly instead. I noticed
 that this only protects the data, so the mask also has to be set to
 read-only or be hardened to avoid accidental (un)masking.

Well, yes and no. Setting the flags of `x` doesn't (yet) set the flags of the
mask, that's true. Still, `.writeable=False` should prevent you from unmasking
data, provided you're not trying to modify the mask directly but use basic
assignment like `x[…]=…`. However, assigning `np.ma.masked` to array items
modifies the mask and only the mask, hence the absence of an error even when
the array is not writeable.

Note as well that hardening the mask only prevents unmasking: you can still 
grow the mask, which may not be what you want. Use 
`x.mask.flags.writeable=False` to make the mask really read-only.
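
Putting the pieces of this thread together, a minimal sketch (against numpy
1.7, as used here; later messages in this thread point out some remaining
holes):

import numpy as np

x = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# Freeze the data: plain item assignment (x[i] = value) now raises ValueError.
x.flags.writeable = False

# Hardening only prevents *unmasking*; masking additional items is still allowed.
x.harden_mask()

# Freeze the mask itself so it can neither grow nor shrink.
x.mask.flags.writeable = False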



Re: [Numpy-discussion] read-only or immutable masked array

2013-07-15 Thread Gregorio Bastardo
Hi Pierre,

 Note as well that hardening the mask only prevents unmasking: you can still 
 grow the mask, which may not be what you want. Use 
 `x.mask.flags.writeable=False` to make the mask really read-only.

I ran into an unmasking problem with the suggested approach:

 np.version.version
'1.7.0'
 x = np.ma.masked_array(xrange(4), [0,1,0,1])
 x
masked_array(data = [0 -- 2 --],
 mask = [False  True False  True],
   fill_value = 99)
 x.flags.writeable = False
 x.mask.flags.writeable = False
 x.mask[1] = 0 # ok
Traceback (most recent call last):
  ...
ValueError: assignment destination is read-only
 x[1] = 0 # ok
Traceback (most recent call last):
  ...
ValueError: assignment destination is read-only
 x.mask[1] = 0 # ??
 x
masked_array(data = [0 1 2 --],
 mask = [False False False  True],
   fill_value = 99)

I noticed that the sharedmask attribute changes (from True to False)
after x[1] = 0. Also, some of the ma operations result in mask identity
between the new ma and the original, which causes a ValueError when the
new ma's mask is modified:

 x = np.ma.masked_array(xrange(4), [0,1,0,1])
 x.flags.writeable = False
 x.mask.flags.writeable = False
 x1 = x < 0
 x1.mask is x.mask # ok
False
 x2 = x != 0
 x2.mask is x.mask # ??
True
 x2.mask[1] = 0
Traceback (most recent call last):
  ...
ValueError: assignment destination is read-only

which is a bit confusing. And I noticed that the *_like operations
give mask identity too:

 y = np.ones_like(x)
 y.mask is x.mask
True

but for that I found a recent discussion (empty_like for masked
arrays) on the mailing list:
http://mail.scipy.org/pipermail/numpy-discussion/2013-June/066836.html

I might be missing something but could you clarify these issues?

Thanks,
Gregorio


Re: [Numpy-discussion] Allow == and != to raise errors

2013-07-15 Thread bruno Piguet
Python itself doesn't raise an exception in such cases :

 (3,4) != (2, 3, 4)
True
 (3,4) == (2, 3, 4)
False


Should numpy behave differently ?

Bruno.


2013/7/12 Frédéric Bastien no...@nouiz.org

 I also don't like that idea, but I'm not able to come up with a good reasoning
 like Benjamin did.

 I don't see an advantage to this change, and I don't think the reason is good
 enough to justify breaking the interface.

 But I don't think we rely on this, so if the change goes in, it probably
 won't break stuff, or any breakage will be easily seen and repaired.

 Fred


 On Fri, Jul 12, 2013 at 9:13 AM, Benjamin Root ben.r...@ou.edu wrote:

 I can see what you are getting at, but I would have to disagree.  First
 of all, when a comparison between two mis-shaped arrays occurs, you get back
 a bona fide python boolean, not a numpy array of bools. So if any action
 taken on the result of such a comparison assumed that the result was
 some sort of array, it would fail (yes, this does make it a bit
 difficult to trace back the source of the problem, but not impossible).

 Second, no semantics are broken with this. Are the arrays equal or not?
 If they weren't broadcastable, then returning False for == and True for !=
 makes perfect sense to me. At least, that is my take on it.

 Cheers!
 Ben Root



 On Fri, Jul 12, 2013 at 8:38 AM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 Hey,

 the array comparisons == and != never raise errors but instead simply
 return False for invalid comparisons.

 The main examples are arrays of non-matching dimensions, and object
 arrays with invalid element-wise comparisons:

 In [1]: np.array([1,2,3]) == np.array([1,2])
 Out[1]: False

 In [2]: np.array([1, np.array([2, 3])], dtype=object) == [1, 2]
 Out[2]: False

 This seems wrong to me, and I am sure not just to me. I doubt any large
 project makes use of such comparisons, and I assume that most would prefer
 the shape mismatch to raise an error, so I would like to change it. But
 I am a bit unsure, especially about smaller projects. So to keep the
 transition a bit safer I could imagine implementing a FutureWarning for
 these cases (which would at least notify new users that what they are
 doing doesn't seem like the right thing).

 So the question is: Is such a change safe enough, or is there some good
 reason for the current behavior that I am missing?

 Regards,

 Sebastian

 (There may be other issues with structured types that would continue
 returning False I think, because neither side knows how to compare)



Re: [Numpy-discussion] empty_like for masked arrays

2013-07-15 Thread Gregorio Bastardo
Hi,

On Mon, Jun 10, 2013 at 3:47 PM, Nathaniel Smith n...@pobox.com wrote:
 Hi all,

 Is there anyone out there using numpy masked arrays, who has an
 opinion on how empty_like (and its friends ones_like, zeros_like)
 should handle the mask?

 Right now apparently if you call np.ma.empty_like on a masked array,
 you get a new masked array that shares the original array's mask, so
 modifying one modifies the other. That's almost certainly wrong. This
 PR:
   https://github.com/numpy/numpy/pull/3404
 makes it so instead the new array has values that are all set to
 empty/zero/one, and a mask which is set to match the input array's
 mask (so whenever something was masked in the original array, the
 empty/zero/one in that place is also masked). We don't know if this is
 the desired behaviour for these functions, though. Maybe it's more
 intuitive for the new array to match the original array in shape and
 dtype, but to always have an empty mask. Or maybe not. None of us
 really use np.ma, so if you do and have an opinion then please speak
 up...

I recently joined the mailing list, so the message might not reach the
original thread, sorry for that.

I use masked arrays extensively, and would vote for the first option,
as I use the *_like operations with the assumption that the resulting
array has the same mask as the original. I think it's more intuitive
than selecting between all masked or all unmasked behaviour. If it's
not too late, please consider my use case.
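
(For reference, a hedged sketch of how a caller can get the first behaviour
explicitly today, whatever default ends up being chosen: build the new array
from the data and an explicit copy of the mask.)

import numpy as np

x = np.ma.masked_array([1, 2, 3, 4], mask=[0, 1, 0, 1])

# ones_like-style result whose mask matches x but is an independent copy,
# so modifying one mask never affects the other.
y = np.ma.masked_array(np.ones_like(x.data), mask=x.mask.copy())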

Thanks,
Gregorio


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser 
 warren.weckes...@gmail.com wrote:

 On 7/14/13, Charles R Harris charlesr.har...@gmail.com wrote:
  Some corner cases in the mean, var, std.
 
  *Empty arrays*
 
  I think these cases should either raise an error or just return nan.
  Warnings seem ineffective to me as they are only issued once by default.
 
  In [3]: ones(0).mean()
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61:
  RuntimeWarning: invalid value encountered in double_scalars
ret = ret / float(rcount)
  Out[3]: nan
 
  In [4]: ones(0).var()
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
  RuntimeWarning: invalid value encountered in true_divide
out=arrmean, casting='unsafe', subok=False)
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
  RuntimeWarning: invalid value encountered in double_scalars
ret = ret / float(rcount)
  Out[4]: nan
 
  In [5]: ones(0).std()
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
  RuntimeWarning: invalid value encountered in true_divide
out=arrmean, casting='unsafe', subok=False)
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
  RuntimeWarning: invalid value encountered in double_scalars
ret = ret / float(rcount)
  Out[5]: nan
 
  *ddof >= number of elements*
 
  I think these should just raise errors. The results for ddof >= #elements
  are happenstance, and certainly negative numbers should never be
  returned.
 
  In [6]: ones(2).var(ddof=2)
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
  RuntimeWarning: invalid value encountered in double_scalars
ret = ret / float(rcount)
  Out[6]: nan
 
  In [7]: ones(2).var(ddof=3)
  Out[7]: -0.0
  *
  nansum*
 
  Currently returns nan for empty arrays. I suspect it should return nan
 for
  slices that are all nan, but 0 for empty slices. That would make it
  consistent with sum in the empty case.
 


 For nansum, I would expect 0 even in the case of all nans.  The point
 of these functions is to simply ignore nans, correct?  So I would aim
 for this behaviour:  nanfunc(x) behaves the same as func(x[~isnan(x)])


 Agreed, although that changes current behavior. What about the other
 cases?


Looks like there isn't much interest in the topic, so I'll just go ahead
with the following choices:

Non-NaN case

1) Empty array -> ValueError

The current behavior with stats is an accident, i.e., the nan arises from
0/0. I like to think that in this case the result is any number, rather
than not a number, so *the* value is simply not defined. So in this case
raise a ValueError for empty array.

2) ddof >= n -> ValueError

If the number of elements, n, is not zero and ddof >= n, raise a ValueError
for the ddof value.

Nan case

1) Empty array -> ValueError
2) Empty slice -> NaN
3) For slice ddof >= n -> NaN
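
For concreteness, a sketch of the proposed non-NaN rules as a user-level
wrapper (checked_var is a hypothetical helper, not a numpy function):

import numpy as np

def checked_var(a, ddof=0):
    # Hypothetical sketch of the rules above, not numpy's implementation.
    a = np.asanyarray(a)
    if a.size == 0:
        raise ValueError("var of an empty array is not defined")
    if ddof >= a.size:
        raise ValueError("ddof must be smaller than the number of elements")
    return a.var(ddof=ddof)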

 Chuck


Re: [Numpy-discussion] Allow == and != to raise errors

2013-07-15 Thread Nathaniel Smith
On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet bruno.pig...@gmail.com wrote:
 Python itself doesn't raise an exception in such cases :

 (3,4) != (2, 3, 4)
 True
 (3,4) == (2, 3, 4)
 False

 Should numpy behave differently ?

The numpy equivalent to Python's scalar == is called array_equal,
and that does indeed behave the same:

In [5]: np.array_equal([3, 4], [2, 3, 4])
Out[5]: False

But in numpy, the name == is shorthand for the ufunc np.equal, which
raises an error:

In [8]: np.equal([3, 4], [2, 3, 4])
ValueError: operands could not be broadcast together with shapes (2) (3)

-n


Re: [Numpy-discussion] Allow == and != to raise errors

2013-07-15 Thread Sebastian Berg
On Mon, 2013-07-15 at 15:09 +0200, bruno Piguet wrote:
 Python itself doesn't raise an exception in such cases :
 
  (3,4) != (2, 3, 4)
 True
  (3,4) == (2, 3, 4)
 False

 
 Should numpy behave differently ?
 
Yes, because Python tests whether the tuple is different, not whether
the elements are:

 (3, 4) == (3, 4)
True
 np.array([3, 4]) == np.array([3, 4])
array([ True,  True], dtype=bool)

So doing the test like python *changes* the behaviour.

- Sebastian
 
 Bruno.
 
 
 
 2013/7/12 Frédéric Bastien no...@nouiz.org
 I also don't like that idea, but I'm not able to come to a
 good reasoning like Benjamin.
 
 
 I don't see advantage to this change and the reason isn't good
 enough to justify breaking the interface I think.
 
 
 But I don't think we rely on this, so if the change goes in,
 it probably won't break stuff or they will be easily seen and
  repaired.
 
 
 Fred
 
 
 On Fri, Jul 12, 2013 at 9:13 AM, Benjamin Root
 ben.r...@ou.edu wrote:
 I can see where you are getting at, but I would have
 to disagree.  First of all, when a comparison between
  two mis-shaped arrays occurs, you get back a bona fide
 python boolean, not a numpy array of bools. So if any
 action was taken on the result of such a comparison
 assumed that the result was some sort of an array, it
 would fail (yes, this does make it a bit difficult to
 trace back the source of the problem, but not
 impossible).
 
 
 Second, no semantics are broken with this. Are the
  arrays equal or not? If they weren't broadcastable,
 then returning False for == and True for != makes
 perfect sense to me. At least, that is my take on it.
 
 
 Cheers!
 
 Ben Root
 
 
 
 
 On Fri, Jul 12, 2013 at 8:38 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
 Hey,
 
 the array comparisons == and != never raise
 errors but instead simply
 return False for invalid comparisons.
 
 The main example are arrays of non-matching
 dimensions, and object
 arrays with invalid element-wise comparisons:
 
 In [1]: np.array([1,2,3]) == np.array([1,2])
 Out[1]: False
 
 In [2]: np.array([1, np.array([2, 3])],
 dtype=object) == [1, 2]
 Out[2]: False
 
 This seems wrong to me, and I am sure not just
 me. I doubt any large
 projects makes use of such comparisons and
 assume that most would prefer
 the shape mismatch to raise an error, so I
 would like to change it. But
 I am a bit unsure especially about smaller
 projects. So to keep the
 transition a bit safer could imagine
 implementing a FutureWarning for
 these cases (and that would at least notify
 new users that what they are
 doing doesn't seem like the right thing).
 
 So the question is: Is such a change safe
 enough, or is there some good
 reason for the current behavior that I am
 missing?
 
 Regards,
 
 Sebastian
 
 (There may be other issues with structured
 types that would continue
 returning False I think, because neither side
 knows how to compare)
 

Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Benjamin Root
This is going to need to be heavily documented with doctests. Also, just to
clarify, are we talking about a ValueError for doing a nansum on an empty
array as well, or will that now return a zero?

Ben Root


On Mon, Jul 15, 2013 at 9:52 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser 
 warren.weckes...@gmail.com wrote:

 On 7/14/13, Charles R Harris charlesr.har...@gmail.com wrote:
  Some corner cases in the mean, var, std.
 
  *Empty arrays*
 
  I think these cases should either raise an error or just return nan.
  Warnings seem ineffective to me as they are only issued once by
 default.
 
  In [3]: ones(0).mean()
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61:
  RuntimeWarning: invalid value encountered in double_scalars
ret = ret / float(rcount)
  Out[3]: nan
 
  In [4]: ones(0).var()
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
  RuntimeWarning: invalid value encountered in true_divide
out=arrmean, casting='unsafe', subok=False)
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
  RuntimeWarning: invalid value encountered in double_scalars
ret = ret / float(rcount)
  Out[4]: nan
 
  In [5]: ones(0).std()
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
  RuntimeWarning: invalid value encountered in true_divide
out=arrmean, casting='unsafe', subok=False)
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
  RuntimeWarning: invalid value encountered in double_scalars
ret = ret / float(rcount)
  Out[5]: nan
 
  *ddof = number of elements*
 
  I think these should just raise errors. The results for ddof =
 #elements
  is happenstance, and certainly negative numbers should never be
 returned.
 
  In [6]: ones(2).var(ddof=2)
 
 /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
  RuntimeWarning: invalid value encountered in double_scalars
ret = ret / float(rcount)
  Out[6]: nan
 
  In [7]: ones(2).var(ddof=3)
  Out[7]: -0.0
  *
  nansum*
 
  Currently returns nan for empty arrays. I suspect it should return nan
 for
  slices that are all nan, but 0 for empty slices. That would make it
  consistent with sum in the empty case.
 


 For nansum, I would expect 0 even in the case of all nans.  The point
 of these functions is to simply ignore nans, correct?  So I would aim
 for this behaviour:  nanfunc(x) behaves the same as func(x[~isnan(x)])


 Agreed, although that changes current behavior. What about the other
 cases?


 Looks like there isn't much interest in the topic, so I'll just go ahead
 with the following choices:

 Non-NaN case

 1) Empty array - ValueError

 The current behavior with stats is an accident, i.e., the nan arises from
 0/0. I like to think that in this case the result is any number, rather
 than not a number, so *the* value is simply not defined. So in this case
 raise a ValueError for empty array.

 2) ddof = n - ValueError

 If the number of elements, n, is not zero and ddof = n, raise a
 ValueError for the ddof value.

 Nan case

 1) Empty array - Value Error
 2) Empty slice - NaN
 3) For slice ddof = n - Nan

  Chuck




Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Sebastian Berg
On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
 
 
 On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 

snip

 
 For nansum, I would expect 0 even in the case of all
 nans.  The point
 of these functions is to simply ignore nans, correct?
  So I would aim
 for this behaviour:  nanfunc(x) behaves the same as
 func(x[~isnan(x)])
 
 
 Agreed, although that changes current behavior. What about the
 other cases? 
 
 
 
 Looks like there isn't much interest in the topic, so I'll just go
 ahead with the following choices:
 
 Non-NaN case
 
 1) Empty array - ValueError
 
 The current behavior with stats is an accident, i.e., the nan arises
 from 0/0. I like to think that in this case the result is any number,
 rather than not a number, so *the* value is simply not defined. So in
 this case raise a ValueError for empty array.
 
To be honest, I don't mind the current behaviour much sum([]) = 0,
len([]) = 0, so it is in a way well defined. At least I am not sure if I
would prefer always an error. I am a bit worried that just changing it
might break code out there, such as plotting code where it makes
perfectly sense to plot a NaN (i.e. nothing), but if that is the case it
would probably be visible fast.

  2) ddof >= n -> ValueError
 
  If the number of elements, n, is not zero and ddof >= n, raise a
  ValueError for the ddof value.
 
Makes sense to me, especially for ddof > n. Just returning nan in all
cases for backward compatibility would be fine with me too.

 Nan case
 
  1) Empty array -> ValueError
  2) Empty slice -> NaN
  3) For slice ddof >= n -> NaN
 
Personally I would somewhat prefer if 1) and 2) would at least default
to the same thing. But I don't use the nanfuncs anyway. I was wondering
about adding the option for the user to pick what the fill is (and i.e.
if it is None (maybe default) - ValueError). We could also allow this
for normal reductions without an identity, but I am not sure if it is
useful there.

- Sebastian

  Chuck
 
 
 


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root ben.r...@ou.edu wrote:

 This is going to need to be heavily documented with doctests. Also, just
 to clarify, are we talking about a ValueError for doing a nansum on an
 empty array as well, or will that now return a zero?


I was going to leave nansum as is, as it seems that the result was by
choice rather than by accident.

Tests, not doctests. I detest doctests ;) Examples, OTOH...

Chuck


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg
sebast...@sipsolutions.netwrote:

 On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
 
 
  On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 

 snip

 
  For nansum, I would expect 0 even in the case of all
  nans.  The point
  of these functions is to simply ignore nans, correct?
   So I would aim
  for this behaviour:  nanfunc(x) behaves the same as
  func(x[~isnan(x)])
 
 
  Agreed, although that changes current behavior. What about the
  other cases?
 
 
 
  Looks like there isn't much interest in the topic, so I'll just go
  ahead with the following choices:
 
  Non-NaN case
 
  1) Empty array - ValueError
 
  The current behavior with stats is an accident, i.e., the nan arises
  from 0/0. I like to think that in this case the result is any number,
  rather than not a number, so *the* value is simply not defined. So in
  this case raise a ValueError for empty array.
 
 To be honest, I don't mind the current behaviour much sum([]) = 0,
 len([]) = 0, so it is in a way well defined. At least I am not sure if I
 would prefer always an error. I am a bit worried that just changing it
 might break code out there, such as plotting code where it makes
 perfectly sense to plot a NaN (i.e. nothing), but if that is the case it
 would probably be visible fast.


I'm talking about mean, var, and std as statistics, sum isn't part of that.
If there is agreement that nansum of empty arrays/columns should be zero I
will do that. Note the sums of empty arrays may or may not be empty.

In [1]: ones((0, 3)).sum(axis=0)
Out[1]: array([ 0.,  0.,  0.])

In [2]: ones((3, 0)).sum(axis=0)
Out[2]: array([], dtype=float64)

Which, sort of, makes sense.



  2) ddof = n - ValueError
 
  If the number of elements, n, is not zero and ddof = n, raise a
  ValueError for the ddof value.
 
 Makes sense to me, especially for ddof  n. Just returning nan in all
 cases for backward compatibility would be fine with me too.

  Nan case
 
  1) Empty array - Value Error
  2) Empty slice - NaN
  3) For slice ddof = n - Nan
 
 Personally I would somewhat prefer if 1) and 2) would at least default
 to the same thing. But I don't use the nanfuncs anyway. I was wondering
 about adding the option for the user to pick what the fill is (and i.e.
 if it is None (maybe default) - ValueError). We could also allow this
 for normal reductions without an identity, but I am not sure if it is
 useful there.


Chuck


Re: [Numpy-discussion] Allow == and != to raise errors

2013-07-15 Thread Frédéric Bastien
Just a question: should == behave like a ufunc, or like python == for tuples?

I think that all ndarray comparisons (==, !=, <=, ...) should behave the
same. If they don't (like it was said), making them consistent is good.
What is the minimal change to have them behave the same? From my
understanding, your proposal is to change == and != to behave like real
ufuncs. But I'm not sure the minimal change is the best; for new users,
what would they expect more: the ufunc or the python behavior?

Anyway, I see the advantage of simplifying the interface to something more
consistent.

Anyway, if we make all comparisons behave like ufuncs, there is array_equal,
as said, to get the python behavior of ==; is it useful to have equivalent
functions for the other comparisons? Do they already exist?
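
(For what it's worth, a small sketch of how the single-bool "python-like"
result can already be spelled for the other comparisons, assuming the shapes
broadcast:)

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

np.array_equal(a, b)    # True: single bool, the container-style result for ==
bool((a <= b).all())    # True: same idea, built from the element-wise ufunc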

thanks

Fred


On Mon, Jul 15, 2013 at 10:20 AM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet bruno.pig...@gmail.com
 wrote:
  Python itself doesn't raise an exception in such cases :
 
  (3,4) != (2, 3, 4)
  True
  (3,4) == (2, 3, 4)
  False
 
  Should numpy behave differently ?

 The numpy equivalent to Python's scalar == is called array_equal,
 and that does indeed behave the same:

 In [5]: np.array_equal([3, 4], [2, 3, 4])
 Out[5]: False

 But in numpy, the name == is shorthand for the ufunc np.equal, which
 raises an error:

 In [8]: np.equal([3, 4], [2, 3, 4])
 ValueError: operands could not be broadcast together with shapes (2) (3)

 -n


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg
sebast...@sipsolutions.netwrote:

 On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
 
 
  On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 

 snip

 
  For nansum, I would expect 0 even in the case of all
  nans.  The point
  of these functions is to simply ignore nans, correct?
   So I would aim
  for this behaviour:  nanfunc(x) behaves the same as
  func(x[~isnan(x)])
 
 
  Agreed, although that changes current behavior. What about the
  other cases?
 
 
 
  Looks like there isn't much interest in the topic, so I'll just go
  ahead with the following choices:
 
  Non-NaN case
 
  1) Empty array - ValueError
 
  The current behavior with stats is an accident, i.e., the nan arises
  from 0/0. I like to think that in this case the result is any number,
  rather than not a number, so *the* value is simply not defined. So in
  this case raise a ValueError for empty array.
 
 To be honest, I don't mind the current behaviour much sum([]) = 0,
 len([]) = 0, so it is in a way well defined. At least I am not sure if I
 would prefer always an error. I am a bit worried that just changing it
 might break code out there, such as plotting code where it makes
 perfectly sense to plot a NaN (i.e. nothing), but if that is the case it
 would probably be visible fast.

  2) ddof = n - ValueError
 
  If the number of elements, n, is not zero and ddof = n, raise a
  ValueError for the ddof value.
 
 Makes sense to me, especially for ddof  n. Just returning nan in all
 cases for backward compatibility would be fine with me too.


Currently if ddof > n it returns a negative number for the variance; the NaN
only comes when ddof == 0 and n == 0, leading to 0/0 (float is NaN, integer
is zero division).



  Nan case
 
  1) Empty array - Value Error
  2) Empty slice - NaN
  3) For slice ddof = n - Nan
 
 Personally I would somewhat prefer if 1) and 2) would at least default
 to the same thing. But I don't use the nanfuncs anyway. I was wondering
 about adding the option for the user to pick what the fill is (and i.e.
 if it is None (maybe default) - ValueError). We could also allow this
 for normal reductions without an identity, but I am not sure if it is
 useful there.


In the NaN case some slices may be empty, others not. My reasoning is that
that is going to be data dependent, not operator error, but if the array is
empty the writer of the code should deal with that.

Chuck


Re: [Numpy-discussion] Allow == and != to raise errors

2013-07-15 Thread bruno Piguet
Thank you for your explanations.

So, if the operator == applied to np.arrays is a shorthand for the
ufunc np.equal, it should definitely behave exactly as np.equal(), and raise
an error.

One side question about style: if you would like to protect an x == y
test with a try/except clause, wouldn't it feel more natural to write
np.equal(x, y)?
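
(A small sketch of that style question, assuming == is changed to raise like
np.equal already does:)

import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2])

try:
    result = np.equal(x, y)   # explicit ufunc spelling
    # result = (x == y)       # operator spelling of the same ufunc
except ValueError:
    result = False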


Bruno.


2013/7/15 Nathaniel Smith n...@pobox.com

 On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet bruno.pig...@gmail.com
 wrote:
  Python itself doesn't raise an exception in such cases :
 
  (3,4) != (2, 3, 4)
  True
  (3,4) == (2, 3, 4)
  False
 
  Should numpy behave differently ?

 The numpy equivalent to Python's scalar == is called array_equal,
 and that does indeed behave the same:

 In [5]: np.array_equal([3, 4], [2, 3, 4])
 Out[5]: False

 But in numpy, the name == is shorthand for the ufunc np.equal, which
 raises an error:

 In [8]: np.equal([3, 4], [2, 3, 4])
 ValueError: operands could not be broadcast together with shapes (2) (3)

 -n


Re: [Numpy-discussion] PIL and NumPy

2013-07-15 Thread Chris Barker - NOAA Federal
On Jul 12, 2013, at 8:51 PM, Brady McCary brady.mcc...@gmail.com wrote:


 something to do with an alpha channel being present.

I'd check and see how PIL is storing the alpha channel. If it's RGBA,
then I'd expect it to work.

But if PIL is storing the alpha channel as a separate band, then I'm
not surprised you have an issue.

Can you either drop the alpha or convert to RGBA?

There is also a package called something like imageArray that loads
and saves image formats directly to/from numpy arrays; maybe that would
be helpful.
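
(A hedged sketch of the convert-to-RGBA route, assuming a Pillow-style
`from PIL import Image`; the original session used the older `import Image`:)

import numpy as np
from PIL import Image

im = Image.open('big-0.png')   # mode 'LA' in the original report
im = im.convert('RGBA')        # or .convert('RGB') to drop the alpha entirely
ar = np.asarray(im)            # expected shape (height, width, 4) for RGBA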

CHB


 When I remove the
 alpha channel, things appear to work as I expect. Any discussion on
 the matter?

 Brady

 On Fri, Jul 12, 2013 at 10:00 PM, Brady McCary brady.mcc...@gmail.com wrote:
 NumPy Folks,

 I want to load images with PIL and then operate on them with NumPy.
 According to the PIL and NumPy documentation, I would expect the
 following to work, but it is not.



 Python 2.7.4 (default, Apr 19 2013, 18:28:01)
 [GCC 4.7.3] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 import numpy
 numpy.version.version

 import Image
 Image.VERSION
 '1.1.7'

 im = Image.open('big-0.png')
 im.size
 (2550, 3300)

 ar = numpy.asarray(im)
 ar.size
 1
 ar.shape
 ()
 ar
 array(<PIL.PngImagePlugin.PngImageFile image mode=LA size=2550x3300 at
 0x1E5BA70>, dtype=object)



 By not working I mean that I would have expected the data to be
 loaded/available in ar. PIL and NumPy/SciPy seem to be working fine
 independently of each other. Any guidance?

 Brady


Re: [Numpy-discussion] Allow == and != to raise errors

2013-07-15 Thread bruno Piguet
2013/7/15 Frédéric Bastien no...@nouiz.org

 Just a question, should == behave like a ufunc or like python == for tuple?


That's what I was also wondering.

I see the advantage of consistency for newcomers.
I'm not experienced enough to see if this is a problem for numerical
practitioners. Maybe they wouldn't even imagine that == applied to arrays
could do anything other than element-wise comparison?

Explicit is better than implicit: to me, np.equal(x, y) is more
explicit than x == y.
But Beautiful is better than ugly. Is np.equal(x, y) ugly?

Bruno.



 I think that all ndarray comparisons (==, !=, <=, ...) should behave the
 same. If they don't (like it was said), making them consistent is good.
 What is the minimal change to have them behave the same? From my
 understanding, it is your proposal to change == and != to behave like real
 ufunc. But I'm not sure if the minimal change is the best, for new user,
 what they will expect more? The ufunc of the python behavior?

 Anyway, I see the advantage to simplify the interface to something more
 consistent.

 Anyway, if we make all comparison behave like ufunc, there is array_equal
 as said to have the python behavior of ==, is it useful to have equivalent
 function the other comparison? Do they already exist.

 thanks

 Fred


 On Mon, Jul 15, 2013 at 10:20 AM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet bruno.pig...@gmail.com
 wrote:
  Python itself doesn't raise an exception in such cases :
 
  (3,4) != (2, 3, 4)
  True
  (3,4) == (2, 3, 4)
  False
 
  Should numpy behave differently ?

 The numpy equivalent to Python's scalar == is called array_equal,
 and that does indeed behave the same:

 In [5]: np.array_equal([3, 4], [2, 3, 4])
 Out[5]: False

 But in numpy, the name == is shorthand for the ufunc np.equal, which
 raises an error:

 In [8]: np.equal([3, 4], [2, 3, 4])
 ValueError: operands could not be broadcast together with shapes (2) (3)

 -n


Re: [Numpy-discussion] read-only or immutable masked array

2013-07-15 Thread Pierre Gerard-Marchant

On Jul 15, 2013, at 14:40 , Gregorio Bastardo gregorio.basta...@gmail.com 
wrote:

 Hi Pierre,
 
 Note as well that hardening the mask only prevents unmasking: you can still 
 grow the mask, which may not be what you want. Use 
 `x.mask.flags.writeable=False` to make the mask really read-only.
 
 I ran into an unmasking problem with the suggested approach:
 
 np.version.version
 '1.7.0'
 x = np.ma.masked_array(xrange(4), [0,1,0,1])
 x
 masked_array(data = [0 -- 2 --],
 mask = [False  True False  True],
   fill_value = 99)
 x.flags.writeable = False
 x.mask.flags.writeable = False
 x.mask[1] = 0 # ok
 Traceback (most recent call last):
  ...
 ValueError: assignment destination is read-only
 x[1] = 0 # ok
 Traceback (most recent call last):
  ...
 ValueError: assignment destination is read-only
 x.mask[1] = 0 # ??
 x
 masked_array(data = [0 1 2 --],
 mask = [False False False  True],
   fill_value = 99)

Ouch…
Quick workaround:  use `x.harden_mask()` *then* `x.mask.flags.writeable=False`

[Longer explanation]
 I noticed that sharedmask attribute changes (from True to False)
 after x[1] = 0.

Indeed, indeed… When setting items, the mask is unshared to limit some issues 
(like propagation to the other masked_arrays sharing the mask). Unsharing the 
mask involves a copy, which unfortunately doesn't copy the flags. In other
words, when you try `x[1]=0`, the mask becomes rewritable. That hurts…
But! This call to `unshare_mask` is performed only when the mask is 'soft' 
hence the quick workaround…
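
In code, the workaround order is:

import numpy as np

x = np.ma.masked_array(range(4), mask=[0, 1, 0, 1])
x.flags.writeable = False

# Harden the mask *first*, so item assignment never calls unshare_mask
# (which would replace the frozen mask with a fresh, writeable copy),
# *then* freeze the mask itself.
x.harden_mask()
x.mask.flags.writeable = False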

Note to self (or whoever will fix the issue before I can do it):
* We could make sure that copying a mask copies some of its flags too (like the
`writeable` one; which other ones?)
* The call to `unshare_mask` is made *before* we try to call `__setitem__` on
the `_data` part: that's silly; if we called `__setitem__(_data,index,dval)`
first, the `ValueError: assignment destination is read-only` would be raised
before the mask could get unshared… TL;DR: move L3073 of np.ma.core to L3068
* There should be some simpler ways to make a masked_array read-only; this
little dance is rapidly tiring.





 Also, some of the ma operations result mask identity
 of the new ma, which causes ValueError when the new ma mask is
 modified:
 
 x = np.ma.masked_array(xrange(4), [0,1,0,1])
 x.flags.writeable = False
 x.mask.flags.writeable = False
 x1 = x < 0
 x1.mask is x.mask # ok
 False
 x2 = x != 0
 x2.mask is x.mask # ??
 True
 x2.mask[1] = 0
 Traceback (most recent call last):
  ...
 ValueError: assignment destination is read-only
 
 which is a bit confusing.

Ouch again. 
[TL;DR] No workaround, sorry
[Long version]
The inconsistency comes from the fact that '!=' or '==' call the `__ne__` or 
`__eq__` methods while other comparison operators call their own function. In 
the first case, because we're comparing with a non-masked scalar, no copy of 
the mask is made; in the second case, a copy is systematically made. As pointed 
out earlier, copies of a mask don't preserve its flags…
[Note to self]
* Define a factory for __lt__/__le__/__gt__/__ge__ based on __eq__ : 
MaskedArray.__eq__ and __ne__ already have almost the same code.. (but what 
about filling? Is it an issue?)



 And I experienced that *_like operations
 give mask identity too:
 
 y = np.ones_like(x)
 y.mask is x.mask
 True

This may change in the future, depending on a yet-to-be-achieved consensus on  
the definition of 'least-surprising behaviour'. Right now, the *-like functions 
return an array that shares the mask with the input, as you've noticed. Some 
people complained about it, what's your take on that?

 I might be missing something but could you clarify these issues?

You were not missing anything, np.ma isn't the most straightforward module: 
plenty of corner cases, and the implementation is pretty naive at times (but 
hey, it works). My only advice is to never lose hope.




Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 8:58 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
 
 
  On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 

 snip

 
  For nansum, I would expect 0 even in the case of all
  nans.  The point
  of these functions is to simply ignore nans, correct?
   So I would aim
  for this behaviour:  nanfunc(x) behaves the same as
  func(x[~isnan(x)])
 
 
  Agreed, although that changes current behavior. What about the
  other cases?
 
 
 
  Looks like there isn't much interest in the topic, so I'll just go
  ahead with the following choices:
 
  Non-NaN case
 
  1) Empty array - ValueError
 
  The current behavior with stats is an accident, i.e., the nan arises
  from 0/0. I like to think that in this case the result is any number,
  rather than not a number, so *the* value is simply not defined. So in
  this case raise a ValueError for empty array.
 
 To be honest, I don't mind the current behaviour much sum([]) = 0,
 len([]) = 0, so it is in a way well defined. At least I am not sure if I
 would prefer always an error. I am a bit worried that just changing it
 might break code out there, such as plotting code where it makes
 perfectly sense to plot a NaN (i.e. nothing), but if that is the case it
 would probably be visible fast.

  2) ddof = n - ValueError
 
  If the number of elements, n, is not zero and ddof = n, raise a
  ValueError for the ddof value.
 
 Makes sense to me, especially for ddof  n. Just returning nan in all
 cases for backward compatibility would be fine with me too.


 Currently if ddof  n it returns a negative number for variance, the NaN
 only comes when ddof == 0 and n == 0, leading to 0/0 (float is NaN, integer
 is zero division).



  Nan case
 
  1) Empty array - Value Error
  2) Empty slice - NaN
  3) For slice ddof = n - Nan
 
 Personally I would somewhat prefer if 1) and 2) would at least default
 to the same thing. But I don't use the nanfuncs anyway. I was wondering
 about adding the option for the user to pick what the fill is (and i.e.
 if it is None (maybe default) - ValueError). We could also allow this
 for normal reductions without an identity, but I am not sure if it is
 useful there.


 In the NaN case some slices may be empty, others not. My reasoning is that
 that is going to be data dependent, not operator error, but if the array is
 empty the writer of the code should deal with that.


In the case of the nanvar, nanstd, it might make more sense to handle ddof
as follows:

1) if ddof is >= the axis size, raise ValueError
2) if ddof is >= the number of values after removing NaNs, return NaN

The first would be consistent with the non-nan case, the second accounts
for the variable nature of data containing NaNs.
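
A 1-D sketch of how those two rules might read (a hypothetical helper, not
the actual numpy code):

import numpy as np

def proposed_nanvar(a, ddof=0):
    a = np.asarray(a, dtype=float)
    if ddof >= a.size:
        raise ValueError("ddof >= axis size")   # rule 1
    kept = a[~np.isnan(a)]
    if ddof >= kept.size:
        return np.nan                           # rule 2
    return kept.var(ddof=ddof)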

Chuck


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Benjamin Root
On Jul 15, 2013 11:47 AM, Charles R Harris charlesr.har...@gmail.com
wrote:



 On Mon, Jul 15, 2013 at 8:58 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
 
 
  On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 

 snip

 
  For nansum, I would expect 0 even in the case of all
  nans.  The point
  of these functions is to simply ignore nans, correct?
   So I would aim
  for this behaviour:  nanfunc(x) behaves the same as
  func(x[~isnan(x)])
 
 
  Agreed, although that changes current behavior. What about the
  other cases?
 
 
 
  Looks like there isn't much interest in the topic, so I'll just go
  ahead with the following choices:
 
  Non-NaN case
 
  1) Empty array - ValueError
 
  The current behavior with stats is an accident, i.e., the nan arises
  from 0/0. I like to think that in this case the result is any number,
  rather than not a number, so *the* value is simply not defined. So in
  this case raise a ValueError for empty array.
 
 To be honest, I don't mind the current behaviour much sum([]) = 0,
 len([]) = 0, so it is in a way well defined. At least I am not sure if I
 would prefer always an error. I am a bit worried that just changing it
 might break code out there, such as plotting code where it makes
 perfectly sense to plot a NaN (i.e. nothing), but if that is the case it
 would probably be visible fast.

  2) ddof = n - ValueError
 
  If the number of elements, n, is not zero and ddof = n, raise a
  ValueError for the ddof value.
 
 Makes sense to me, especially for ddof  n. Just returning nan in all
 cases for backward compatibility would be fine with me too.


 Currently if ddof  n it returns a negative number for variance, the NaN
 only comes when ddof == 0 and n == 0, leading to 0/0 (float is NaN, integer
 is zero division).



  Nan case
 
  1) Empty array - Value Error
  2) Empty slice - NaN
  3) For slice ddof = n - Nan
 
 Personally I would somewhat prefer if 1) and 2) would at least default
 to the same thing. But I don't use the nanfuncs anyway. I was wondering
 about adding the option for the user to pick what the fill is (and i.e.
 if it is None (maybe default) - ValueError). We could also allow this
 for normal reductions without an identity, but I am not sure if it is
 useful there.


 In the NaN case some slices may be empty, others not. My reasoning is
 that that is going to be data dependent, not operator error, but if the
 array is empty the writer of the code should deal with that.


 In the case of the nanvar, nanstd, it might make more sense to handle ddof
 as

  1) if ddof is >= the axis size, raise ValueError
  2) if ddof is >= the number of values after removing NaNs, return NaN

 The first would be consistent with the non-nan case, the second accounts
 for the variable nature of data containing NaNs.

 Chuck



I think this is a good idea in that it naturally follows well with the
conventions of what to do with empty arrays / empty slices with nanmean,
etc. Note, however, I am not a very big fan of the idea of having two
different behaviors for what I see as semantically the same thing.

But, my objections are not strong enough to veto it, and I do think this
proposal is well thought-out.

Ben Root


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Sebastian Berg
On Mon, 2013-07-15 at 08:47 -0600, Charles R Harris wrote:
 
 
 On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
 On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
 
 
  On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
 
 snip
 
 
  For nansum, I would expect 0 even in the
 case of all
  nans.  The point
  of these functions is to simply ignore nans,
 correct?
   So I would aim
  for this behaviour:  nanfunc(x) behaves the
 same as
  func(x[~isnan(x)])
 
 
  Agreed, although that changes current behavior. What
 about the
  other cases?
 
 
 
  Looks like there isn't much interest in the topic, so I'll
 just go
  ahead with the following choices:
 
  Non-NaN case
 
  1) Empty array - ValueError
 
  The current behavior with stats is an accident, i.e., the
 nan arises
  from 0/0. I like to think that in this case the result is
 any number,
  rather than not a number, so *the* value is simply not
 defined. So in
  this case raise a ValueError for empty array.
 
 
 To be honest, I don't mind the current behaviour much sum([])
 = 0,
 len([]) = 0, so it is in a way well defined. At least I am not
 sure if I
 would prefer always an error. I am a bit worried that just
 changing it
 might break code out there, such as plotting code where it
 makes
 perfectly sense to plot a NaN (i.e. nothing), but if that is
 the case it
 would probably be visible fast.
 
 I'm talking about mean, var, and std as statistics, sum isn't part of
 that. If there is agreement that nansum of empty arrays/columns should
 be zero I will do that. Note the sums of empty arrays may or may not
 be empty.
 
 In [1]: ones((0, 3)).sum(axis=0)
 Out[1]: array([ 0.,  0.,  0.])
 
 In [2]: ones((3, 0)).sum(axis=0)
 Out[2]: array([], dtype=float64)
 
 Which, sort of, makes sense.
  
 
I think we can agree that the behaviour for reductions with an identity
should default to returning the identity, including for the nanfuncs,
i.e. sum([]) is 0, product([]) is 1...

Since mean = sum/length is a sensible definition, having 0/0 as a result
doesn't seem too bad to me, to be honest; it might be accidental but it is
not a special case in the code ;). Though I don't mind an error as long
as it doesn't break matplotlib or so.

I agree that the nanfuncs raising an error would probably be more of a
problem than for a usual ufunc, but I am still a bit hesitant to say
that it is ok too. I could imagine adding a very general identity
argument (though I would not call it identity, because it is not the
same as `np.add.identity`, just used in a place where that would be used
otherwise):

np.add.reduce([], identity=123) -> [123]
np.add.reduce([1], identity=123) -> [1]
np.nanmean([np.nan], identity=None) -> Error
np.nanmean([np.nan], identity=np.nan) -> np.nan

It doesn't really make sense, but:
np.subtract.reduce([]) -> Error, since np.subtract.identity is None
np.subtract.reduce([], identity=0) -> 0, suppressing the error.

I am not sure if I am convinced myself, but especially for the nanfuncs
it could maybe provide a way to circumvent the problem somewhat.
Including functions such as np.nanargmin, whose result type does not
even support NaN. Plus it gives an argument allowing for warnings about
changing behaviour.
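
As a rough user-level illustration of the idea (the identity= keyword itself
does not exist; this is only a sketch of the behaviour it would enable):

import numpy as np

def nanmean_with_fill(a, fill=None):
    # "fill" plays the role of the proposed identity argument.
    a = np.asarray(a, dtype=float)
    kept = a[~np.isnan(a)]
    if kept.size == 0:
        if fill is None:
            raise ValueError("empty slice and no fill value given")
        return fill
    return kept.mean()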

- Sebastian

 
  2) ddof = n - ValueError
 
  If the number of elements, n, is not zero and ddof = n,
 raise a
  ValueError for the ddof value.
 
 
 Makes sense to me, especially for ddof  n. Just returning nan
 in all
 cases for backward compatibility would be fine with me too.
 
  Nan case
 
  1) Empty array - Value Error
  2) Empty slice - NaN
  3) For slice ddof = n - Nan
 
 
 Personally I would somewhat prefer if 1) and 2) would at least
 default
 to the same thing. But I don't use the nanfuncs anyway. I was
 wondering
 about adding the option for the user to pick what the fill is
 (and i.e.
 if it is None (maybe default) - ValueError). We could also
 allow this
 for normal reductions without an identity, but I am not sure
 if it is
 useful there.
 
 
 Chuck 
 
 
 

Re: [Numpy-discussion] Allow == and != to raise errors

2013-07-15 Thread Sebastian Berg
On Mon, 2013-07-15 at 17:12 +0200, bruno Piguet wrote:
 
 
 
 2013/7/15 Frédéric Bastien no...@nouiz.org
 Just a question, should == behave like a ufunc or like python
 == for tuple?
 
 
 
 That's what I was also wondering.

I am not sure I understand the question. Of course == should be
(mostly?) identical to np.equal. Things like
arr[arr == 0] = -1
etc., etc., are a common design pattern. Operations on arrays are
element-wise by default, falling back to the python tuple/container
behaviour is a special case and I do not see a good reason for it,
except possibly backward compatibility.

Personally I doubt anyone who seriously uses numpy uses the
np.array([1, 2, 3]) == np.array([1,2]) -> False
behaviour, and it seems a bit like a trap to me, because suddenly you
get:
np.array([1, 2, 3]) == np.array([1]) -> np.array([True, False, False])

(Though in combination with np.all, it can make sense and is then
identical to np.array_equiv/np.array_equal)
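
A short sketch of the two styles side by side:

import numpy as np

arr = np.array([0, 5, 0, 7])

# Element-wise behaviour that the common masking pattern relies on:
arr2 = arr.copy()
arr2[arr2 == 0] = -1                           # array([-1,  5, -1,  7])

# Container-style single-bool comparison, spelled explicitly:
np.array_equal(arr, [0, 5, 0, 7])              # True
bool(np.all(arr == np.array([0, 5, 0, 7])))    # True for matching shapes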

- Sebastian

 I see the advantage of consistency for newcomers.
 I'm not experienced enough to see if this is a problem for numerical
 practitionners Maybe they wouldn't even imagine that == applied to
 arrays could do anything else than element-wise comparison ? 
 
 Explicit is better than implicit : to me,  np.equal(x, y) is more
 explicit than x == y.
 
 But Beautiful is better than ugly. Is np.equal(x, y) ugly ? 
 
 
 Bruno.
 
 
 
  
 
 I think that all ndarray comparision (==, !=, =, ...) should
 behave the same. If they don't (like it was said), making them
 consistent is good. What is the minimal change to have them
 behave the same? From my understanding, it is your proposal to
 change == and != to behave like real ufunc. But I'm not sure
 if the minimal change is the best, for new user, what they
 will expect more? The ufunc of the python behavior?
 
 
 Anyway, I see the advantage to simplify the interface to
 something more consistent.
 
 
 Anyway, if we make all comparison behave like ufunc, there is
 array_equal as said to have the python behavior of ==, is it
 useful to have equivalent function the other comparison? Do
 they already exist.
 
 
 thanks
 
 
 
 Fred
 
 
 On Mon, Jul 15, 2013 at 10:20 AM, Nathaniel Smith
 n...@pobox.com wrote:
 On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet
 bruno.pig...@gmail.com wrote:
  Python itself doesn't raise an exception in such
 cases :
 
  (3,4) != (2, 3, 4)
  True
  (3,4) == (2, 3, 4)
  False
 
  Should numpy behave differently ?
 
 
 The numpy equivalent to Python's scalar == is called
 array_equal,
 and that does indeed behave the same:
 
 In [5]: np.array_equal([3, 4], [2, 3, 4])
 Out[5]: False
 
 But in numpy, the name == is shorthand for the ufunc
 np.equal, which
 raises an error:
 
 In [8]: np.equal([3, 4], [2, 3, 4])
 ValueError: operands could not be broadcast together
 with shapes (2) (3)
 
 -n




Re: [Numpy-discussion] PIL and NumPy

2013-07-15 Thread Stéfan van der Walt
Dear Brady

On Fri, 12 Jul 2013 22:00:08 -0500, Brady McCary wrote:

 I want to load images with PIL and then operate on them with NumPy.
 According to the PIL and NumPy documentation, I would expect the
 following to work, but it is not.

Reading images with PIL is a little bit trickier than one would hope.  You can
find an example of how to do it (taken from scikit-image) here:

https://github.com/scikit-image/scikit-image/blob/master/skimage/io/_plugins/pil_plugin.py#L15

Stéfan



Re: [Numpy-discussion] read-only or immutable masked array

2013-07-15 Thread Gregorio Bastardo
 Ouch…
 Quick workaround:  use `x.harden_mask()` *then* `x.mask.flags.writeable=False`

Thanks for the update and the detailed explanation. I'll try this trick.

 This may change in the future, depending on a yet-to-be-achieved consensus on 
  the definition of 'least-surprising behaviour'. Right now, the *-like 
 functions return an array that shares the mask with the input, as you've 
 noticed. Some people complained about it, what's your take on that?

I already took part in the survey (possibly out of thread):
http://mail.scipy.org/pipermail/numpy-discussion/2013-July/067136.html

 You were not missing anything, np.ma isn't the most straightforward module: 
 plenty of corner cases, and the implementation is pretty naive at times (but 
 hey, it works). My only advice is to never lose hope.

I agree there are plenty of hard-to-define cases, and I came across a
hot debate on missing data representation in python:
https://github.com/njsmith/numpy/wiki/NA-discussion-status

but still I believe np.ma is very usable when compression is not
strongly needed.


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 9:55 AM, Sebastian Berg
sebast...@sipsolutions.netwrote:

 On Mon, 2013-07-15 at 08:47 -0600, Charles R Harris wrote:
 
 
  On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
  On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
  
  
   On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
   charlesr.har...@gmail.com wrote:
  
 
 
  snip
 
  
    For nansum, I would expect 0 even in the case of all nans.  The point
    of these functions is to simply ignore nans, correct?  So I would aim
    for this behaviour:  nanfunc(x) behaves the same as func(x[~isnan(x)])


   Agreed, although that changes current behavior. What about the other cases?
  
  
  
   Looks like there isn't much interest in the topic, so I'll just go ahead
   with the following choices:

   Non-NaN case

   1) Empty array - ValueError

   The current behavior with stats is an accident, i.e., the nan arises from
   0/0. I like to think that in this case the result is any number, rather
   than not a number, so *the* value is simply not defined. So in this case
   raise a ValueError for empty array.
  
 
  To be honest, I don't mind the current behaviour much: sum([]) = 0,
  len([]) = 0, so it is in a way well defined. At least I am not sure if I
  would prefer always an error. I am a bit worried that just changing it
  might break code out there, such as plotting code where it makes perfect
  sense to plot a NaN (i.e. nothing), but if that is the case it would
  probably be visible fast.
 
  I'm talking about mean, var, and std as statistics, sum isn't part of
  that. If there is agreement that nansum of empty arrays/columns should
  be zero I will do that. Note the sums of empty arrays may or may not
  be empty.
 
  In [1]: ones((0, 3)).sum(axis=0)
  Out[1]: array([ 0.,  0.,  0.])
 
  In [2]: ones((3, 0)).sum(axis=0)
  Out[2]: array([], dtype=float64)
 
  Which, sort of, makes sense.
 
 
 I think we can agree that the behaviour for reductions with an identity
 should default to returning the identity, including for the nanfuncs,
 i.e. sum([]) is 0, product([]) is 1...

 Since mean = sum/length is a sensible definition, having 0/0 as a result
 doesn't seem too bad to me to be honest, it might be accidental but it is
 not a special case in the code ;). Though I don't mind an error as long
 as it doesn't break matplotlib or so.

 I agree that the nanfuncs raising an error would probably be more of a
 problem than for a usual ufunc, but still a bit hesitant about saying
 that it is ok too. I could imagine adding a very general identity
 argument (though I would not call it identity, because it is not the
 same as `np.add.identity`, just used in a place where that would be used
 otherwise):

 np.add.reduce([], identity=123) -> [123]
 np.add.reduce([1], identity=123) -> [1]
 np.nanmean([np.nan], identity=None) -> Error
 np.nanmean([np.nan], identity=np.nan) -> np.nan

 It doesn't really make sense, but:
 np.subtract.reduce([]) -> Error, since np.subtract.identity is None
 np.subtract.reduce([], identity=0) -> 0, suppressing the error.

 I am not sure if I am convinced myself, but especially for the nanfuncs
 it could maybe provide a way to circumvent the problem somewhat.
 Including functions such as np.nanargmin, whose result type does not
 even support NaN. Plus it gives an argument allowing for warnings about
 changing behaviour.
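
In pure Python, the proposed semantics for the empty-slice case would look
something like the following sketch (only an illustration of the idea, not
an actual ufunc keyword):

import numpy as np

def reduce_with_identity(ufunc, arr, identity):
    # Sketch: return the caller-supplied identity for empty input instead
    # of raising or producing NaN; complain if no identity was given.
    arr = np.asarray(arr)
    if arr.size == 0:
        if identity is None:
            raise ValueError("zero-size reduction with no identity")
        return identity
    return ufunc.reduce(arr)

reduce_with_identity(np.add, [], 123)        # -> 123
reduce_with_identity(np.add, [1], 123)       # -> 1
reduce_with_identity(np.subtract, [], 0)     # -> 0, suppressing the error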


Let me try to summarize. To begin with, the environment of the nan
functions is rather special.

1) if the array is of not of inexact type, they punt to the non-nan
versions.
2) if the array is of inexact type, then out and dtype must be inexact if
specified

The second assumption guarantees that NaN can be used in the return values.

*sum and nansum*

These should be consistent so that empty sums are 0. This should cover the
empty array case, but will change the behaviour of nansum, which currently
returns NaN if the array isn't empty but the slice is empty after NaN removal.
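
For reference, the plain reductions already fall back to their identity on
completely empty input:

import numpy as np

np.sum([])                               # -> 0.0, the identity of np.add
np.prod([])                              # -> 1.0, the identity of np.multiply
np.add.reduce(np.ones((0, 3)), axis=0)   # -> array([ 0.,  0.,  0.])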

*mean and nanmean*

In the case of empty arrays, i.e. an empty slice, this leads to 0/0. For Python
this is always a zero division error; for Numpy this raises a warning and
returns NaN for floats, 0 for integers.

Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In
the special case where dtype=int, the NaN is cast to integer.
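
For concreteness, the float corner case looks like this (the exact warning
text varies between versions):

import numpy as np

with np.errstate(invalid='warn'):
    print np.array([0.0]) / np.array([0.0])    # RuntimeWarning, prints [ nan]

print np.mean(np.array([], dtype=float))       # currently: RuntimeWarning and nan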

Option1
1) mean raise error on 0/0
2) nanmean no warning, return NaN

Option2

Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Nathaniel Smith
On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Let me try to summarize. To begin with, the environment of the nan functions
 is rather special.

 1) if the array is of not of inexact type, they punt to the non-nan
 versions.
 2) if the array is of inexact type, then out and dtype must be inexact if
 specified

 The second assumption guarantees that NaN can be used in the return values.

The requirement on the 'out' dtype only exists because currently the
nan functions like to return nan for things like empty arrays, right?
If not for that, it could be relaxed? (it's a rather weird
requirement, since the whole point of these functions is that they
ignore nans, yet they don't always...)

 sum and nansum

 These should be consistent so that empty sums are 0. This should cover the
 empty array case, but will change the behaviour of nansum which currently
 returns NaN if the array isn't empty but the slice is after NaN removal.

I agree that returning 0 is the right behaviour, but we might need a
FutureWarning period.
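
Something along these lines could cover the transition (purely illustrative,
not the real np.nansum):

import warnings
import numpy as np

def nansum_with_warning(a):
    # Illustrative sketch only: warn when the result is due to change
    # (empty or all-NaN input), then defer to the current implementation.
    a = np.asarray(a, dtype=float)
    if a.size == 0 or np.isnan(a).all():
        warnings.warn("nansum of an empty or all-NaN array will return 0 "
                      "instead of NaN in a future release", FutureWarning)
    return np.nansum(a)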

 mean and nanmean

 In the case of empty arrays, an empty slice, this leads to 0/0. For Python
 this is always a zero division error, for Numpy this raises a warning and
 and returns NaN for floats, 0 for integers.

 Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In
 the special case where dtype=int, the NaN is cast to integer.

 Option1
 1) mean raise error on 0/0
 2) nanmean no warning, return NaN

 Option2
 1) mean raise warning, return NaN (current behavior)
 2) nanmean no warning, return NaN

 Option3
 1) mean raise warning, return NaN (current behavior)
 2) nanmean raise warning, return NaN

I have mixed feelings about the whole np.seterr apparatus, but since
it exists, shouldn't we use it for consistency? I.e., just do whatever
numpy is set up to do with 0/0? (Which I think means, warn and return
NaN by default, but this can be changed.)
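
(i.e. anyone who cares can already tighten or silence that locally:)

import numpy as np

with np.errstate(invalid='ignore'):            # silence the 0/0 warning
    print np.array([0.0]) / np.array([0.0])    # [ nan]

with np.errstate(invalid='raise'):             # or turn it into a hard error
    try:
        np.array([0.0]) / np.array([0.0])
    except FloatingPointError:
        print "0/0 raised FloatingPointError"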

 var, std, nanvar, nanstd

 1) if ddof > axis(axes) size, raise error, probably a program bug.
 2) If ddof=0, then whatever is the case for mean, nanmean

 For nanvar, nanstd it is possible that some slice are good, some bad, so

 option1
 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice

 option2
 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice

I don't really have any intuition for these ddof cases. Just raising
an error on negative effective dof is pretty defensible and might be
the safest -- it's easy to turn an error into something sensible
later if people come up with use cases...
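
For reference, ddof only enters through the denominator, so the failure mode
is easy to see in a hand-rolled sketch:

import numpy as np

def var_ddof(x, ddof=0):
    # Plain sample variance with an adjustable denominator; once
    # n - ddof <= 0 the division blows up (0/0, or a negative scale).
    x = np.asarray(x, dtype=float)
    n = x.size
    return ((x - x.mean()) ** 2).sum() / (n - ddof)

var_ddof([1.0, 2.0, 3.0], ddof=1)   # fine: n - ddof = 2, result 1.0
var_ddof([1.0], ddof=1)             # n - ddof = 0 -> 0/0, nan plus a warning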

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread josef . pktd
On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 Let me try to summarize. To begin with, the environment of the nan functions
 is rather special.

 1) if the array is of not of inexact type, they punt to the non-nan
 versions.
 2) if the array is of inexact type, then out and dtype must be inexact if
 specified

 The second assumption guarantees that NaN can be used in the return values.

 The requirement on the 'out' dtype only exists because currently the
 nan function like to return nan for things like empty arrays, right?
 If not for that, it could be relaxed? (it's a rather weird
 requirement, since the whole point of these functions is that they
 ignore nans, yet they don't always...)

 sum and nansum

 These should be consistent so that empty sums are 0. This should cover the
 empty array case, but will change the behaviour of nansum which currently
 returns NaN if the array isn't empty but the slice is after NaN removal.

 I agree that returning 0 is the right behaviour, but we might need a
 FutureWarning period.

 mean and nanmean

 In the case of empty arrays, an empty slice, this leads to 0/0. For Python
 this is always a zero division error, for Numpy this raises a warning and
 and returns NaN for floats, 0 for integers.

 Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In
 the special case where dtype=int, the NaN is cast to integer.

 Option1
 1) mean raise error on 0/0
 2) nanmean no warning, return NaN

 Option2
 1) mean raise warning, return NaN (current behavior)
 2) nanmean no warning, return NaN

 Option3
 1) mean raise warning, return NaN (current behavior)
 2) nanmean raise warning, return NaN

 I have mixed feelings about the whole np.seterr apparatus, but since
 it exists, shouldn't we use it for consistency? I.e., just do whatever
 numpy is set up to do with 0/0? (Which I think means, warn and return
 NaN by default, but this can be changed.)

 var, std, nanvar, nanstd

 1) if ddof  axis(axes) size, raise error, probably a program bug.
 2) If ddof=0, then whatever is the case for mean, nanmean

 For nanvar, nanstd it is possible that some slice are good, some bad, so

 option1
 1) if n - ddof = 0 for a slice, raise warning, return NaN for slice

 option2
 1) if n - ddof = 0 for a slice, don't warn, return NaN for slice

 I don't really have any intuition for these ddof cases. Just raising
 an error on negative effective dof is pretty defensible and might be
 the safest -- it's a easy to turn an error into something sensible
 later if people come up with use cases...

Related: why does reduceat not have empty slices?

 np.add.reduceat(np.arange(8),[0,4, 5, 7,7])
array([ 6,  4, 11,  7,  7])
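
(If I read the docs right, whenever indices[i] >= indices[i+1] reduceat falls
back to the single element a[indices[i]] instead of doing an empty reduction,
which is where the repeated 7 comes from:)

import numpy as np

a = np.arange(8)
# the five results are a[0:4].sum(), a[4:5].sum(), a[5:7].sum(),
# a[7] (no empty slice for the repeated index), and a[7:].sum()
print np.add.reduceat(a, [0, 4, 5, 7, 7])   # [ 6  4 11  7  7]
print a[0:4].sum(), a[4:5].sum(), a[5:7].sum(), a[7], a[7:].sum()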


I'm in favor of returning nans instead of raising exceptions, except
if the return type is int and we cannot cast nan to int.

If we get functions into numpy that know how to handle nans, then it
would be useful to get the nans, so we can work with them

Some cases where this might come in handy are when we iterate over
slices of an array that define groups or category levels with possible
empty groups *)

 idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2])
 x = np.arange(9)
 [x[idx==ii].mean() for ii in range(4)]
[1.5, 5.0, nan, 7.5]

instead of
 [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum() > 0]
[1.5, 5.0, 7.5]

same for var, I wouldn't have to check that the size is larger than
the ddof (whatever that is in the specific case)

*) groups could be empty because they were defined for a larger
dataset or as a union of different datasets
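
A tiny helper along those lines, which doesn't depend on what mean() of an
empty slice ends up doing (just a sketch):

import numpy as np

def group_means(x, idx, ngroups):
    # Mean per group, NaN for empty groups, independent of how np.mean
    # treats an empty slice.
    x = np.asarray(x, dtype=float)
    out = np.nan * np.ones(ngroups)
    for ii in range(ngroups):
        sel = x[idx == ii]
        if sel.size:
            out[ii] = sel.mean()
    return out

idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2])
print group_means(np.arange(9), idx, 4)   # [ 1.5  5.   nan  7.5]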


PS: I used mean() above and not var() because

 np.__version__
'1.5.1'
 np.mean([])
nan
 np.var([])
0.0

Josef


 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread josef . pktd
On Mon, Jul 15, 2013 at 4:24 PM,  josef.p...@gmail.com wrote:
 On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 Let me try to summarize. To begin with, the environment of the nan functions
 is rather special.

 1) if the array is of not of inexact type, they punt to the non-nan
 versions.
 2) if the array is of inexact type, then out and dtype must be inexact if
 specified

 The second assumption guarantees that NaN can be used in the return values.

 The requirement on the 'out' dtype only exists because currently the
 nan function like to return nan for things like empty arrays, right?
 If not for that, it could be relaxed? (it's a rather weird
 requirement, since the whole point of these functions is that they
 ignore nans, yet they don't always...)

 sum and nansum

 These should be consistent so that empty sums are 0. This should cover the
 empty array case, but will change the behaviour of nansum which currently
 returns NaN if the array isn't empty but the slice is after NaN removal.

 I agree that returning 0 is the right behaviour, but we might need a
 FutureWarning period.

 mean and nanmean

 In the case of empty arrays, an empty slice, this leads to 0/0. For Python
 this is always a zero division error, for Numpy this raises a warning and
 and returns NaN for floats, 0 for integers.

 Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In
 the special case where dtype=int, the NaN is cast to integer.

 Option1
 1) mean raise error on 0/0
 2) nanmean no warning, return NaN

 Option2
 1) mean raise warning, return NaN (current behavior)
 2) nanmean no warning, return NaN

 Option3
 1) mean raise warning, return NaN (current behavior)
 2) nanmean raise warning, return NaN

 I have mixed feelings about the whole np.seterr apparatus, but since
 it exists, shouldn't we use it for consistency? I.e., just do whatever
 numpy is set up to do with 0/0? (Which I think means, warn and return
 NaN by default, but this can be changed.)

 var, std, nanvar, nanstd

 1) if ddof  axis(axes) size, raise error, probably a program bug.
 2) If ddof=0, then whatever is the case for mean, nanmean

 For nanvar, nanstd it is possible that some slice are good, some bad, so

 option1
 1) if n - ddof = 0 for a slice, raise warning, return NaN for slice

 option2
 1) if n - ddof = 0 for a slice, don't warn, return NaN for slice

 I don't really have any intuition for these ddof cases. Just raising
 an error on negative effective dof is pretty defensible and might be
 the safest -- it's a easy to turn an error into something sensible
 later if people come up with use cases...

 related why does reduceat not have empty slices?

 np.add.reduceat(np.arange(8),[0,4, 5, 7,7])
 array([ 6,  4, 11,  7,  7])


 I'm in favor of returning nans instead of raising exceptions, except
 if the return type is int and we cannot cast nan to int.

 If we get functions into numpy that know how to handle nans, then it
 would be useful to get the nans, so we can work with them

 Some cases where this might come in handy are when we iterate over
 slices of an array that define groups or category levels with possible
 empty groups *)

 idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2])
 x = np.arange(9)
 [x[idx==ii].mean() for ii in range(4)]
 [1.5, 5.0, nan, 7.5]

 instead of
 [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()0]
 [1.5, 5.0, 7.5]

 same for var, I wouldn't have to check that the size is larger than
 the ddof (whatever that is in the specific case)

 *) groups could be empty because they were defined for a larger
 dataset or as a union of different datasets

background:

I wrote several robust anova versions a few weeks ago that were
essentially list comprehensions as above. However, I didn't allow nans
and didn't check for minimum size.
Allowing for empty groups to return nan would mainly be a convenience,
since I need to check the group size only once.

ddof: tests for proportions have ddof=0, for regular t-test ddof=1,
for tests of correlation ddof=2   IIRC
so we would need to check for the corresponding minimum size that n-ddof > 0

negative effective dof doesn't exist, that's np.maximum(n - ddof, 0)
which is always non-negative but might result in a zero-division
error. :)

I don't think making anything conditional on ddof0 is useful.

Josef



 PS: I used mean() above and not var() because

 np.__version__
 '1.5.1'
 np.mean([])
 nan
 np.var([])
 0.0

 Josef


 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 2:44 PM, josef.p...@gmail.com wrote:

 On Mon, Jul 15, 2013 at 4:24 PM,  josef.p...@gmail.com wrote:
  On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith n...@pobox.com wrote:
  On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
  Let me try to summarize. To begin with, the environment of the nan
 functions
  is rather special.
 
  1) if the array is of not of inexact type, they punt to the non-nan
  versions.
  2) if the array is of inexact type, then out and dtype must be inexact
 if
  specified
 
  The second assumption guarantees that NaN can be used in the return
 values.
 
  The requirement on the 'out' dtype only exists because currently the
  nan function like to return nan for things like empty arrays, right?
  If not for that, it could be relaxed? (it's a rather weird
  requirement, since the whole point of these functions is that they
  ignore nans, yet they don't always...)
 
  sum and nansum
 
  These should be consistent so that empty sums are 0. This should cover
 the
  empty array case, but will change the behaviour of nansum which
 currently
  returns NaN if the array isn't empty but the slice is after NaN
 removal.
 
  I agree that returning 0 is the right behaviour, but we might need a
  FutureWarning period.
 
  mean and nanmean
 
  In the case of empty arrays, an empty slice, this leads to 0/0. For
 Python
  this is always a zero division error, for Numpy this raises a warning
 and
  and returns NaN for floats, 0 for integers.
 
  Currently mean returns NaN and raises a RuntimeWarning when 0/0
 occurs. In
  the special case where dtype=int, the NaN is cast to integer.
 
  Option1
  1) mean raise error on 0/0
  2) nanmean no warning, return NaN
 
  Option2
  1) mean raise warning, return NaN (current behavior)
  2) nanmean no warning, return NaN
 
  Option3
  1) mean raise warning, return NaN (current behavior)
  2) nanmean raise warning, return NaN
 
  I have mixed feelings about the whole np.seterr apparatus, but since
  it exists, shouldn't we use it for consistency? I.e., just do whatever
  numpy is set up to do with 0/0? (Which I think means, warn and return
  NaN by default, but this can be changed.)
 
  var, std, nanvar, nanstd
 
  1) if ddof  axis(axes) size, raise error, probably a program bug.
  2) If ddof=0, then whatever is the case for mean, nanmean
 
  For nanvar, nanstd it is possible that some slice are good, some bad,
 so
 
  option1
  1) if n - ddof = 0 for a slice, raise warning, return NaN for slice
 
  option2
  1) if n - ddof = 0 for a slice, don't warn, return NaN for slice
 
  I don't really have any intuition for these ddof cases. Just raising
  an error on negative effective dof is pretty defensible and might be
  the safest -- it's a easy to turn an error into something sensible
  later if people come up with use cases...
 
  related why does reduceat not have empty slices?
 
  np.add.reduceat(np.arange(8),[0,4, 5, 7,7])
  array([ 6,  4, 11,  7,  7])
 
 
  I'm in favor of returning nans instead of raising exceptions, except
  if the return type is int and we cannot cast nan to int.
 
  If we get functions into numpy that know how to handle nans, then it
  would be useful to get the nans, so we can work with them
 
  Some cases where this might come in handy are when we iterate over
  slices of an array that define groups or category levels with possible
  empty groups *)
 
  idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2])
  x = np.arange(9)
  [x[idx==ii].mean() for ii in range(4)]
  [1.5, 5.0, nan, 7.5]
 
  instead of
  [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()0]
  [1.5, 5.0, 7.5]
 
  same for var, I wouldn't have to check that the size is larger than
  the ddof (whatever that is in the specific case)
 
  *) groups could be empty because they were defined for a larger
  dataset or as a union of different datasets

 background:

 I wrote several robust anova versions a few weeks ago, that were
 essentially list comprehension as above. However, I didn't allow nans
 and didn't check for minimum size.
 Allowing for empty groups to return nan would mainly be a convenience,
 since I need to check the group size only once.

 ddof: tests for proportions have ddof=0, for regular t-test ddof=1,
 for tests of correlation ddof=2   IIRC
 so we would need to check for the corresponding minimum size that n-ddof0

 negative effective dof doesn't exist, that's np.maximum(n - ddof, 0)
 which is always non-negative but might result in a zero-division
 error. :)

 I don't think making anything conditional on ddof0 is useful.


So how would you want it?

To summarize the problem areas:

1) What is the sum of an empty slice? NaN or 0?
2) What is mean of empty slice? NaN, NaN and warn, or error?
3) What if n - ddof < 0 for slice? NaN, NaN and warn, or error?
4) What if n - ddof = 0 for slice? NaN, NaN and warn, or error?

I'm tending to NaN and warn for 2 -- 3, because, as Nathaniel notes, the
warning can be 

Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread josef . pktd
On Mon, Jul 15, 2013 at 5:34 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Mon, Jul 15, 2013 at 2:44 PM, josef.p...@gmail.com wrote:

 On Mon, Jul 15, 2013 at 4:24 PM,  josef.p...@gmail.com wrote:
  On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith n...@pobox.com wrote:
  On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
  Let me try to summarize. To begin with, the environment of the nan
  functions
  is rather special.
 
  1) if the array is of not of inexact type, they punt to the non-nan
  versions.
  2) if the array is of inexact type, then out and dtype must be inexact
  if
  specified
 
  The second assumption guarantees that NaN can be used in the return
  values.
 
  The requirement on the 'out' dtype only exists because currently the
  nan function like to return nan for things like empty arrays, right?
  If not for that, it could be relaxed? (it's a rather weird
  requirement, since the whole point of these functions is that they
  ignore nans, yet they don't always...)
 
  sum and nansum
 
  These should be consistent so that empty sums are 0. This should cover
  the
  empty array case, but will change the behaviour of nansum which
  currently
  returns NaN if the array isn't empty but the slice is after NaN
  removal.
 
  I agree that returning 0 is the right behaviour, but we might need a
  FutureWarning period.
 
  mean and nanmean
 
  In the case of empty arrays, an empty slice, this leads to 0/0. For
  Python
  this is always a zero division error, for Numpy this raises a warning
  and
  and returns NaN for floats, 0 for integers.
 
  Currently mean returns NaN and raises a RuntimeWarning when 0/0
  occurs. In
  the special case where dtype=int, the NaN is cast to integer.
 
  Option1
  1) mean raise error on 0/0
  2) nanmean no warning, return NaN
 
  Option2
  1) mean raise warning, return NaN (current behavior)
  2) nanmean no warning, return NaN
 
  Option3
  1) mean raise warning, return NaN (current behavior)
  2) nanmean raise warning, return NaN
 
  I have mixed feelings about the whole np.seterr apparatus, but since
  it exists, shouldn't we use it for consistency? I.e., just do whatever
  numpy is set up to do with 0/0? (Which I think means, warn and return
  NaN by default, but this can be changed.)
 
  var, std, nanvar, nanstd
 
  1) if ddof  axis(axes) size, raise error, probably a program bug.
  2) If ddof=0, then whatever is the case for mean, nanmean
 
  For nanvar, nanstd it is possible that some slice are good, some bad,
  so
 
  option1
  1) if n - ddof = 0 for a slice, raise warning, return NaN for slice
 
  option2
  1) if n - ddof = 0 for a slice, don't warn, return NaN for slice
 
  I don't really have any intuition for these ddof cases. Just raising
  an error on negative effective dof is pretty defensible and might be
  the safest -- it's a easy to turn an error into something sensible
  later if people come up with use cases...
 
  related why does reduceat not have empty slices?
 
  np.add.reduceat(np.arange(8),[0,4, 5, 7,7])
  array([ 6,  4, 11,  7,  7])
 
 
  I'm in favor of returning nans instead of raising exceptions, except
  if the return type is int and we cannot cast nan to int.
 
  If we get functions into numpy that know how to handle nans, then it
  would be useful to get the nans, so we can work with them
 
  Some cases where this might come in handy are when we iterate over
  slices of an array that define groups or category levels with possible
  empty groups *)
 
  idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2])
  x = np.arange(9)
  [x[idx==ii].mean() for ii in range(4)]
  [1.5, 5.0, nan, 7.5]
 
  instead of
  [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()0]
  [1.5, 5.0, 7.5]
 
  same for var, I wouldn't have to check that the size is larger than
  the ddof (whatever that is in the specific case)
 
  *) groups could be empty because they were defined for a larger
  dataset or as a union of different datasets

 background:

 I wrote several robust anova versions a few weeks ago, that were
 essentially list comprehension as above. However, I didn't allow nans
 and didn't check for minimum size.
 Allowing for empty groups to return nan would mainly be a convenience,
 since I need to check the group size only once.

 ddof: tests for proportions have ddof=0, for regular t-test ddof=1,
 for tests of correlation ddof=2   IIRC
 so we would need to check for the corresponding minimum size that n-ddof0

 negative effective dof doesn't exist, that's np.maximum(n - ddof, 0)
 which is always non-negative but might result in a zero-division
 error. :)

 I don't think making anything conditional on ddof0 is useful.


 So how would you want it?

 To summarize the problem areas:

 1) What is the sum of an empty slice? NaN or 0?
0, as it is now for sum (including 0 for nansum with no valid entries).

 2) What is mean of empy slice? NaN, NaN and warn, or error?
 3) What if n - ddof  0 for slice? NaN, 

Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 3:57 PM, josef.p...@gmail.com wrote:

 On Mon, Jul 15, 2013 at 5:34 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Mon, Jul 15, 2013 at 2:44 PM, josef.p...@gmail.com wrote:
 
  On Mon, Jul 15, 2013 at 4:24 PM,  josef.p...@gmail.com wrote:
   On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith n...@pobox.com
 wrote:
   On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris
   charlesr.har...@gmail.com wrote:
   Let me try to summarize. To begin with, the environment of the nan
   functions
   is rather special.
  
   1) if the array is of not of inexact type, they punt to the non-nan
   versions.
   2) if the array is of inexact type, then out and dtype must be
 inexact
   if
   specified
  
   The second assumption guarantees that NaN can be used in the return
   values.
  
   The requirement on the 'out' dtype only exists because currently the
   nan function like to return nan for things like empty arrays, right?
   If not for that, it could be relaxed? (it's a rather weird
   requirement, since the whole point of these functions is that they
   ignore nans, yet they don't always...)
  
   sum and nansum
  
   These should be consistent so that empty sums are 0. This should
 cover
   the
   empty array case, but will change the behaviour of nansum which
   currently
   returns NaN if the array isn't empty but the slice is after NaN
   removal.
  
   I agree that returning 0 is the right behaviour, but we might need a
   FutureWarning period.
  
   mean and nanmean
  
   In the case of empty arrays, an empty slice, this leads to 0/0. For
   Python
   this is always a zero division error, for Numpy this raises a
 warning
   and
   and returns NaN for floats, 0 for integers.
  
   Currently mean returns NaN and raises a RuntimeWarning when 0/0
   occurs. In
   the special case where dtype=int, the NaN is cast to integer.
  
   Option1
   1) mean raise error on 0/0
   2) nanmean no warning, return NaN
  
   Option2
   1) mean raise warning, return NaN (current behavior)
   2) nanmean no warning, return NaN
  
   Option3
   1) mean raise warning, return NaN (current behavior)
   2) nanmean raise warning, return NaN
  
   I have mixed feelings about the whole np.seterr apparatus, but since
   it exists, shouldn't we use it for consistency? I.e., just do
 whatever
   numpy is set up to do with 0/0? (Which I think means, warn and return
   NaN by default, but this can be changed.)
  
   var, std, nanvar, nanstd
  
   1) if ddof  axis(axes) size, raise error, probably a program bug.
   2) If ddof=0, then whatever is the case for mean, nanmean
  
   For nanvar, nanstd it is possible that some slice are good, some
 bad,
   so
  
   option1
   1) if n - ddof = 0 for a slice, raise warning, return NaN for slice
  
   option2
   1) if n - ddof = 0 for a slice, don't warn, return NaN for slice
  
   I don't really have any intuition for these ddof cases. Just raising
   an error on negative effective dof is pretty defensible and might be
   the safest -- it's a easy to turn an error into something sensible
   later if people come up with use cases...
  
   related why does reduceat not have empty slices?
  
   np.add.reduceat(np.arange(8),[0,4, 5, 7,7])
   array([ 6,  4, 11,  7,  7])
  
  
   I'm in favor of returning nans instead of raising exceptions, except
   if the return type is int and we cannot cast nan to int.
  
   If we get functions into numpy that know how to handle nans, then it
   would be useful to get the nans, so we can work with them
  
   Some cases where this might come in handy are when we iterate over
   slices of an array that define groups or category levels with possible
   empty groups *)
  
   idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2])
   x = np.arange(9)
   [x[idx==ii].mean() for ii in range(4)]
   [1.5, 5.0, nan, 7.5]
  
   instead of
   [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()0]
   [1.5, 5.0, 7.5]
  
   same for var, I wouldn't have to check that the size is larger than
   the ddof (whatever that is in the specific case)
  
   *) groups could be empty because they were defined for a larger
   dataset or as a union of different datasets
 
  background:
 
  I wrote several robust anova versions a few weeks ago, that were
  essentially list comprehension as above. However, I didn't allow nans
  and didn't check for minimum size.
  Allowing for empty groups to return nan would mainly be a convenience,
  since I need to check the group size only once.
 
  ddof: tests for proportions have ddof=0, for regular t-test ddof=1,
  for tests of correlation ddof=2   IIRC
  so we would need to check for the corresponding minimum size that
 n-ddof0
 
  negative effective dof doesn't exist, that's np.maximum(n - ddof, 0)
  which is always non-negative but might result in a zero-division
  error. :)
 
  I don't think making anything conditional on ddof0 is useful.
 
 
  So how would you want it?
 
  To summarize the problem areas:
 
  1) What is 

Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Stéfan van der Walt
On Mon, 15 Jul 2013 08:33:47 -0600, Charles R Harris wrote:
 On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root ben.r...@ou.edu wrote:
 
  This is going to need to be heavily documented with doctests. Also, just
  to clarify, are we talking about a ValueError for doing a nansum on an
  empty array as well, or will that now return a zero?
 
 
 I was going to leave nansum as is, as it seems that the result was by
 choice rather than by accident.

That makes sense--I like Sebastian's explanation whereby operations that
define an identity yield that upon empty input.

Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 6:22 PM, Stéfan van der Walt ste...@sun.ac.zawrote:

 On Mon, 15 Jul 2013 08:33:47 -0600, Charles R Harris wrote:
  On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root ben.r...@ou.edu wrote:
 
   This is going to need to be heavily documented with doctests. Also,
 just
   to clarify, are we talking about a ValueError for doing a nansum on an
   empty array as well, or will that now return a zero?
  
  
  I was going to leave nansum as is, as it seems that the result was by
  choice rather than by accident.

 That makes sense--I like Sebastian's explanation whereby operations that
 define an identity yields that upon empty input.


So nansum should return zeros rather than the current NaNs?

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Stéfan van der Walt
On Mon, 15 Jul 2013 18:46:33 -0600, Charles R Harris wrote:
 
 So nansum should return zeros rather than the current NaNs?

Yes, my feeling is that nansum([]) should be 0.

Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Benjamin Root
To add a bit of context to the question of nansum on empty results, we
currently differ from MATLAB and R in this respect: they return zero no
matter what. Personally, I think it should return zero, but our current
behavior of returning nans has existed for a long time.

Personally, I think we need a deprecation warning and possibly wait to
change this until 2.0, with plenty of warning that this will change.

Ben Root
On Jul 15, 2013 8:46 PM, Charles R Harris charlesr.har...@gmail.com
wrote:



 On Mon, Jul 15, 2013 at 6:22 PM, Stéfan van der Walt ste...@sun.ac.zawrote:

 On Mon, 15 Jul 2013 08:33:47 -0600, Charles R Harris wrote:
  On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root ben.r...@ou.edu wrote:
 
   This is going to need to be heavily documented with doctests. Also,
 just
   to clarify, are we talking about a ValueError for doing a nansum on an
   empty array as well, or will that now return a zero?
  
  
  I was going to leave nansum as is, as it seems that the result was by
  choice rather than by accident.

 That makes sense--I like Sebastian's explanation whereby operations that
 define an identity yields that upon empty input.


 So nansum should return zeros rather than the current NaNs?

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] retrieving original array locations from 2d argsort

2013-07-15 Thread Moroney, Catherine M (398D)
I know that there's an easy way to solve this problem, but I'm not
sufficiently knowledgeable about numpy indexing to figure it out.

Here is the problem:

Take a 2-d array a, of any size.
Sort it in ascending order using, I presume, argsort.
Step through the sorted array in order, and for each element in the sorted
array, retrieve what the corresponding (line, sample) indices in the
original array are.

For instance:

a = numpy.arange(0, 16).reshape(4,4)
a[0,:] = -1*numpy.arange(0,4)
a[2,:] = -1*numpy.arange(4,8)

asort = numpy.sort(a, axis=None)
for idx in xrange(0, asort.size):
    element = asort[idx]
    # !! Find the line and sample location in a that corresponds to the
    # i-th element in asort

Thank-you for your help,

Catherine



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] retrieving original array locations from 2d argsort

2013-07-15 Thread Warren Weckesser
On 7/15/13, Moroney, Catherine M (398D)
catherine.m.moro...@jpl.nasa.gov wrote:
 I know that there's an easy way to solve this problem, but I'm not
 sufficiently knowledgeable
 about numpy indexing to figure it out.

 Here is the problem:

 Take a 2-d array a, of any size.
 Sort it in ascending order using, I presume, argsort.
 Step through the sorted array in order, and for each element in the sorted
 array,
 retrieve what the corresponding (line, sample) indices in the original array
 are.

 For instance:

 a = numpy.arange(0, 16).reshape(4,4)
 a[0,:] = -1*numpy.arange(0,4)
 a[2,:] = -1*numpy.arange(4,8)

 asort = numpy.sort(a, axis=None)
 for idx in xrange(0, asort.size):
   element = asort[idx]
 !! Find the line and sample location in a that corresponds to the
 i-th element in assort



One way is to use argsort and  `numpy.unravel_index` to recover the
original 2D indices:

code
import numpy

a = numpy.arange(0, 16).reshape(4,4)
a[0,:] = -1*numpy.arange(0,4)
a[2,:] = -1*numpy.arange(4,8)

flat_sort_indices = numpy.argsort(a, axis=None)
original_indices = numpy.unravel_index(flat_sort_indices, a.shape)

print "  i   j  a[i,j]"
for i, j in zip(*original_indices):
    element = a[i,j]
    print "%3d %3d %6d" % (i, j, element)

/code


Warren



 Thank-you for your help,

 Catherine



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Charles R Harris
On Mon, Jul 15, 2013 at 6:58 PM, Benjamin Root ben.r...@ou.edu wrote:

 To add a bit of context to the question of nansum on empty results, we
 currently differ from MATLAB and R in this respect, they return zero no
 matter what. Personally, I think it should return zero, but our current
 behavior of returning nans has existed for a long time.

 Personally, I think we need a deprecation warning and possibly wait to
 change this until 2.0, with plenty of warning that this will change.

Waiting for the mythical 2.0 probably won't work ;) We also need to give
folks a way to adjust ahead of time. I think the easiest way to do that is
with an extra keyword, say nanok, with True as the starting default, then
later we can make False the default.

snip

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What should be the result in some statistics corner cases?

2013-07-15 Thread Ralf Gommers
On Tue, Jul 16, 2013 at 3:50 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Mon, Jul 15, 2013 at 6:58 PM, Benjamin Root ben.r...@ou.edu wrote:

 To add a bit of context to the question of nansum on empty results, we
 currently differ from MATLAB and R in this respect, they return zero no
 matter what. Personally, I think it should return zero, but our current
 behavior of returning nans has existed for a long time.

 Personally, I think we need a deprecation warning and possibly wait to
 change this until 2.0, with plenty of warning that this will change.

 Waiting for the mythical 2.0 probably won't work ;) We also need to give
 folks a way to adjust ahead of time. I think the easiest way to do that is
 with an extra keyword, say nanok, with True as the starting default, then
 later we can make False the default.


No special keywords to work around a behavior change please; it doesn't work
well and you end up with a keyword you don't really want.

Why not just give a FutureWarning in 1.8 and change to returning zero in
1.9?

Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion