Re: [Numpy-discussion] Indexing a masked array with another masked array leads to unexpected results

2011-11-04 Thread Joe Kington
On Fri, Nov 4, 2011 at 5:26 AM, Pierre GM pgmdevl...@gmail.com wrote:


 On Nov 03, 2011, at 23:07 , Joe Kington wrote:

  I'm not sure if this is exactly a bug, per se, but it's a very confusing
 consequence of the current design of masked arrays…
 I would just add an "I think" between the "but" and the "it's" before I could
 agree.

  Consider the following example:
 
  import numpy as np
 
  x = np.ma.masked_all(10, dtype=np.float32)
  print(x)
  x[x > 0] = 5
  print(x)
 
  The exact results will vary depending on the contents of the empty memory
 the array was initialized from.

 Not a surprise. But isn't it mentioned in the docs somewhere that using a
 masked array as an index is a very bad idea? And that you should always fill
 it before you use it as an array? (Actually, using a MaskedArray as index
 used to raise an IndexError. But I thought it was a bit too harsh, so I
 dropped it).


Not that I can find in the docs (Perhaps I just missed it?). At any rate,
it's not mentioned in the numpy.ma section on indexing:
http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html#indexing-and-slicing

The only mention of it is a comment in MaskedArray.__setitem__ where the
IndexError is commented out.


 ma.masked_all is an empty array with all its elements masked. I.e., you have
 an uninitialized ndarray as data, and a bool array of the same size, full
 of True. The operative word here is "uninitialized".

  This wreaks havoc when filtering the contents of masked arrays (and
 leads to hard-to-find bugs!).  The mask of the array in question is altered
 at random (or, rather, based on the masked values as well as the unmasked
 ones).

 Once again, you're working on an *uninitialized* array. What you should
 really do is to initialize it first, e.g. by 0, or whatever would make
 sense in your field, and then work from that.
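
A minimal sketch of the initialization point above, assuming a fill value of 0 is acceptable for the data in question (the thread only offers 0 as an example). It contrasts masked_all, whose data buffer is undefined, with a fully masked array built on top of a zeroed buffer. The names x and y are illustrative only.

```python
import numpy as np

# masked_all defines only the mask; the data buffer is left uninitialized.
x = np.ma.masked_all(4, dtype=np.float32)
print(x.mask)  # every element is masked

# Assumption: 0 is a sensible placeholder. Build a defined buffer first,
# then mask all of it, so the hidden values are known.
y = np.ma.array(np.zeros(4, dtype=np.float32), mask=True)
print(y.data)
print(y.mask)
```

With the second form, any operation that happens to look at the underlying data sees zeros rather than whatever was left in memory.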


Sure, I shouldn't have used that as the example.

My point was that it's counter-intuitive that something like x[x > 0] = 0
alters the mask of x based on the values of _masked_ elements.  How it's
initialized is irrelevant (though, of course, it wouldn't be semi-random if
it were initialized in another way).


  I can see the reasoning behind the way it works. It makes sense that
  x > 0 returns a masked boolean array with potentially several elements
 masked, as well as the unmasked elements greater than 0.

 Well, x > 0 is also a masked array, with its mask full of True. Not very
 usable by itself, and especially *not* for indexing.


  However, wouldn't it make more sense to have MaskedArray.__setitem__
 only operate on the unmasked elements of the indx passed in (at least in
 the case where the assigned value isn't a masked array)?


 Normally, that should be the case. But you're not working in normal
 conditions, here. A bit like trying to boil water on a stove with a plastic
 pan.


x[x > threshold] = something is a very common idiom for ndarrays.

I think most people would find it surprising that this operation doesn't
ignore the masked values.

I noticed this because one of my coworkers was complaining that a piece of
my code was messing up their masked arrays.  I'd never tested it with
masked arrays, but it took me ages to find, just because I wasn't looking
in places where I was just using common idioms.  In this particular case,
they'd initialized it with masked_all, so it effectively altered the mask
of the array at random.  Regardless of how it was initialized, though, it
is surprising that the mask of x is changed based on masked values.
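
A deterministic way to look at this, sketched with explicitly initialized data so that nothing depends on leftover memory. The array contents are made up for illustration, and whether the final mask differs from the original may depend on the NumPy version in use, so the sketch prints both rather than asserting a result.

```python
import numpy as np

# Two unmasked values followed by two masked ones. The masked slots still
# hold 2.0 and -2.0 in the underlying data buffer.
x = np.ma.array([1.0, -1.0, 2.0, -2.0], mask=[False, False, True, True])

cond = x > 0
print(type(cond))  # the comparison result is itself a MaskedArray
print(cond.mask)   # it carries the same mask as x
print(cond.data)   # boolean data sitting underneath that mask

mask_before = x.mask.copy()
x[cond] = 5
print(mask_before)
print(x.mask)      # compare with mask_before
```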

I just think it would be useful for it to be documented.

Cheers,

-Joe
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Indexing a masked array with another masked array leads to unexpected results

2011-11-03 Thread Joe Kington
Forgive me if this is already a well-known oddity of masked arrays. I hadn't
seen it before, though.

I'm not sure if this is exactly a bug, per se, but it's a very confusing
consequence of the current design of masked arrays...

Consider the following example:

import numpy as np

x = np.ma.masked_all(10, dtype=np.float32)
print(x)
x[x > 0] = 5
print(x)

The exact results will vary depending on the contents of the empty memory the
array was initialized from.

This wreaks havoc when filtering the contents of masked arrays (and leads
to hard-to-find bugs!).  The mask of the array in question is altered at
random (or, rather, based on the masked values as well as the unmasked ones).

Of course, once you're aware of this, there are a number of workarounds
(namely, filling the array or explicitly operating on x.data instead of
x).
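
The two workarounds just mentioned could look roughly like this. It is a sketch on made-up data; make_example is a hypothetical helper used only to build the same array twice.

```python
import numpy as np

def make_example():
    # Hypothetical helper: two unmasked values, then two masked ones.
    return np.ma.array([1.0, -1.0, 2.0, -2.0],
                       mask=[False, False, True, True])

# Workaround 1: fill the boolean index so masked entries count as False.
a = make_example()
idx = (a > 0).filled(False)  # plain boolean ndarray, no mask attached
a[idx] = 5

# Workaround 2: operate on the underlying data and leave the mask alone.
b = make_example()
b.data[b.data > 0] = 5

print(a.data, a.mask)
print(b.data, b.mask)
```

Note the difference between the two: filling the index touches only unmasked elements, while going through x.data also overwrites values hidden under the mask, although the mask itself stays as it was.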

I can see the reasoning behind the way it works. It makes sense that x > 0
returns a masked boolean array with potentially several elements masked,
as well as the unmasked elements greater than 0.

However, wouldn't it make more sense to have MaskedArray.__setitem__ only
operate on the unmasked elements of the indx passed in (at least in the
case where the assigned value isn't a masked array)?

Cheers,
-Joe