Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-02 Thread josef . pktd
On Sat, Jul 2, 2011 at 4:10 PM, Benjamin Root  wrote:
> On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith  wrote:
>>
>> I'm not sure what you mean here. If we have masked array support at
>> all (and some people seem to want it), then we have to say more than
>> "it's an array with a mask". Indexing such a beast has to do
>> *something*, so we need some kind of theory to say what, ufuncs have
>> to do *something*, ditto. I mean, I guess we could just say that a
>> masked array is literally an np.ndarray where you have attached a
>> field named "mask" that doesn't do anything, but I don't think that
>> would really satisfy most users :-).
>>
>
> Indexing a masked array just returns an array with np.NA in the appropriate
> elements.  This is no different than with regular ndarray objects or in
> numpy.ma.  As for ufuncs, the NEP already addresses this in multiple ways.
> For element-wise ufuncs, a "where" parameter is available for indicating
> which elements to skip.  For reduction ufuncs, a "skipna" parameter will
> indicate whether or not to skip the values.  On top of that, subclassed
> ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function
> that can set a default value for those parameters to make things easier for
> masked array users.
>
>> I don't know about others, but my main objection is this: He's
>> proposing two different implementations for NA. I only need one, so
>> having two is redundant and confusing. Of these two, the bit-pattern
>> one has lower memory overhead (which many people have spoken up to say
>> matters to them), and really obvious semantics (assignment is
>> implemented as assignment, etc.). So why force people to make this
>> confusing choice? What does the mask implementation add? AFAICT, its
>> only purpose is to satisfy a rather different set of use cases. (See
>> Gary Strangman's email here for a good description of these use cases:
>> http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html)
>> But AFAICT again, it's been crippled for those use cases in order to
>> give it the NA semantics. So I just don't see who the masking part is
>> supposed to help.
>>
>
> As a user of numpy.ma, masked arrays have always been a second-class citizen
> to me. Developing new code with it always brought about new surprises and
> discoveries of strange behavior from various functions. In this sense,
> numpy.ma has always been crippled.  By sacrificing *some* of the existing
> semantics (which would likely be taken care of by a re-implemented numpy.ma
> to preserve backwards-compatibility), the masked array community gains a
> first-class citizen in numpy, and numpy developers will have the
> masked/missing data issue in the forefront whenever developing new functions
> and libraries.  I am more than happy with that trade-off.  I am willing to
> learn the semantics so long as I have a guarantee that the functions I use
> behave the way I expect them to.
>
>>
>> BTW, you can't access the memory of a masked value by taking a view,
>> at least if I'm reading this version of the NEP correctly, and it
>> seems to be the latest:
>>
>>  https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst
>> The only way to access the memory of a masked value is to take a view
>> *before* you mask it. And if the array has a mask at all when you take
>> the view, you also have to set a.flags.ownmask = True, before you mask
>> the value.
>
> This isn't actually as bad as it sounds.  From a function's perspective, it
> should only know the values that it has been given access to.  If I -- as a
> user of said function -- decide that certain values should be unknown to the
> function, I wouldn't want the function to be able to override that
> decision.  Remember, it is possible that the masked element never was
> initialized.  Therefore, we wouldn't want the function to use that element.
> (Note, this is one of those "fun" surprises that a numpy.ma user sometimes
> encounters when a function uses np.asarray instead of np.asanyarray).

But as far as I understand this takes away the ability to temporarily
fill in the masked values with values that are neutral for a
calculation, e.g. zero when taking a sum or dot product.
Instead it looks like a copy of the array has to be made in the new version.
(I'm thinking more correlate, convolution, linalg, scipy.signal, not
simple ufuncs. In many cases new arrays might be created anyway so the
loss from getting a copy of the non-NA data might not be so severe.)
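
For concreteness, a minimal sketch of the "fill with a neutral value"
pattern. It uses the existing numpy.ma API rather than the masks proposed
in the NEP, and the data values are made up for illustration:

```python
import numpy as np

# Made-up data: the value at index 2 is treated as missing.
x = np.ma.array([1.0, 2.0, 99.0, 4.0], mask=[False, False, True, False])
w = np.array([0.5, 0.5, 0.5, 0.5])

# Fill masked entries with 0, which is neutral for sums and dot products.
# filled() returns a new plain ndarray; x itself is left unchanged.
x0 = x.filled(0.0)

total = x0.sum()           # 1 + 2 + 0 + 4
weighted = np.dot(x0, w)   # 0.5 * (1 + 2 + 0 + 4)
```

Note that filled() is itself a copy, which is the cost being discussed
here; the plain array it returns can be handed to code that knows nothing
about masks.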

I guess the "fun" surprises will remain fun since most functions in
scipy or other libraries won't suddenly learn how to handle masked
arrays or NAs. What happens if you feed the new animals to linalg.svd,
or linalg.inv or fft ... that are all designed for asarray and not for
asanyarray?

Josef

>
> Ben Root

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-02 Thread Benjamin Root
On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith  wrote:

>
> I'm not sure what you mean here. If we have masked array support at
> all (and some people seem to want it), then we have to say more than
> "it's an array with a mask". Indexing such a beast has to do
> *something*, so we need some kind of theory to say what, ufuncs have
> to do *something*, ditto. I mean, I guess we could just say that a
> masked array is literally an np.ndarray where you have attached a
> field named "mask" that doesn't do anything, but I don't think that
> would really satisfy most users :-).
>
>
Indexing a masked array just returns an array with np.NA in the appropriate
elements.  This is no different than with regular ndarray objects or in
numpy.ma.  As for ufuncs, the NEP already addresses this in multiple ways.
For element-wise ufuncs, a "where" parameter is available for indicating
which elements to skip.  For reduction ufuncs, a "skipna" parameter will
indicate whether or not to skip the values.  On top of that, subclassed
ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function
that can set a default value for those parameters to make things easier for
masked array users.

> I don't know about others, but my main objection is this: He's
> proposing two different implementations for NA. I only need one, so
> having two is redundant and confusing. Of these two, the bit-pattern
> one has lower memory overhead (which many people have spoken up to say
> matters to them), and really obvious semantics (assignment is
> implemented as assignment, etc.). So why force people to make this
> confusing choice? What does the mask implementation add? AFAICT, its
> only purpose is to satisfy a rather different set of use cases. (See
> Gary Strangman's email here for a good description of these use cases:
> http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html)
> But AFAICT again, it's been crippled for those use cases in order to
> give it the NA semantics. So I just don't see who the masking part is
> supposed to help.
>
>
As a user of numpy.ma, masked arrays have always been a second-class citizen
to me. Developing new code with it always brought about new surprises and
discoveries of strange behavior from various functions. In this sense,
numpy.ma has always been crippled.  By sacrificing *some* of the existing
semantics (which would likely be taken care of by a re-implemented numpy.ma
to preserve backwards-compatibility), the masked array community gains a
first-class citizen in numpy, and numpy developers will have the
masked/missing data issue in the forefront whenever developing new functions
and libraries.  I am more than happy with that trade-off.  I am willing to
learn the semantics so long as I have a guarantee that the functions I use
behave the way I expect them to.


> BTW, you can't access the memory of a masked value by taking a view,
> at least if I'm reading this version of the NEP correctly, and it
> seems to be the latest:
>
> https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst
> The only way to access the memory of a masked value is to take a view
> *before* you mask it. And if the array has a mask at all when you take
> the view, you also have to set a.flags.ownmask = True, before you mask
> the value.
>

This isn't actually as bad as it sounds.  From a function's perspective, it
should only know the values that it has been given access to.  If I -- as a
user of said function -- decide that certain values should be unknown to the
function, I wouldn't want the function to be able to override that
decision.  Remember, it is possible that the masked element never was
initialized.  Therefore, we wouldn't want the function to use that element.
(Note, this is one of those "fun" surprises that a numpy.ma user sometimes
encounters when a function uses np.asarray instead of np.asanyarray).
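
A small sketch of that surprise with today's numpy.ma (the values are
made up; 1e20 stands in for an uninitialized or junk element):

```python
import numpy as np

m = np.ma.array([1.0, 2.0, 1e20], mask=[False, False, True])

plain = np.asarray(m)      # base-class ndarray: the mask is dropped
kept = np.asanyarray(m)    # subclasses pass through: still a MaskedArray

naive_sum = plain.sum()    # includes the hidden 1e20
masked_sum = kept.sum()    # skips the masked element
```

A function that calls np.asarray on its input therefore computes with the
masked element whether or not the caller wanted it to.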

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-02 Thread Nathaniel Smith
On Fri, Jul 1, 2011 at 10:07 PM, Eric Firing  wrote:
> On 07/01/2011 06:40 PM, Nathaniel Smith wrote:
>> On Fri, Jul 1, 2011 at 9:29 AM, Christopher Jordan-Squire
>
>> BTW, you can't access the memory of a masked value by taking a view,
>> at least if I'm reading this version of the NEP correctly, and it
>> seems to be the latest:
>>    
>> https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst
>
> No, to see the latest you need to go to pull request #99, I believe:
> https://github.com/numpy/numpy/pull/99
> From there click the diff button, then select
> doc/neps/missing-data.rst, then "view file" to get to a formatted view
> of the whole file in its most recent form. You can also look at the
> history of the file there.  c-masked-array.rst was renamed to
> missing-data.rst and editing continued.

Oh. Thanks for the link!

Fortunately, I'm not seeing any changes that invalidate anything I've
said here. The disappearance of .validitymask changes the details of
my response earlier to Pierre, but not the content, I think. But sorry
for the confusion.

-- Nathaniel


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Eric Firing
On 07/01/2011 06:40 PM, Nathaniel Smith wrote:
> On Fri, Jul 1, 2011 at 9:29 AM, Christopher Jordan-Squire

> BTW, you can't access the memory of a masked value by taking a view,
> at least if I'm reading this version of the NEP correctly, and it
> seems to be the latest:
>
> https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst

No, to see the latest you need to go to pull request #99, I believe:
https://github.com/numpy/numpy/pull/99
From there click the diff button, then select
doc/neps/missing-data.rst, then "view file" to get to a formatted view
of the whole file in its most recent form. You can also look at the
history of the file there.  c-masked-array.rst was renamed to
missing-data.rst and editing continued.

Eric


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Nathaniel Smith
On Fri, Jul 1, 2011 at 9:29 AM, Christopher Jordan-Squire
 wrote:
> This is kind of late to be jumping into the 'long thread of doom', but I've
> been following most of the posts, so I'd figured I'd throw in my 2 cents.
> I'm Mark's officemate over the summer, and we've been talking daily about
> his design. I was skeptical of various details at first, but by now Mark's
> largely sold me on his design. Though, FWIW, my background is largely
> statistical uses of arrays rather than scientific uses, so I grok missing
> data usage more naturally than masking.

Always good to hear more perspectives! Thanks for speaking up.

> I looked over the theoretical model in the aNEP, and I disagree with it. I
> think a masked array is just that: an array with a mask. Do whatever with
> the mask, but it's up to the user to decide how they want to use it. It
> doesn't seem like it has to come with a theoretical model. (Unlike missing
> data, which does have a nice theoretical model.)

I'm not sure what you mean here. If we have masked array support at
all (and some people seem to want it), then we have to say more than
"it's an array with a mask". Indexing such a beast has to do
*something*, so we need some kind of theory to say what, ufuncs have
to do *something*, ditto. I mean, I guess we could just say that a
masked array is literally an np.ndarray where you have attached a
field named "mask" that doesn't do anything, but I don't think that
would really satisfy most users :-).

> The theoretical model in the aNEP seems to assume too much. I'm thinking in
> particular of this idea: "a length-4 array in which the last value has been
> masked out behaves just like an ordinary length-3 array, so long as you
> don't change the mask." That's forcing a notion of column/position
> independence on the masked array, in that any function operating on the rows
> must treat each column the same. And I don't think that's part of the
> contract that should come from creating a masked array.

I'm really lost on what you mean by columns versus rows here. In that
sentence I'm literally saying that these two 1-d arrays should behave
the same:
  [1, 2, 3]
  [1, 2, 3, --]
For example, we have to decide what np.sum should do on the second
array. Well, this says that it should work like this:
  >>> np.sum(np.array([1, 2, 3, np.IGNORE]))
  6
Why? Because that's what happens when we do this:
  >>> np.sum(np.array([1, 2, 3]))
  6
There are other ways to think about how masked arrays should act, but
this seemed like one plausible heuristic to put out there as a
starting point.
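
np.IGNORE above is the proposal's notation and does not exist in current
numpy. As a rough stand-in, the same heuristic can be checked against
numpy.ma, which already behaves this way for sums (it differs in other
respects, e.g. the masked array still reports a length of 4):

```python
import numpy as np

short = np.array([1, 2, 3])
# The fourth value (100 here) is arbitrary; it is hidden by the mask.
masked = np.ma.array([1, 2, 3, 100], mask=[False, False, False, True])

short_sum = np.sum(short)
masked_sum = np.sum(masked)
visible = masked.count()   # number of unmasked elements
```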

...If you still have an objection, could you rephrase it?

And any thoughts on how I could phrase that better?

> I'm a statistics grad student and an R user, and I'm mostly ok with what
> Mark is doing.
> Currently, as I understand it, Mark is working on a structure that will make
> missing data into a first class citizen in the numpy world. This is great!
> Before it had been more of a 2nd class-citizen. And Mark is even trying to
> copy R semantics as much as possible.

Yes, it's wonderful!

> It's true that Mark's making it so the masked part of these new arrays won't
> be as front and center. The functionality will be there and it will be easy
> to use. But it will be based more on an explicit contract that the data
> memory contents of a masked array will not be overwritten when the data is
> masked. So I don't think Mark is making anything implicit--he's making a
> very explicit contract about how the data memory is handled when the mask is
> changed.
> If I understand correctly, it seems like the main objection to Mark's
> current API is that the explicit contract about data memory isn't somehow
> immediately visible in the API. It's true this is a trade-off, but it leads
> to a simpler API with easier ability to use all features at once at the
> pretty small cost of the user just having to read enough to realize that
> there's an explicit contract about what happens to the memory of a masked
> value, and they can access it by taking a view. That's easy enough to add at
> the very beginning of the documentation.

I don't know about others, but my main objection is this: He's
proposing two different implementations for NA. I only need one, so
having two is redundant and confusing. Of these two, the bit-pattern
one has lower memory overhead (which many people have spoken up to say
matters to them), and really obvious semantics (assignment is
implemented as assignment, etc.). So why force people to make this
confusing choice? What does the mask implementation add? AFAICT, its
only purpose is to satisfy a rather different set of use cases. (See
Gary Strangman's email here for a good description of these use cases:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html)
But AFAICT again, it's been crippled for those use cases in order to
give it the NA semantics. So I just don't see who the masking part is
supposed to help.

BTW, you can't 

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Nathaniel Smith
On Fri, Jul 1, 2011 at 9:29 AM, Benjamin Root  wrote:
> On Fri, Jul 1, 2011 at 11:20 AM, Matthew Brett  wrote:
>> On Fri, Jul 1, 2011 at 5:17 PM, Benjamin Root  wrote:
>> > For more complicated functions like pcolor() and contour(), the arrays
>> > need to know the status of the neighboring points in themselves, and for
>> > the other arrays.  Again, either we use numpy.ma to share a common mask
>> > across the data arrays, or we implement our own semantics to deal with
>> > this.  And again, we can not change any of the original data.
>> >
>> > This is not an obscure case.  This is existing code in matplotlib.  I
>> > will be evaluating the current missingdata branch later today to assess
>> > its suitability for use in matplotlib.
>>
>> I think I missed why your case needs NA and IGNORE to use the same
>> API.  Why can't you just use masks and IGNORE here?
>
> The point is that matplotlib can not make assumptions about the nature of
> the input data.  From matplotlib's perspective, NA's and IGNORE's are the
> same thing and should be treated the same way (i.e. - skipped).  Right now,
> matplotlib's code is messy and inconsistent with its treatment of masked
> arrays and NaNs (some functions treat them the same, some only apply to NaNs
> and vice versa).  This is because of code cruft over the years.  If we had
> one interface to rule them all, we can bring *all* plotting functions to
> have similar handling code and be more consistent across the board.

Maybe I'm missing something, but it seems like no matter how the NA
handling thing plays out, what you need is something like

# For current numpy:
import numpy as np

def usable_points(a):
    a = np.asanyarray(a)
    usable = ~np.isnan(a)
    usable &= ~np.isinf(a)
    if isinstance(a, np.ma.masked_array):
        usable &= ~a.mask
    return usable

def all_usable(a, *rest):
    usable = usable_points(a)
    for other in rest:
        usable &= usable_points(other)
    return usable

And then you need to call all_usable from each of your plotting
functions and away you go, yes?

AFAICT, under the NEP proposal, in usable_points() you need to add a line like:
  usable &= ~np.isna(a)  # NEP

Under the alterNEP proposal, you need to add two lines, like
  usable &= ~np.isna(a)  # alterNEP
  usable &= a.visible  # alterNEP

And either way, once you get your mask, you pretty much do the same
thing: either use it directly, or use it to set up a masked array (of
whatever flavor, and they all seem to work the same as far as this is
concerned).
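
To make the "once you get your mask" step concrete, here is a
self-contained sketch against current numpy. It is a variant of the
helpers above, not the same code: it uses np.isfinite together with
np.ma.getdata and np.ma.getmaskarray so that the result is always a plain
boolean array. The sample data is made up.

```python
import numpy as np

def usable_points(a):
    """Boolean array: True where a value is finite and not masked."""
    a = np.asanyarray(a)
    # getdata/getmaskarray accept both plain and masked arrays.
    usable = np.isfinite(np.ma.getdata(a))
    usable &= ~np.ma.getmaskarray(a)
    return usable

def all_usable(a, *rest):
    usable = usable_points(a)
    for other in rest:
        usable &= usable_points(other)
    return usable

x = np.array([0.0, 1.0, np.nan, 3.0])
y = np.ma.array([10.0, np.inf, 12.0, 13.0], mask=[True, False, False, False])

ok = all_usable(x, y)

# Either use the boolean array directly ...
x_ok, y_ok = x[ok], np.ma.getdata(y)[ok]
# ... or use it to set up masked arrays that share one common mask.
x_m = np.ma.array(x, mask=~ok)
y_m = np.ma.array(np.ma.getdata(y), mask=~ok)
```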

You seem to see some way in which the alterNEP's separation of masks
and NA handling makes a big difference to your architecture, but I'm
not getting it :-(.

-- Nathaniel


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Nathaniel Smith
On Fri, Jul 1, 2011 at 9:18 AM, Bruce Southey  wrote:
> I am sorry that that is NOT true - DON'T just lump every one into this
> when they have clearly stated the opposite! Missing values are nothing
> special to me, just reality. There are many statistical applications
> where masking is extremely common like outlier detection and flagging
> unusual observations (missing values is also masking). Just that you as
> a user have to do that yourself by creating and maintaining working
> variables.

Thanks for speaking up -- we all definitely want something that will
work as well as possible for everyone! I'm a little confused about
what you're saying, though -- I assume that you mean that you're happy
with the NEP proposal for handling NA values[1], and so I
misrepresented you when I said that everyone doing statistics with
missing values had concerns about the NEP? If so, then my apologies.

[1] https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst

> I really find that you are 'splitting hairs' in your arguments as it
> really has to be up to the application on how missing values and NaN
> have to be handled. I see no difference between a missing value and a
> NaN because in virtually all statistical applications, both of these are
> dropped. This is what SAS typically does although certain procedures like
> FREQ allow you to treat missing values as 'valid'. R has slightly more
> flexibility since it differentiates missing values and NaN. R allows you
> to decide how missing values are handled using arguments like na.rm or
> using na.fail, na.omit, na.exclude, na.pass functions.  But I think for
> the majority of cases (I'm not an R guru), R acts the same way, as by
> default (which is how most people use R) R excludes missing values and
> NaNs.

Is your point here that NA and NaN are pretty similar, so it's
splitting hairs to differentiate them? They are pretty similar, but
this is the justification I wrote for having both in the alterNEP
(https://gist.github.com/1056379):

"For floating point computations, NAs and NaNs have (almost?)
identical behavior. But they represent different things -- NaN an
invalid computation like 0/0, NA a value that is not available -- and
distinguishing between these things is useful because in some
situations they should be treated differently. (For example, an
imputation procedure should replace NAs with imputed values, but
probably should leave NaNs alone.) And anyway, we can't use NaNs for
integers, or strings, or booleans, so we need NA anyway, and once we
have NA support for all these types, we might as well support it for
floating point too for consistency."

Does that seem reasonable?
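
The "we can't use NaNs for integers" point is easy to see in current
numpy. A minimal sketch, with numpy.ma standing in for whatever NA
mechanism is eventually adopted:

```python
import numpy as np

ints = np.array([1, 2, 3])

# NaN is a floating point value, so an integer array cannot hold it.
try:
    ints[1] = np.nan
    nan_in_int_array = True
except ValueError:
    nan_in_int_array = False

# The usual workaround changes the dtype to float ...
as_float = ints.astype(float)
as_float[1] = np.nan

# ... whereas a mask marks the value as unavailable and keeps the dtype.
masked_ints = np.ma.array(ints, mask=[False, True, False])
```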

In any case, my arguments haven't really been about NA versus NaN --
everyone seems to agree that we want something like NA. In the NEP
proposal, there are two different versions of NAs, one that's
implemented using special values (e.g., a special NaN that means NA)
and one that's implemented by using a secondary mask array. My
argument has been that for people who just want NAs, this secondary
mask version is redundant and confusing; but the mask version doesn't
really help the people who want "masked arrays" either, because it's
working too hard to be compatible with NAs, and the masked array
people want different behavior (unmasking, automatic skipping of NAs,
etc.). So it doesn't really work well for anybody.
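
As a rough illustration of the memory side of this argument, the sketch
below compares a special-value array with a data-plus-mask array. NaN is
used as a stand-in for an NA bit pattern and numpy.ma as a stand-in for
the NEP's masks; neither is the proposed implementation.

```python
import numpy as np

n = 1_000_000

# Special-value approach: the marker is stored in the data itself.
sentinel = np.zeros(n, dtype=np.float64)
sentinel[::10] = np.nan

# Mask approach: the data plus a separate boolean array of the same shape.
masked = np.ma.array(np.zeros(n, dtype=np.float64),
                     mask=np.zeros(n, dtype=bool))
masked[::10] = np.ma.masked

sentinel_bytes = sentinel.nbytes
masked_bytes = masked.data.nbytes + np.ma.getmaskarray(masked).nbytes
```

For float64 data the one-byte-per-element mask adds an eighth on top of
the data; for smaller dtypes the relative overhead is larger.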

-- Nathaniel


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Pierre GM
On Jul 1, 2011 7:14 PM, "Mark Wiebe"  wrote:
>
> On Fri, Jul 1, 2011 at 10:15 AM, Nathaniel Smith  wrote:
>>
>> On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe  wrote:
>> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
>> > wrote:
>> >> Do you see problems with the alterNEP proposal?
>> >
>> > Yes, I really like my design as it stands now, and the alterNEP removes a
>> > lot of the abstraction and interoperability that are in my opinion the best
>> > parts. I've made more updates to the NEP based on continuing feedback, which
>> > are part of the pull request I want reviews for.
>> >
>> >>
>> >> If so, what are they?
>> >
>> > Mainly: Reduced interoperability, more complex implementation (leading to
>> > more bugs), and an unclear theoretical model for the masked part of it.
>>
>> Can you give any examples of situations where one would run into this
>> "reduced interoperability"? I'm not sure what it means. The only
>> person who has so far spoken up as needing both masking semantics and
>> NA semantics -- Gary Strangman -- has said that he strongly prefers
>> the alterNEP semantics *exactly because* it makes it clear *how these
>> functions will interoperate.*
>
>
> I've given examples before, but here are a few:
>
> 1) You're using NA dtypes. You realize you want multiple views of the same
> data with different choices of NA. You switch to masked arrays with a few
> lines of code changes.

Multiple NAs? AFAIU, there's only one NA (per type) but several choices to
allocate an IGNORE depending on the situation.
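
For reference, "multiple views of the same data" with different hidden
elements looks roughly like this in today's numpy.ma (a sketch only; the
NEP's own API for this is not shown):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Two masked arrays over the same buffer, each hiding different elements.
view_a = np.ma.array(data, mask=[True, False, False, False])
view_b = np.ma.array(data, mask=[False, False, False, True])

shares = (np.shares_memory(view_a.data, data)
          and np.shares_memory(view_b.data, data))
sum_a = view_a.sum()   # 2 + 3 + 4
sum_b = view_b.sum()   # 1 + 2 + 3
```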

> 2) You're using masks. You realize that you will save memory/disk space if
> you switch to NA dtypes, and it's possible because it turned out that while
> you thought you would need masking, you came up with a new algorithm that
> didn't require it.

Ok, your IGNOREs become NAs because you want to...

> 3) You're writing matplotlib, and you want to support all forms of
> NA-style data. You write it once instead of twice. Repeat for all other open
> source libraries that want to do this.

You switch your NAs to IGNOREs, ok, your call again.

>
>>
>> Can you give any examples of how the implementation would be more
>> complicated? As far as I can tell there are no elements in the
>> alterNEP that are not in your NEP, they mostly just expose the
>> functionality differently at the top level.
>
>
> If that is the case, then it should be easy to change to your model after
> the implementation is complete. I'm happy with that, this style of design
> choice is easier to make when you're comparing actual usage than
> hypotheticals.
>
>> Do you have a clearer theoretical model for the masked part of your
>> proposal?
>
>
> Yes, exactly the same model used for NA dtypes.
>
>>
>> The best I've been able to extract from any of your messages
>> is when you wrote "it seems to me that people wanting masked arrays
>> want missing data without touching their data". But as a matter of
>> English grammar, I have no idea what this means -- if you have data,
>> it's not missing!
>
>
> Ok, missing data-like functionality, which is provided by the solid theory
> behind the missing data.

Which is a subset of 'masked/to ignore' data...

>>
>> It seems to me that people wanting masked data want
>> to *hide* parts of their data, which seems much clearer to me and is
>> the theoretical model used in the alterNEP.
>
>
> Once you've hidden it, isn't it now missing?

Only temporarily, you can revert to not hidden when needed.
If a value is flagged as NA, it should never be accessible again.
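
A sketch of that distinction in current numpy, with numpy.ma playing the
role of "hidden" and an overwritten NaN standing in for an NA bit pattern
(both are stand-ins, not the proposed APIs):

```python
import numpy as np

# Hiding: the mask can be lifted again and the value is still there.
hidden = np.ma.array([1.0, 2.0, 3.0], mask=[False, False, False])
hidden[1] = np.ma.masked
sum_while_hidden = hidden.sum()    # 1 + 3
hidden.mask = np.ma.nomask         # unmask everything
sum_after_unmask = hidden.sum()    # 1 + 2 + 3

# Overwriting with a special value: the original 2.0 is gone for good.
overwritten = np.array([1.0, 2.0, 3.0])
overwritten[1] = np.nan
```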

>>
>> Note that this model
>> actually predicts several of the differences between how people want
>> masks to work and how people want NAs to work (e.g., their behavior
>> during reduction); I
>
>
>>
>> >> Do you agree that the alterNEP proposal is easier to understand?
>> >
>> > No.
>> >>
>> >> If not, can you explain why?
>> >
>> > My answers to that are already scattered in the emails in various places,
>> > and in the various rationales and justifications provided in the NEP.
>>
>> I understand the desire not to get caught up in spending all your time
>> writing emails explaining things that you feel like you've already
>> explained.
>>
>> Maybe there's an email I missed somewhere where you explain the
>> conceptual model behind your NEP's semantics in a short,
>> easy-to-understand way (comparable to, say, the Rationale section of
>> the alterNEP). But I haven't seen it and I can't reconstruct a
>> rationale for it myself (the alterNEP comes out of my attempts to do
>> so!).
>
>
> I've been repeatedly updating the NEP. In particular this "round 2" email
> was an attempt to clarify between the two missing data models (what's being
> called NA and IGNORE), and the two implementation techniques (NA bit
> patterns and masks). I've argued that these are completely independent from
> each other.
>
>>
>> >> What do you see as the important points of difference between the NEP
>> >> and the alterNEP?
>> >
>> > The biggest thing is the NEP supports more use cases in a clea

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Lluís
Nathaniel Smith writes:

> On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe  wrote:
>> On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
>> wrote:
>>> Do you see problems with the alterNEP proposal?
>> 
>> Yes, I really like my design as it stands now, and the alterNEP removes a
>> lot of the abstraction and interoperability that are in my opinion the best
>> parts. I've made more updates to the NEP based on continuing feedback, which
>> are part of the pull request I want reviews for.
>> 
>>> 
>>> If so, what are they?
>> 
>> Mainly: Reduced interoperability, more complex implementation (leading to
>> more bugs), and an unclear theoretical model for the masked part of it.

> Can you give any examples of situations where one would run into this
> "reduced interoperability"? I'm not sure what it means. The only
> person who has so far spoken up as needing both masking semantics and
> NA semantics -- Gary Strangman -- has said that he strongly prefers
> the alterNEP semantics *exactly because* it makes it clear *how these
> functions will interoperate.*

Interoperability improves code maintenance, see my other mail.


[...]
> Do you have a clearer theoretical model for the masked part of your
> proposal? The best I've been able to extract from any of your messages
> is when you wrote "it seems to me that people wanting masked arrays
> want missing data without touching their data". But as a matter of
> English grammar, I have no idea what this means -- if you have data,
> it's not missing! It seems to me that people wanting masked data want
> to *hide* parts of their data, which seems much clearer to me and is
> the theoretical model used in the alterNEP. Note that this model
> actually predicts several of the differences between how people want
> masks to work and how people want NAs to work (e.g., their behavior
> during reduction); I

Come on, let's not jump at each other's throats, I think we've long
ago arrived at a point where we all know what masked means.

If you agree on the interoperability point, then I don't see how the
aNEP improves on that, having in mind that masks must be *explicitly*
activated (again, see the other mail).


[...]
> Well, that's not true. There are some marginal advantages in the
> special case of working with integers+NAs. But I don't think anyone's
> making that argument.

I for one would love that, instead of having to explicitly set dtypes
when using genfromtxt.


[...]
> But as far as I can tell right now, every single person who has
> experience with handling missing data for statistical purposes (esp.
> in R) has real concerns about your proposal, and AFAICT the community
> has very much *not* reached consensus on how these features should
> look.

What I have seen is that people used to R see the mask concept as
alien, and said "I don't want to use it, so please make it more explicit
so that I will know what to avoid". What I say is that you simply don't
have to make np.IGNORE explicit to avoid masks. Simply do not create
arrays with masks.


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Lluís
Matthew Brett writes:
>>> > Mainly: Reduced interoperability
>>> 
>>> Meaning?
>> 
>> You can't switch between the two approaches without big changes in your
>> code.

> Lluis provided a case, and it was obscure.  That switch seems like a
> rare or non-existent use-case that should not guide the API.

The example was for an outlier detection *in-place*.

I see the merged API as beneficial in cases where:

* There are arguments used both as input *and* output (w.r.t. missing
  data information), and it is up to the *caller* to decide whether to
  also maintain the original data. That is, with a merged API, the
  caller can retain a "copy" - a view in fact - of its original data
  more efficiently.

  In the matplotlib case, the caller of the outlier detection might decide
  to pass a brand-new copy of the array, so the outlier detection is
  implemented using np.NA (as both are developed inside the same
  framework).

  But it may also be the case that later on, the developer decides to
  rewrite the caller function (for whatever reason, like avoiding a full
  copy of the array) as passing an array with masking activated.

  With the merged API the outlier detection will still work
  unchanged. With np.IGNORE the outlier detection code would also have
  to be changed.

  This is what Mark talks about when saying "interoperability", and it
  is a good choice from the point of view of code maintenance.

* Propagation of np.NA and np.IGNORE is controlled with a single
  argument (thus simpler and less error-prone code), as opposed to two
  separate arguments and two possible outcomes (np.NA and np.IGNORE)
  with aNEP.
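
A minimal sketch of the first point, using numpy.ma as a stand-in
(np.NA and np.IGNORE are only proposals in this thread, so the sketch
cannot call them). The function mask_outliers and the sample values are
hypothetical, made up for illustration. The caller decides whether the
original data survives: here it passes a masked view that shares the
raw buffer, so the in-place detection hides the outlier without
overwriting it.

```python
import numpy as np


def mask_outliers(values, n_sigma=2.0):
    """Mask, in place, entries more than n_sigma std devs from the mean."""
    center = values.mean()
    spread = values.std()
    # filled(..., False) turns the comparison into a plain boolean array,
    # so entries that were already masked are simply left alone.
    is_outlier = np.ma.filled(np.abs(values - center) > n_sigma * spread, False)
    values[is_outlier] = np.ma.masked


raw = np.array([1.0, 1.1, 0.9, 1.0, 50.0, 1.05])

# The caller keeps `raw` and hands over a masked view of the same buffer.
view = np.ma.array(raw, copy=False)
mask_outliers(view)

print(view)      # the outlier is hidden in the view
print(raw[4])    # the original value is still there
```

If the caller later switches to passing a fresh copy instead of a view,
mask_outliers itself does not need to change, which is the maintenance
argument made above.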

I have been repeating these 2 points again and again, and I still feel
they have not yet been addressed by the aNEP.


Still, the only clear statement I've seen in favour of the aNEP is
minimizing "surprises".

And I will repeat it again. You have to *explicitly* "activate" masks,
just as well as you *explicitly* use np.IGNORE, so it should not
surprise you when you see a mask-like behaviour, precisely because you
have asked for it.

If you don't want that behaviour, you simply don't activate masks.


Lluis



Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Mark Wiebe
On Fri, Jul 1, 2011 at 10:15 AM, Nathaniel Smith  wrote:

> On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe  wrote:
> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
> > wrote:
> >> Do you see problems with the alterNEP proposal?
> >
> > Yes, I really like my design as it stands now, and the alterNEP removes a
> > lot of the abstraction and interoperability that are in my opinion the best
> > parts. I've made more updates to the NEP based on continuing feedback, which
> > are part of the pull request I want reviews for.
> >
> >>
> >> If so, what are they?
> >
> > Mainly: Reduced interoperability, more complex implementation (leading to
> > more bugs), and an unclear theoretical model for the masked part of it.
>
> Can you give any examples of situations where one would run into this
> "reduced interoperability"? I'm not sure what it means. The only
> person who has so far spoken up as needing both masking semantics and
> NA semantics -- Gary Strangman -- has said that he strongly prefers
> the alterNEP semantics *exactly because* it makes it clear *how these
> functions will interoperate.*
>

I've given examples before, but here are a few:

1) You're using NA dtypes. You realize you want multiple views of the same
data with different choices of NA. You switch to masked arrays with a few
lines of code changes.
2) You're using masks. You realize that you will save memory/disk space if
you switch to NA dtypes, and it's possible because it turned out that while
you thought you would need masking, you came up with a new algorithm that
didn't require it.
3) You're writing matplotlib, and you want to support all forms of NA-style
data. You write it once instead of twice. Repeat for all other open source
libraries that want to do this.
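
A rough sketch of example 1, with numpy.ma standing in for the proposed
masked arrays (the NEP's own API is not used here, and the array
contents are made up): two masked views share one data buffer but hide
different elements.

```python
import numpy as np

data = np.arange(6, dtype=float)   # 0, 1, 2, 3, 4, 5

# Two views of the same buffer, each with its own choice of hidden elements.
view_a = np.ma.array(data, mask=[False, False, True, False, False, False],
                     copy=False)
view_b = np.ma.array(data, mask=[True, False, False, False, False, True],
                     copy=False)

sum_a = view_a.sum()   # skips data[2]
sum_b = view_b.sum()   # skips data[0] and data[5]

# A write to the shared buffer is visible through both views.
data[1] = 100.0
sum_a_after = view_a.sum()

print(sum_a, sum_b, sum_a_after)
```

Only the masks are duplicated; the data itself is stored once.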


> Can you give any examples of how the implementation would be more
> complicated? As far as I can tell there are no elements in the
> alterNEP that are not in your NEP, they mostly just expose the
> functionality differently at the top level.
>

If that is the case, then it should be easy to change to your model after
the implementation is complete. I'm happy with that; this style of design
choice is easier to make when you're comparing actual usage rather than
hypotheticals.

> Do you have a clearer theoretical model for the masked part of your
> proposal?


Yes, exactly the same model used for NA dtypes.


> The best I've been able to extract from any of your messages
> is when you wrote "it seems to me that people wanting masked arrays
> want missing data without touching their data". But as a matter of
> English grammar, I have no idea what this means -- if you have data,
> it's not missing!


Ok, missing data-like functionality, which is provided by the solid theory
behind the missing data.


> It seems to me that people wanting masked data want
> to *hide* parts of their data, which seems much clearer to me and is
> the theoretical model used in the alterNEP.


Once you've hidden it, isn't it now missing?


> Note that this model
> actually predicts several of the differences between how people want
> masks to work and how people want NAs to work (e.g., their behavior
> during reduction); I
>


> >> Do you agree that the alterNEP proposal is easier to understand?
> >
> > No.
> >>
> >> If not, can you explain why?
> >
> > My answers to that are already scattered in the emails in various places,
> > and in the various rationales and justifications provided in the NEP.
>
> I understand the desire not to get caught up in spending all your time
> writing emails explaining things that you feel like you've already
> explained.
>
> Maybe there's an email I missed somewhere where you explain the
> conceptual model behind your NEP's semantics in a short,
> easy-to-understand way (comparable to, say, the Rationale section of
> the alterNEP). But I haven't seen it and I can't reconstruct a
> rationale for it myself (the alterNEP comes out of my attempts to do
> so!).
>

I've been repeatedly updating the NEP. In particular this "round 2" email
was an attempt to clarify between the two missing data models (what's being
called NA and IGNORE), and the two implementation techniques (NA bit
patterns and masks). I've argued that these are completely independent from
each other.


> >> What do you see as the important points of difference between the NEP
> >> and the alterNEP?
> >
> > The biggest thing is the NEP supports more use cases in a clean way by
> > composition of different simpler components. It defines one clear missing
> > data abstraction, and proposes two implementations that are interchangeable
> > and can interoperate.
>
> But the two implementations in your proposal are not interchangeable!
> The whole justification for starting with a masked-based
> implementation in your proposal is that it supports unmasking via
> views; if that requirement were removed, then there would be no reason
> to bother with the masking-based implementation at all.
>

They are interchangeable 100% with

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Christopher Jordan-Squire
This is kind of late to be jumping into the 'long thread of doom', but I've
been following most of the posts, so I figured I'd throw in my 2 cents.
I'm Mark's officemate over the summer, and we've been talking daily about
his design. I was skeptical of various details at first, but by now Mark's
largely sold me on his design. Though, FWIW, my background is largely
statistical uses of arrays rather than scientific uses, so I grok missing
data usage more naturally than masking.

On Fri, Jul 1, 2011 at 10:15 AM, Nathaniel Smith  wrote:

> On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe  wrote:
> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
> > wrote:
> >> Do you see problems with the alterNEP proposal?
> >
> > Yes, I really like my design as it stands now, and the alterNEP removes a
> > lot of the abstraction and interoperability that are in my opinion the best
> > parts. I've made more updates to the NEP based on continuing feedback, which
> > are part of the pull request I want reviews for.
> >
> >>
> >> If so, what are they?
> >
> > Mainly: Reduced interoperability, more complex implementation (leading to
> > more bugs), and an unclear theoretical model for the masked part of it.
>
> Can you give any examples of situations where one would run into this
> "reduced interoperability"? I'm not sure what it means. The only
> person who has so far spoken up as needing both masking semantics and
> NA semantics -- Gary Strangman -- has said that he strongly prefers
> the alterNEP semantics *exactly because* it makes it clear *how these
> functions will interoperate.*
>
> Can you give any examples of how the implementation would be more
> complicated? As far as I can tell there are no elements in the
> alterNEP that are not in your NEP, they mostly just expose the
> functionality differently at the top level.
>
> Do you have a clearer theoretical model for the masked part of your
> proposal? The best I've been able to extract from any of your messages
> is when you wrote "it seems to me that people wanting masked arrays
> want missing data without touching their data". But as a matter of
> English grammar, I have no idea what this means -- if you have data,
> it's not missing! It seems to me that people wanting masked data want
> to *hide* parts of their data, which seems much clearer to me and is
> the theoretical model used in the alterNEP. Note that this model
> actually predicts several of the differences between how people want
> masks to work and how people want NAs to work (e.g., their behavior
> during reduction); I
>
>
I looked over the theoretical model in the aNEP, and I disagree with it. I
think a masked array is just that: an array with a mask. Do whatever with
the mask, but it's up to the user to decide how they want to use it. It
doesn't seem like it has to come with a theoretical model. (Unlike missing
data, which does have a nice theoretical model.)

The theoretical model in the aNEP seems to assume too much. I'm thinking in
particular of this idea: "a length-4 array in which the last value has been
masked out behaves just like an ordinary length-3 array, so long as you
don't change the mask." That's forcing a notion of column/position
independence on the masked array, in that any function operating on the rows
must treat each column the same. And I don't think that's part of the
contract that should come from creating a masked array.
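
To make the quoted idea concrete, here is a small sketch with numpy.ma
as a stand-in (the alterNEP itself is only a proposal, and the values
are invented). Reductions on the masked length-4 array agree with a
plain length-3 array, but the masked slot still occupies a position,
which is where the "behaves just like length-3" model stops holding.

```python
import numpy as np

masked4 = np.ma.array([1.0, 2.0, 3.0, 99.0],
                      mask=[False, False, False, True])
plain3 = np.array([1.0, 2.0, 3.0])

# Reductions match the length-3 array.
same_sum = masked4.sum() == plain3.sum()
same_mean = masked4.mean() == plain3.mean()

# Position-aware operations still see four slots.
n_slots = len(masked4)
last_is_masked = masked4[3] is np.ma.masked

print(same_sum, same_mean, n_slots, last_is_masked)
```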


> >> Do you agree that the alterNEP proposal is easier to understand?
> >
> > No.
> >>
> >> If not, can you explain why?
> >
> > My answers to that are already scattered in the emails in various places,
> > and in the various rationales and justifications provided in the NEP.
>
> I understand the desire not to get caught up in spending all your time
> writing emails explaining things that you feel like you've already
> explained.
>
> Maybe there's an email I missed somewhere where you explain the
> conceptual model behind your NEP's semantics in a short,
> easy-to-understand way (comparable to, say, the Rationale section of
> the alterNEP). But I haven't seen it and I can't reconstruct a
> rationale for it myself (the alterNEP comes out of my attempts to do
> so!).
>
> >> What do you see as the important points of difference between the NEP
> >> and the alterNEP?
> >
> > The biggest thing is the NEP supports more use cases in a clean way by
> > composition of different simpler components. It defines one clear missing
> > data abstraction, and proposes two implementations that are interchangeable
> > and can interoperate.
>
> But the two implementations in your proposal are not interchangeable!
> The whole justification for starting with a masked-based
> implementation in your proposal is that it supports unmasking via
> views; if that requirement were removed, then there would be no reason
> to bother with the masking-based implementation at all.
>
> Well, that's not true. There are some marginal advantages in the
> special case of working with int

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Benjamin Root
On Fri, Jul 1, 2011 at 11:20 AM, Matthew Brett wrote:

> Hi,
>
> On Fri, Jul 1, 2011 at 5:17 PM, Benjamin Root  wrote:
> >
> >
> > On Fri, Jul 1, 2011 at 11:00 AM, Matthew Brett 
> > wrote:
> >>
> >> > You can't switch between the two approaches without big changes in your
> >> > code.
> >>
> >> >
> >> Lluis provided a case, and it was obscure.  That switch seems like a
> >> rare or non-existent use-case that should not guide the API.
> >>
> >
> > Just to respond to this specific issue.
> >
> > In matplotlib, there are often constructs like the following:
> >
> > plot_something(X, Y, V)
> >
> > From a module perspective, we have no clue about the nature of the input
> > data.  We often have to do things like np.asanyarray, np.atleast_2d and such
> > to establish some base-level assumptions about the input data.  Numpy
> > currently makes this fairly cheap by not performing a copy if it is not
> > needed.  So far, so good.
> >
> > Next, some plotting functions need to broadcast the arrays together (again,
> > numpy makes that fairly cheap).
> >
> > Then, we need to figure out the common elements to plot.  With something
> > simple like plot(), this is straight-forward or-ing of any masks.  Of
> > course, right now, this is not cheap because we can't assume that the array
> > supports masking semantics.  This is where we either cast the arrays as
> > masked arrays, or perform our own masking semantics.  But, essentially, a
> > point that was masked in X, may not be masked in Y and/or V, and we can not
> > change the original data (or else we would be a bad tool).
> >
> > For more complicated functions like pcolor() and contour(), each array needs
> > to know the status of its own neighboring points, and of those in the
> > other arrays.  Again, either we use numpy.ma to share a common mask across
> > the data arrays, or we implement our own semantics to deal with this.  And
> > again, we can not change any of the original data.
> >
> > This is not an obscure case.  This is existing code in matplotlib.  I will
> > be evaluating the current missingdata branch later today to assess its
> > suitability for use in matplotlib.
>
> I think I missed why your case needs NA and IGNORE to use the same
> API.  Why can't you just use masks and IGNORE here?
>
> Best,
>
> Matthew
>

The point is that matplotlib can not make assumptions about the nature of
the input data.  From matplotlib's perspective, NAs and IGNOREs are the
same thing and should be treated the same way (i.e., skipped).  Right now,
matplotlib's code is messy and inconsistent in its treatment of masked
arrays and NaNs (some functions treat them the same, some only apply to NaNs
and vice versa).  This is because of code cruft over the years.  If we had
one interface to rule them all, we could bring *all* plotting functions to
have similar handling code and be more consistent across the board.
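
As a rough illustration of what "one interface" could look like today,
here is a sketch in which NaN plays the role of NA and a numpy.ma mask
plays the role of IGNORE. The helper name skip_mask is hypothetical;
it is not matplotlib's or NumPy's API.

```python
import numpy as np


def skip_mask(a):
    """Boolean array, True where an element should be skipped.

    Treats masked elements and NaNs alike and never modifies `a`.
    """
    # getmaskarray gives an all-False mask for plain ndarrays; copy it so
    # the caller's own mask is not touched by the |= below.
    mask = np.ma.getmaskarray(a).copy()
    data = np.ma.getdata(a)
    if data.dtype.kind == "f":
        mask |= np.isnan(data)
    return mask


x = np.array([1.0, np.nan, 3.0])                              # NaN as "NA"
y = np.ma.array([1.0, 2.0, 3.0], mask=[False, False, True])   # mask as "IGNORE"

skip = skip_mask(x) | skip_mask(y)
print(skip)
```

A plotting function written against such a helper would not need to
know which kind of input it was given.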

However, I think Mark's NEP provides a good way to distinguish between the
cases when needed (but I have not examined it from that perspective yet).

Ben Root


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Matthew Brett
Hi,

On Fri, Jul 1, 2011 at 5:18 PM, Bruce Southey  wrote:
> On 07/01/2011 10:15 AM, Nathaniel Smith wrote:

> I really find that you are 'splitting hairs' in your arguments as it
> really has to be up to the application on how missing values and NaN
> have to be handled. I see no difference between a missing value and a
> NaN because in virtually all statistical applications, both of these are
> dropped.

The argument is that NA and IGNORE are conceptually different and
should have a separate API.

The claim is that if you don't separate them, it will be confusing.

By default, in alterNEP, NAs propagate and masked values are ignored.
If you want to treat them just the same, then that's an argument to
your ufunc.  Or use an 'isvalid' utility function.
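
A small sketch of those two defaults using what exists in NumPy today,
with NaN standing in for NA and numpy.ma standing in for masking
(np.NA, np.IGNORE and 'isvalid' are proposals in this thread and are
not called here):

```python
import numpy as np

# NA-like behaviour: the missing value propagates through the reduction.
with_nan = np.array([1.0, np.nan, 3.0])
propagated = np.sum(with_nan)

# Mask-like behaviour: the hidden value is skipped by the reduction.
with_mask = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
ignored = with_mask.sum()

# Opting in to "skip" for the NA-like case is an explicit choice.
skipped = np.nansum(with_nan)

print(propagated, ignored, skipped)
```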

Do you have a concrete case where making NA and IGNORE the same thing
in the API gives some benefit?

Best,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Matthew Brett
Hi,

On Fri, Jul 1, 2011 at 5:17 PM, Benjamin Root  wrote:
>
>
> On Fri, Jul 1, 2011 at 11:00 AM, Matthew Brett 
> wrote:
>>
>> > You can't switch between the two approaches without big changes in your
>> > code.
>>
>> >
>> Lluis provided a case, and it was obscure.  That switch seems like a
>> rare or non-existent use-case that should not guide the API.
>>
>
> Just to respond to this specific issue.
>
> In matplotlib, there are often constructs like the following:
>
> plot_something(X, Y, V)
>
> From a module perspective, we have no clue about the nature of the input
> data.  We often have to do things like np.asanyarray, np.atleast_2d and such
> to establish some base-level assumptions about the input data.  Numpy
> currently makes this fairly cheap by not performing a copy if it is not
> needed.  So far, so good.
>
> Next, some plotting functions need to broadcast the arrays together (again,
> numpy makes that fairly cheap).
>
> Then, we need to figure out the common elements to plot.  With something
> simple like plot(), this is straight-forward or-ing of any masks.  Of
> course, right now, this is not cheap because we can't assume that the array
> supports masking semantics.  This is where we either cast the arrays as
> masked arrays, or perform our own masking semantics.  But, essentially, a
> point that was masked in X, may not be masked in Y and/or V, and we can not
> change the original data (or else we would be a bad tool).
>
> For more complicated functions like pcolor() and contour(), each array needs
> to know the status of its own neighboring points, and of those in the
> other arrays.  Again, either we use numpy.ma to share a common mask across
> the data arrays, or we implement our own semantics to deal with this.  And
> again, we can not change any of the original data.
>
> This is not an obscure case.  This is existing code in matplotlib.  I will
> be evaluating the current missingdata branch later today to assess its
> suitability for use in matplotlib.

I think I missed why your case needs NA and IGNORE to use the same
API.  Why can't you just use masks and IGNORE here?

Best,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Bruce Southey
On 07/01/2011 10:15 AM, Nathaniel Smith wrote:
> On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe  wrote:
>> On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett
>> wrote:
>>> Do you see problems with the alterNEP proposal?
>> Yes, I really like my design as it stands now, and the alterNEP removes a
>> lot of the abstraction and interoperability that are in my opinion the best
>> parts. I've made more updates to the NEP based on continuing feedback, which
>> are part of the pull request I want reviews for.
>>
>>> If so, what are they?
>> Mainly: Reduced interoperability, more complex implementation (leading to
>> more bugs), and an unclear theoretical model for the masked part of it.
> Can you give any examples of situations where one would run into this
> "reduced interoperability"? I'm not sure what it means. The only
> person who has so far spoken up as needing both masking semantics and
> NA semantics -- Gary Strangman -- has said that he strongly prefers
> the alterNEP semantics *exactly because* it makes it clear *how these
> functions will interoperate.*
>
> Can you give any examples of how the implementation would be more
> complicated? As far as I can tell there are no elements in the
> alterNEP that are not in your NEP, they mostly just expose the
> functionality differently at the top level.
>
> Do you have a clearer theoretical model for the masked part of your
> proposal? The best I've been able to extract from any of your messages
> is when you wrote "it seems to me that people wanting masked arrays
> want missing data without touching their data". But as a matter of
> English grammar, I have no idea what this means -- if you have data,
> it's not missing! It seems to me that people wanting masked data want
> to *hide* parts of their data, which seems much clearer to me and is
> the theoretical model used in the alterNEP. Note that this model
> actually predicts several of the differences between how people want
> masks to work and how people want NAs to work (e.g., their behavior
> during reduction); I
>
>>> Do you agree that the alterNEP proposal is easier to understand?
>> No.
>>> If not, can you explain why?
>> My answers to that are already scattered in the emails in various places,
>> and in the various rationales and justifications provided in the NEP.
> I understand the desire not to get caught up in spending all your time
> writing emails explaining things that you feel like you've already
> explained.
>
> Maybe there's an email I missed somewhere where you explain the
> conceptual model behind your NEP's semantics in a short,
> easy-to-understand way (comparable to, say, the Rationale section of
> the alterNEP). But I haven't seen it and I can't reconstruct a
> rationale for it myself (the alterNEP comes out of my attempts to do
> so!).
>
>>> What do you see as the important points of difference between the NEP
>>> and the alterNEP?
>> The biggest thing is the NEP supports more use cases in a clean way by
>> composition of different simpler components. It defines one clear missing
>> data abstraction, and proposes two implementations that are interchangeable
>> and can interoperate.
> But the two implementations in your proposal are not interchangeable!
> The whole justification for starting with a masked-based
> implementation in your proposal is that it supports unmasking via
> views; if that requirement were removed, then there would be no reason
> to bother with the masking-based implementation at all.
>
> Well, that's not true. There are some marginal advantages in the
> special case of working with integers+NAs. But I don't think anyone's
> making that argument.
>
>> The alterNEP proposes two independent APIs, reducing
>> interoperability and so significantly increasing the amount of learning
>> required to work with both of them. This also precludes switching between
>> the two approaches without a lot of work.
> You can't switch between Python and C without a lot of work too, but
> that doesn't mean that they should be merged into one design... but
> they do complement each other beautifully. Just like missing data and
> masked arrays :-).
>
>> The current pull request that's sitting there waiting for review does not
>> have an impact on which approach goes ahead, but the code I'm doing now
>> does. This is a fairly large project, and I don't have a great length of
>> time to do it in, so I'm not going to participate extensively in the
>> alterNEP discussion. If you want to help me, please review my code and
>> provide specific feedback on my NEP (the code review system in github is
>> great for this too, I've received some excellent feedback on the NEP that
>> way). If you want to change my mind about things, please address the
>> specific design decisions you think are problematic by specifically
>> responding to lines in the NEP, as part of code-reviewing my pull request in
>> github.
> I know I'm being grumpy in this email, and I apologize for that. But,
> no. I've given extensive fe

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Matthew Brett
Hi,

On Fri, Jul 1, 2011 at 5:15 PM, Charles R Harris
 wrote:
>
>
> On Fri, Jul 1, 2011 at 10:00 AM, Matthew Brett 
> wrote:
>>
>> Hi,
>>
>> On Fri, Jul 1, 2011 at 4:34 PM, Mark Wiebe  wrote:
>> > On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe  wrote:
>> >> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman 
>> >> >> wrote:
>> >> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith 
>> >> >> > wrote:
>> >> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett
>> >> >> >>  wrote:
>> >> >> >>> In the interest of making the discussion as concrete as possible, here
>> >> >> >>> is my draft of an alternative proposal for NAs and masking, based on
>> >> >> >>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
>> >> >> >>> right, that the ideas become much clearer when the NA idea and the
>> >> >> >>> MASK idea are separate.   Please do pitch in for things I may have
>> >> >> >>> missed or misunderstood:
>> >> >> >> [...]
>> >> >> >>
>> >> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit it
>> >> >> >> more easily:
>> >> >> >>  https://gist.github.com/1056379/
>> >> >> >> This is your initial version:
>> >> >> >>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
>> >> >> >> And I made a few changes:
>> >> >> >>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
>> >> >> >> Specifically, I added a rationale section, changed np.MASKED to
>> >> >> >> np.IGNORE (as per comments in this thread), and added a vowel to
>> >> >> >> "propmsk".
>> >> >> >
>> >> >> > It might be helpful to make a small toy class in python so that people
>> >> >> > can play around with NA and IGNORE from the alterNEP.
>> >> >>
>> >> >> Thanks for doing this.
>> >> >>
>> >> >> I don't know about you, but I don't know where to work on the
>> >> >> discussion or draft implementation, because I am not sure where the
>> >> >> disagreement is.  Lluis has helpfully pointed out a specific case of
>> >> >> interest.   Pierre has fed back with some points of clarification.
>> >> >> However, other than that, I'm not sure what we should be discussing.
>> >> >>
>> >> >> @Mark
>> >> >> @Chuck
>> >> >> @anyone
>> >> >>
>> >> >> Do you see problems with the alterNEP proposal?
>> >> >
>> >> > Yes, I really like my design as it stands now, and the alterNEP removes a
>> >> > lot of the abstraction and interoperability that are in my opinion the best
>> >> > parts. I've made more updates to the NEP based on continuing feedback, which
>> >> > are part of the pull request I want reviews for.
>> >>
>> >> Ah - I think what you are saying is - too late I've started writing it.
>> >
>> > Do you want me to spend my whole summer designing something before starting
>> > the implementation?
>>
>> No, but, this is an open source project.  Hence it matters not only
>> what gets written but how the decisions are made and quality of the
>> discussion.   Here what I see is that you lost interest in the
>> discussion some time ago and stopped responding in any specific way.
>> This unfortunately conveys a lack of interest in our views.   That
>> might not be true, in which case I'm sure you can convey the opposite
>> with some substantial discussion now.  Or it might be for good
>> reason, heaven knows I've been wrong enough times.  But the community
>> cost is high for the sake of an extra few days implementation time.
>> Frankly I think the API will also suffer, but I'm less certain about
>> that.
>
> What open source has trouble with isn't discussion, it's attracting active
> and competent developers. You should treat them as gifts from the $deity
> when they show up. If they are open and responsive to discussion, and I
> think Mark is, so much the better. Mind, you don't need to bow down and kiss
> their feet, but you should at least take the time to understand what they
> are doing so your criticisms and feedback are informed.

Are you now going to explain why you believe our criticisms and
feedback are not well informed?

See you,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Benjamin Root
On Fri, Jul 1, 2011 at 11:00 AM, Matthew Brett wrote:

>
> > You can't switch between the two approaches without big changes in your
> > code.
>
> >
> Lluis provided a case, and it was obscure.  That switch seems like a
> rare or non-existent use-case that should not guide the API.
>
>
Just to respond to this specific issue.

In matplotlib, there are often constructs like the following:

plot_something(X, Y, V)

From a module perspective, we have no clue about the nature of the input
data.  We often have to do things like np.asanyarray, np.atleast_2d and such
to establish some base-level assumptions about the input data.  Numpy
currently makes this fairly cheap by not performing a copy if it is not
needed.  So far, so good.

Next, some plotting functions need to broadcast the arrays together (again,
numpy makes that fairly cheap).

Then, we need to figure out the common elements to plot.  With something
simple like plot(), this is straight-forward or-ing of any masks.  Of
course, right now, this is not cheap because we can't assume that the array
supports masking semantics.  This is where we either cast the arrays as
masked arrays, or perform our own masking semantics.  But, essentially, a
point that was masked in X, may not be masked in Y and/or V, and we can not
change the original data (or else we would be a bad tool).

For more complicated functions like pcolor() and contour(), each array needs
to know the status of its own neighboring points, and of those in the
other arrays.  Again, either we use numpy.ma to share a common mask across
the data arrays, or we implement our own semantics to deal with this.  And
again, we can not change any of the original data.

This is not an obscure case.  This is existing code in matplotlib.  I will
be evaluating the current missingdata branch later today to assess its
suitability for use in matplotlib.
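
The "or-ing of any masks" step can be sketched with today's numpy.ma as
follows. This is an illustration only, not the actual matplotlib code;
the helper name common_mask and the sample arrays are hypothetical.

```python
import numpy as np


def common_mask(*arrays):
    """OR together the masks of all inputs after broadcasting them.

    Plain ndarrays contribute an all-False mask. Inputs are not modified.
    """
    masks = np.broadcast_arrays(*[np.ma.getmaskarray(a) for a in arrays])
    return np.logical_or.reduce(masks)


X = np.ma.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                mask=[[True, False, False], [False, False, False]])
Y = np.ma.array([10.0, 20.0, 30.0], mask=[False, True, False])
V = np.ones((2, 3))   # a plain array with no mask at all

skip = common_mask(X, Y, V)
print(skip)
```

A point is skipped if it is masked in any of X, Y or V, and each
input keeps its own mask and data untouched.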

Ben Root


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Charles R Harris
On Fri, Jul 1, 2011 at 10:00 AM, Matthew Brett wrote:

> Hi,
>
> On Fri, Jul 1, 2011 at 4:34 PM, Mark Wiebe  wrote:
> > On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett 
> > wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe  wrote:
> >> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett <matthew.br...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman 
> >> >> wrote:
> >> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith 
> >> >> > wrote:
> >> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett
> >> >> >>  wrote:
> >> >> >>> In the interest of making the discussion as concrete as possible, here
> >> >> >>> is my draft of an alternative proposal for NAs and masking, based on
> >> >> >>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
> >> >> >>> right, that the ideas become much clearer when the NA idea and the
> >> >> >>> MASK idea are separate.   Please do pitch in for things I may have
> >> >> >>> missed or misunderstood:
> >> >> >> [...]
> >> >> >>
> >> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit it
> >> >> >> more easily:
> >> >> >>  https://gist.github.com/1056379/
> >> >> >> This is your initial version:
> >> >> >>
> >> >> >>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
> >> >> >> And I made a few changes:
> >> >> >>
> >> >> >>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
> >> >> >> Specifically, I added a rationale section, changed np.MASKED to
> >> >> >> np.IGNORE (as per comments in this thread), and added a vowel to
> >> >> >> "propmsk".
> >> >> >
> >> >> > It might be helpful to make a small toy class in python so that people
> >> >> > can play around with NA and IGNORE from the alterNEP.
> >> >>
> >> >> Thanks for doing this.
> >> >>
> >> >> I don't know about you, but I don't know where to work on the
> >> >> discussion or draft implementation, because I am not sure where the
> >> >> disagreement is.  Lluis has helpfully pointed out a specific case of
> >> >> interest.   Pierre has fed back with some points of clarification.
> >> >> However, other than that, I'm not sure what we should be discussing.
> >> >>
> >> >> @Mark
> >> >> @Chuck
> >> >> @anyone
> >> >>
> >> >> Do you see problems with the alterNEP proposal?
> >> >
> >> > Yes, I really like my design as it stands now, and the alterNEP removes a
> >> > lot of the abstraction and interoperability that are in my opinion the best
> >> > parts. I've made more updates to the NEP based on continuing feedback, which
> >> > are part of the pull request I want reviews for.
> >>
> >> Ah - I think what you are saying is - too late I've started writing it.
> >
> > Do you want me to spend my whole summer designing something before starting
> > the implementation?
>
> No, but, this is an open source project.  Hence it matters not only
> what gets written but how the decisions are made and quality of the
> discussion.   Here what I see is that you lost interest in the
> discussion some time ago and stopped responding in any specific way.
> This unfortunately conveys a lack of interest in our views.   That
> might not be true, in which case I'm sure you can convey the opposite
> with some substantial discussion now.  Or it might be for good
> reason, heaven knows I've been wrong enough times.  But the community
> cost is high for the sake of an extra few days implementation time.
> Frankly I think the API will also suffer, but I'm less certain about
> that.
>

What open source has trouble with isn't discussion, it's attracting active
and competent developers. You should treat them as gifts from the $deity
when they show up. If they are open and responsive to discussion, and I
think Mark is, so much the better. Mind, you don't need to bow down and kiss
their feet, but you should at least take the time to understand what they
are doing so your criticisms and feedback are informed.

Chuck


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Matthew Brett
Hi,

On Fri, Jul 1, 2011 at 4:48 PM, Charles R Harris
 wrote:
>
>
> On Fri, Jul 1, 2011 at 9:34 AM, Mark Wiebe  wrote:
>>
>> On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett 
>> wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe  wrote:
>>> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman 
>>> >> wrote:
>>> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith 
>>> >> > wrote:
>>> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett
>>> >> >>  wrote:
>>> >> >>> In the interest of making the discussion as concrete as possible,
>>> >> >>> here
>>> >> >>> is my draft of an alternative proposal for NAs and masking, based
>>> >> >>> on
>>> >> >>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel
>>> >> >>> is
>>> >> >>> right, that the ideas become much clearer when the NA idea and the
>>> >> >>> MASK idea are separate.   Please do pitch in for things I may have
>>> >> >>> missed or misunderstood:
>>> >> >> [...]
>>> >> >>
>>> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit
>>> >> >> it
>>> >> >> more easily:
>>> >> >>  https://gist.github.com/1056379/
>>> >> >> This is your initial version:
>>> >> >>
>>> >> >>
>>> >> >>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
>>> >> >> And I made a few changes:
>>> >> >>
>>> >> >>
>>> >> >>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
>>> >> >> Specifically, I added a rationale section, changed np.MASKED to
>>> >> >> np.IGNORE (as per comments in this thread), and added a vowel to
>>> >> >> "propmsk".
>>> >> >
>>> >> > It might be helpful to make a small toy class in python so that
>>> >> > people
>>> >> > can play around with NA and IGNORE from the alterNEP.
>>> >>
>>> >> Thanks for doing this.
>>> >>
>>> >> I don't know about you, but I don't know where to work on the
>>> >> discussion or draft implementation, because I am not sure where the
>>> >> disagreement is.  Lluis has helpfully pointed out a specific case of
>>> >> interest.   Pierre has fed back with some points of clarification.
>>> >> However, other than that, I'm not sure what we should be discussing.
>>> >>
>>> >> @Mark
>>> >> @Chuck
>>> >> @anyone
>>> >>
>>> >> Do you see problems with the alterNEP proposal?
>>> >
>>> > Yes, I really like my design as it stands now, and the alterNEP removes
>>> > a
>>> > lot of the abstraction and interoperability that are in my opinion the
>>> > best
>>> > parts. I've made more updates to the NEP based on continuing feedback,
>>> > which
>>> > are part of the pull request I want reviews for.
>>>
>>> Ah - I think what you are saying is - too late I've started writing it.
>>
>> Do you want me to spend my whole summer designing something before
>> starting the implementation? I made a pull request implementing a
>> non-controversial part of the NEP to get started, and I've not seen any
>> feedback on except from Chuck and Derek. (Many thanks to Chuck and Derek!)
>> Implementation and design are tied together in a feedback loop, and separate
>> designs that aren't informed by the implementation details, for example
>> information gained by going through the proposed code changes and reviewing
>> them, are counterproductive. I appreciate the effort you're putting in, and
>> I've been trying to guide you towards a more holistic path of contribution
>> by pointing out the pull request.
>>>
>>> > Mainly: Reduced interoperability
>>>
>>> Meaning?
>>
>> You can't switch between the two approaches without big changes in your
>> code.
>>
>>>
>>> > more complex implementation (leading to
>>> > more bugs),
>>>
>>> OK - but the discussion did not seem to be about the complexity of the
>>> implementation, but about the API.
>>
>> The implementation always plays a role in the design of anything. Making
>> an API design abstractly, then testing it against implementation constraints
>> is good, making an API completely divorced from considerations of
>> implementation is really really bad.
>>
>>>
>>> > and an unclear theoretical model for the masked part of it
>>>
>>> What's unclear?  Or even different?
>>
>> After thinking about the missing data model some more, I've come up with
>> more rationale for why the R approach is good, and adopting both the R
>> default and skipna option is appropriate. It's in the pull request up for
>> code review.
>>
>>>
>>> >> Do you agree that the alterNEP proposal is easier to understand?
>>> >
>>> >
>>> > No.
>>>
>>> Do you agree that there are several people on the list who do think
>>> that the alterNEP proposal is easier to understand?
>>
>> Feedback on the clarity of my writing in the NEP is welcome, if something
>> is unclear to someone, please point out the specific part so I can continue
>> to improve it. I don't think the clarity of the writing is a good reason for
>> choosing one design or another, the quality of the design is what should
>> dec

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Matthew Brett
Hi,

On Fri, Jul 1, 2011 at 4:34 PM, Mark Wiebe  wrote:
> On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett 
> wrote:
>>
>> Hi,
>>
>> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe  wrote:
>> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman 
>> >> wrote:
>> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith 
>> >> > wrote:
>> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett
>> >> >>  wrote:
>> >> >>> In the interest of making the discussion as concrete as possible,
>> >> >>> here
>> >> >>> is my draft of an alternative proposal for NAs and masking, based
>> >> >>> on
>> >> >>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel
>> >> >>> is
>> >> >>> right, that the ideas become much clearer when the NA idea and the
>> >> >>> MASK idea are separate.   Please do pitch in for things I may have
>> >> >>> missed or misunderstood:
>> >> >> [...]
>> >> >>
>> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit
>> >> >> it
>> >> >> more easily:
>> >> >>  https://gist.github.com/1056379/
>> >> >> This is your initial version:
>> >> >>
>> >> >>
>> >> >>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
>> >> >> And I made a few changes:
>> >> >>
>> >> >>
>> >> >>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
>> >> >> Specifically, I added a rationale section, changed np.MASKED to
>> >> >> np.IGNORE (as per comments in this thread), and added a vowel to
>> >> >> "propmsk".
>> >> >
>> >> > It might be helpful to make a small toy class in python so that
>> >> > people
>> >> > can play around with NA and IGNORE from the alterNEP.
>> >>
>> >> Thanks for doing this.
>> >>
>> >> I don't know about you, but I don't know where to work on the
>> >> discussion or draft implementation, because I am not sure where the
>> >> disagreement is.  Lluis has helpfully pointed out a specific case of
>> >> interest.   Pierre has fed back with some points of clarification.
>> >> However, other than that, I'm not sure what we should be discussing.
>> >>
>> >> @Mark
>> >> @Chuck
>> >> @anyone
>> >>
>> >> Do you see problems with the alterNEP proposal?
>> >
>> > Yes, I really like my design as it stands now, and the alterNEP removes
>> > a
>> > lot of the abstraction and interoperability that are in my opinion the
>> > best
>> > parts. I've made more updates to the NEP based on continuing feedback,
>> > which
>> > are part of the pull request I want reviews for.
>>
>> Ah - I think what you are saying is - too late I've started writing it.
>
> Do you want me to spend my whole summer designing something before starting
> the implementation?

No, but, this is an open source project.  Hence it matters not only
what gets written but how the decisions are made and quality of the
discussion.   Here what I see is that you lost interest in the
discussion some time ago and stopped responding in any specific way.
This unfortunately conveys a lack of interest in our views.   That
might not be true, in which case I'm sure you can convey the opposite
with some substantial discussion now.  Or it might be for good
reason, heaven knows I've been wrong enough times.  But the community
cost is high for the sake of an extra few days implementation time.
Frankly I think the API will also suffer, but I'm less certain about
that.

> I made a pull request implementing a
> non-controversial part of the NEP to get started, and I've not seen any
> feedback on except from Chuck and Derek. (Many thanks to Chuck and Derek!)
> Implementation and design are tied together in a feedback loop, and separate
> designs that aren't informed by the implementation details, for example
> information gained by going through the proposed code changes and reviewing
> them, are counterproductive. I appreciate the effort you're putting in, and
> I've been trying to guide you towards a more holistic path of contribution
> by pointing out the pull request.

Holistic?

You surely accept that code review is not the mechanism for high-level
API decisions?

>> > Mainly: Reduced interoperability
>>
>> Meaning?
>
> You can't switch between the two approaches without big changes in your
> code.

Lluis provided a case, and it was obscure.  That switch seems like a
rare or non-existent use-case that should not guide the API.

>>
>> > more complex implementation (leading to
>> > more bugs),
>>
>> OK - but the discussion did not seem to be about the complexity of the
>> implementation, but about the API.
>
> The implementation always plays a role in the design of anything. Making an
> API design abstractly, then testing it against implementation constraints is
> good, making an API completely divorced from considerations of
> implementation is really really bad.

Making major API decisions on the basis of implementation ease is also
bad because it leads to a bad API and a bad API leads to confusion,
and makes people use the featu

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Charles R Harris
On Fri, Jul 1, 2011 at 9:34 AM, Mark Wiebe  wrote:

> On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett wrote:
>
>> Hi,
>>
>> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe  wrote:
>> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman 
>> wrote:
>> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith 
>> wrote:
>> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett
>> >> >>  wrote:
>> >> >>> In the interest of making the discussion as concrete as possible,
>> here
>> >> >>> is my draft of an alternative proposal for NAs and masking, based
>> on
>> >> >>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel
>> is
>> >> >>> right, that the ideas become much clearer when the NA idea and the
>> >> >>> MASK idea are separate.   Please do pitch in for things I may have
>> >> >>> missed or misunderstood:
>> >> >> [...]
>> >> >>
>> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit
>> it
>> >> >> more easily:
>> >> >>  https://gist.github.com/1056379/
>> >> >> This is your initial version:
>> >> >>
>> >> >>
>> https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
>> >> >> And I made a few changes:
>> >> >>
>> >> >>
>> https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
>> >> >> Specifically, I added a rationale section, changed np.MASKED to
>> >> >> np.IGNORE (as per comments in this thread), and added a vowel to
>> >> >> "propmsk".
>> >> >
>> >> > It might be helpful to make a small toy class in python so that
>> people
>> >> > can play around with NA and IGNORE from the alterNEP.
>> >>
>> >> Thanks for doing this.
>> >>
>> >> I don't know about you, but I don't know where to work on the
>> >> discussion or draft implementation, because I am not sure where the
>> >> disagreement is.  Lluis has helpfully pointed out a specific case of
>> >> interest.   Pierre has fed back with some points of clarification.
>> >> However, other than that, I'm not sure what we should be discussing.
>> >>
>> >> @Mark
>> >> @Chuck
>> >> @anyone
>> >>
>> >> Do you see problems with the alterNEP proposal?
>> >
>> > Yes, I really like my design as it stands now, and the alterNEP removes
>> a
>> > lot of the abstraction and interoperability that are in my opinion the
>> best
>> > parts. I've made more updates to the NEP based on continuing feedback,
>> which
>> > are part of the pull request I want reviews for.
>>
>> Ah - I think what you are saying is - too late I've started writing it.
>>
>
> Do you want me to spend my whole summer designing something before starting
> the implementation? I made a pull request implementing a
> non-controversial part of the NEP to get started, and I've not seen any
> feedback on except from Chuck and Derek. (Many thanks to Chuck and Derek!)
> Implementation and design are tied together in a feedback loop, and separate
> designs that aren't informed by the implementation details, for example
> information gained by going through the proposed code changes and reviewing
> them, are counterproductive. I appreciate the effort you're putting in, and
> I've been trying to guide you towards a more holistic path of contribution
> by pointing out the pull request.
>
> > Mainly: Reduced interoperability
>>
>> Meaning?
>>
>
> You can't switch between the two approaches without big changes in your
> code.
>
>
>>
>> > more complex implementation (leading to
>> > more bugs),
>>
>> OK - but the discussion did not seem to be about the complexity of the
>> implementation, but about the API.
>>
>
> The implementation always plays a role in the design of anything. Making an
> API design abstractly, then testing it against implementation constraints is
> good, making an API completely divorced from considerations of
> implementation is really really bad.
>
>
>>
>> > and an unclear theoretical model for the masked part of it
>>
>> What's unclear?  Or even different?
>>
>
> After thinking about the missing data model some more, I've come up with
> more rationale for why the R approach is good, and adopting both the R
> default and skipna option is appropriate. It's in the pull request up for
> code review.
>
>
>> >> Do you agree that the alterNEP proposal is easier to understand?
>> >
>> >
>> > No.
>>
>> Do you agree that there are several people on the list who do think
>> that the alterNEP proposal is easier to understand?
>>
>
> Feedback on the clarity of my writing in the NEP is welcome, if something
> is unclear to someone, please point out the specific part so I can continue
> to improve it. I don't think the clarity of the writing is a good reason for
> choosing one design or another, the quality of the design is what should
> decide that.
>
>
>> >> If not, can you explain why?
>> >
>> > My answers to that are already scattered in the emails in various
>> places,
>> > and in the various rationales and justifications provided in the NEP.
>>
>> I can't see any reference t

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Mark Wiebe
On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett wrote:

> Hi,
>
> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe  wrote:
> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
> > wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman 
> wrote:
> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith 
> wrote:
> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett
> >> >>  wrote:
> >> >>> In the interest of making the discussion as concrete as possible,
> here
> >> >>> is my draft of an alternative proposal for NAs and masking, based on
> >> >>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
> >> >>> right, that the ideas become much clearer when the NA idea and the
> >> >>> MASK idea are separate.   Please do pitch in for things I may have
> >> >>> missed or misunderstood:
> >> >> [...]
> >> >>
> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit it
> >> >> more easily:
> >> >>  https://gist.github.com/1056379/
> >> >> This is your initial version:
> >> >>
> >> >>
> https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
> >> >> And I made a few changes:
> >> >>
> >> >>
> https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
> >> >> Specifically, I added a rationale section, changed np.MASKED to
> >> >> np.IGNORE (as per comments in this thread), and added a vowel to
> >> >> "propmsk".
> >> >
> >> > It might be helpful to make a small toy class in python so that people
> >> > can play around with NA and IGNORE from the alterNEP.
> >>
> >> Thanks for doing this.
> >>
> >> I don't know about you, but I don't know where to work on the
> >> discussion or draft implementation, because I am not sure where the
> >> disagreement is.  Lluis has helpfully pointed out a specific case of
> >> interest.   Pierre has fed back with some points of clarification.
> >> However, other than that, I'm not sure what we should be discussing.
> >>
> >> @Mark
> >> @Chuck
> >> @anyone
> >>
> >> Do you see problems with the alterNEP proposal?
> >
> > Yes, I really like my design as it stands now, and the alterNEP removes a
> > lot of the abstraction and interoperability that are in my opinion the
> best
> > parts. I've made more updates to the NEP based on continuing feedback,
> which
> > are part of the pull request I want reviews for.
>
> Ah - I think what you are saying is - too late I've started writing it.
>

Do you want me to spend my whole summer designing something before starting
the implementation? I made a pull request implementing a
non-controversial part of the NEP to get started, and I've not seen any
feedback on it except from Chuck and Derek. (Many thanks to Chuck and Derek!)
Implementation and design are tied together in a feedback loop, and separate
designs that aren't informed by the implementation details, for example
information gained by going through the proposed code changes and reviewing
them, are counterproductive. I appreciate the effort you're putting in, and
I've been trying to guide you towards a more holistic path of contribution
by pointing out the pull request.

> Mainly: Reduced interoperability
>
> Meaning?
>

You can't switch between the two approaches without big changes in your
code.


>
> > more complex implementation (leading to
> > more bugs),
>
> OK - but the discussion did not seem to be about the complexity of the
> implementation, but about the API.
>

The implementation always plays a role in the design of anything. Making an
API design abstractly, then testing it against implementation constraints is
good; making an API completely divorced from considerations of
implementation is really, really bad.


>
> > and an unclear theoretical model for the masked part of it
>
> What's unclear?  Or even different?
>

After thinking about the missing data model some more, I've come up with
more rationale for why the R approach is good, and adopting both the R
default and skipna option is appropriate. It's in the pull request up for
code review.
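
For readers who do not know the R behaviour referred to here, a small sketch
of the two reduction modes may help.  It is only an analogy: NaN stands in
for a missing value (the proposed np.NA is not assumed to exist), and
np.nansum plays the role that a skipna option would play.

```python
import numpy as np

# NaN is used as a stand-in for a missing value in this sketch.
values = np.array([1.0, np.nan, 3.0])

# Default, R-like behaviour: a missing input makes the result missing.
propagated = np.sum(values)

# "skipna"-like behaviour (R's na.rm=TRUE): missing inputs are ignored.
skipped = np.nansum(values)
```

Here the first reduction yields NaN and the second yields 4.0.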


> >> Do you agree that the alterNEP proposal is easier to understand?
> >
> >
> > No.
>
> Do you agree that there are several people on the list who do think
> that the alterNEP proposal is easier to understand?
>

Feedback on the clarity of my writing in the NEP is welcome. If something is
unclear to someone, please point out the specific part so I can continue to
improve it. I don't think the clarity of the writing is a good reason for
choosing one design or another; the quality of the design is what should
decide that.


> >> If not, can you explain why?
> >
> > My answers to that are already scattered in the emails in various places,
> > and in the various rationales and justifications provided in the NEP.
>
> I can't see any reference to the alterNEP or the idea of the separate
> API in the NEP.  Can you point me to it?
>

I'm referring to positive arguments for why the design decisions are as they
are. I don't see the alterNEP referencing specific things that are

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Christopher Barker
Matthew Brett wrote:
> should raise an error.  On the other hand, if I make a normal array:
> 
> arr = np.array([1.0, 2.0, 7.0])
> 
> and then do this:
> 
> arr.visible[2] = False
> 
> then either I should raise an error (it's not a masked array), or,
> more magically, construct a mask on the fly. 

Maybe it's too much magic, but it seems reasonable to me that for an
array without a mask, arr.visible[i] is simply True for all values of i;
there is no need to create a mask to determine that.

Does

arr[i] = np.IGNORE

auto-create a mask if there is not one there already? I think it should.
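
For comparison, the existing numpy.ma already behaves roughly this way.  The
sketch below uses np.ma.masked as a stand-in for the proposed np.IGNORE, and
the inverted mask as a stand-in for the proposed arr.visible; neither of the
proposed names is assumed to exist.

```python
import numpy as np

arr = np.ma.array([1.0, 2.0, 7.0])

# No mask has been allocated yet ...
no_mask_yet = np.ma.getmask(arr) is np.ma.nomask
# ... but "visibility" can still be reported as True everywhere.
visible_before = ~np.ma.getmaskarray(arr)

# Assigning the masked constant creates the mask on the fly.
arr[2] = np.ma.masked
visible_after = ~np.ma.getmaskarray(arr)
```

After the assignment the third element is hidden, while the underlying value
in arr.data is still 7.0.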

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Nathaniel Smith
On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe  wrote:
> On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
> wrote:
>> Do you see problems with the alterNEP proposal?
>
> Yes, I really like my design as it stands now, and the alterNEP removes a
> lot of the abstraction and interoperability that are in my opinion the best
> parts. I've made more updates to the NEP based on continuing feedback, which
> are part of the pull request I want reviews for.
>
>>
>> If so, what are they?
>
> Mainly: Reduced interoperability, more complex implementation (leading to
> more bugs), and an unclear theoretical model for the masked part of it.

Can you give any examples of situations where one would run into this
"reduced interoperability"? I'm not sure what it means. The only
person who has so far spoken up as needing both masking semantics and
NA semantics -- Gary Strangman -- has said that he strongly prefers
the alterNEP semantics *exactly because* it makes it clear *how these
functions will interoperate.*

Can you give any examples of how the implementation would be more
complicated? As far as I can tell there are no elements in the
alterNEP that are not in your NEP, they mostly just expose the
functionality differently at the top level.

Do you have a clearer theoretical model for the masked part of your
proposal? The best I've been able to extract from any of your messages
is when you wrote "it seems to me that people wanting masked arrays
want missing data without touching their data". But as a matter of
English grammar, I have no idea what this means -- if you have data,
it's not missing! It seems to me that people wanting masked data want
to *hide* parts of their data, which seems much clearer to me and is
the theoretical model used in the alterNEP. Note that this model
actually predicts several of the differences between how people want
masks to work and how people want NAs to work (e.g., their behavior
during reduction); I

>> Do you agree that the alterNEP proposal is easier to understand?
>
> No.
>>
>> If not, can you explain why?
>
> My answers to that are already scattered in the emails in various places,
> and in the various rationales and justifications provided in the NEP.

I understand the desire not to get caught up in spending all your time
writing emails explaining things that you feel like you've already
explained.

Maybe there's an email I missed somewhere where you explain the
conceptual model behind your NEP's semantics in a short,
easy-to-understand way (comparable to, say, the Rationale section of
the alterNEP). But I haven't seen it and I can't reconstruct a
rationale for it myself (the alterNEP comes out of my attempts to do
so!).

>> What do you see as the important points of difference between the NEP
>> and the alterNEP?
>
> The biggest thing is the NEP supports more use cases in a clean way by
> composition of different simpler components. It defines one clear missing
> data abstraction, and proposes two implementations that are interchangeable
> and can interoperate.

But the two implementations in your proposal are not interchangeable!
The whole justification for starting with a masked-based
implementation in your proposal is that it supports unmasking via
views; if that requirement were removed, then there would be no reason
to bother with the masking-based implementation at all.
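
To make the unmasking point concrete, here is a small sketch with the
existing numpy.ma, with NaN standing in for a bit-pattern NA.  It shows only
the "hide, then unhide" part, not the view semantics of the NEP, and the
variable names are made up.

```python
import numpy as np

data = np.array([1.0, 2.0, 7.0])

# Mask-style: hide element 2 without touching the stored value.
hidden = np.ma.array(data, mask=[False, False, True])
hidden_sum = hidden.sum()      # the hidden element is skipped
hidden.mask = False            # unmask everything again
recovered = hidden[2]          # the original value is still there

# NA-style (NaN stand-in): the value is overwritten and cannot be recovered.
na = data.copy()
na[2] = np.nan
```

With a mask the 7.0 comes back after unmasking; with the NaN stand-in it is
gone from the array for good.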

Well, that's not true. There are some marginal advantages in the
special case of working with integers+NAs. But I don't think anyone's
making that argument.

> The alterNEP proposes two independent APIs, reducing
> interoperability and so significantly increasing the amount of learning
> required to work with both of them. This also precludes switching between
> the two approaches without a lot of work.

You can't switch between Python and C without a lot of work either, and
that doesn't mean that they should be merged into one design... but
they do complement each other beautifully. Just like missing data and
masked arrays :-).

> The current pull request that's sitting there waiting for review does not
> have an impact on which approach goes ahead, but the code I'm doing now
> does. This is a fairly large project, and I don't have a great length of
> time to do it in, so I'm not going to participate extensively in the
> alterNEP discussion. If you want to help me, please review my code and
> provide specific feedback on my NEP (the code review system in github is
> great for this too, I've received some excellent feedback on the NEP that
> way). If you want to change my mind about things, please address the
> specific design decisions you think are problematic by specifically
> responding to lines in the NEP, as part of code-reviewing my pull request in
> github.

I know I'm being grumpy in this email, and I apologize for that. But,
no. I've given extensive feedback, read the list carefully, and
thought hard about these issues, and so far you've basically just
dismissed my concerns. (See, e.g., [1], where your respon

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Matthew Brett
Hi,

On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe  wrote:
> On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett 
> wrote:
>>
>> Hi,
>>
>> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman  wrote:
>> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith  wrote:
>> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett
>> >>  wrote:
>> >>> In the interest of making the discussion as concrete as possible, here
>> >>> is my draft of an alternative proposal for NAs and masking, based on
>> >>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
>> >>> right, that the ideas become much clearer when the NA idea and the
>> >>> MASK idea are separate.   Please do pitch in for things I may have
>> >>> missed or misunderstood:
>> >> [...]
>> >>
>> >> Thanks for writing this up! I stuck it up as a gist so we can edit it
>> >> more easily:
>> >>  https://gist.github.com/1056379/
>> >> This is your initial version:
>> >>
>> >>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
>> >> And I made a few changes:
>> >>
>> >>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
>> >> Specifically, I added a rationale section, changed np.MASKED to
>> >> np.IGNORE (as per comments in this thread), and added a vowel to
>> >> "propmsk".
>> >
>> > It might be helpful to make a small toy class in python so that people
>> > can play around with NA and IGNORE from the alterNEP.
>>
>> Thanks for doing this.
>>
>> I don't know about you, but I don't know where to work on the
>> discussion or draft implementation, because I am not sure where the
>> disagreement is.  Lluis has helpfully pointed out a specific case of
>> interest.   Pierre has fed back with some points of clarification.
>> However, other than that, I'm not sure what we should be discussing.
>>
>> @Mark
>> @Chuck
>> @anyone
>>
>> Do you see problems with the alterNEP proposal?
>
> Yes, I really like my design as it stands now, and the alterNEP removes a
> lot of the abstraction and interoperability that are in my opinion the best
> parts. I've made more updates to the NEP based on continuing feedback, which
> are part of the pull request I want reviews for.

Ah - I think what you are saying is - too late I've started writing it.

> Mainly: Reduced interoperability

Meaning?

> more complex implementation (leading to
> more bugs),

OK - but the discussion did not seem to be about the complexity of the
implementation, but about the API.

> and an unclear theoretical model for the masked part of it.

What's unclear?  Or even different?

>> Do you agree that the alterNEP proposal is easier to understand?
>
>
> No.

Do you agree that there are several people on the list who do think
that the alterNEP proposal is easier to understand?

>> If not, can you explain why?
>
> My answers to that are already scattered in the emails in various places,
> and in the various rationales and justifications provided in the NEP.

I can't see any reference to the alterNEP or the idea of the separate
API in the NEP.  Can you point me to it?

>> What do you see as the important points of difference between the NEP
>> and the alterNEP?
>
> The biggest thing is the NEP supports more use cases in a clean way by
> composition of different simpler components. It defines one clear missing
> data abstraction, and proposes two implementations that are interchangeable
> and can interoperate. The alterNEP proposes two independent APIs, reducing
> interoperability and so significantly increasing the amount of learning
> required to work with both of them. This also precludes switching between
> the two approaches without a lot of work.

Lluis gave a particular somewhat obscure case where it is convenient
that the NA and IGNORE are the same.   Are there any others?  It seems
to me the API you propose is a classic example of implicit rather than
explicit, and that it would be very easy, at this stage, to fix that.

> The current pull request that's sitting there waiting for review does not
> have an impact on which approach goes ahead, but the code I'm doing now
> does. This is a fairly large project, and I don't have a great length of
> time to do it in, so I'm not going to participate extensively in the
> alterNEP discussion. If you want to help me, please review my code and
> provide specific feedback on my NEP (the code review system in github is
> great for this too, I've received some excellent feedback on the NEP that
> way). If you want to change my mind about things, please address the
> specific design decisions you think are problematic by specifically
> responding to lines in the NEP, as part of code-reviewing my pull request in
> github.

OK - unless you tell me differently I'll take that as 'the discussion
of the separate API for NA and IGNORE is over as far as I am
concerned'.

I would say, for future reference, that if there is a substantial and
reasonable discussion of the API, that is not well resolved, then it
does harm to go ahead and implement regard

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Mark Wiebe
On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett wrote:

> Hi,
>
> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman  wrote:
> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith  wrote:
> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett wrote:
> >>> In the interest of making the discussion as concrete as possible, here
> >>> is my draft of an alternative proposal for NAs and masking, based on
> >>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
> >>> right, that the ideas become much clearer when the NA idea and the
> >>> MASK idea are separate.   Please do pitch in for things I may have
> >>> missed or misunderstood:
> >> [...]
> >>
> >> Thanks for writing this up! I stuck it up as a gist so we can edit it
> >> more easily:
> >>  https://gist.github.com/1056379/
> >> This is your initial version:
> >>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
> >> And I made a few changes:
> >>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
> >> Specifically, I added a rationale section, changed np.MASKED to
> >> np.IGNORE (as per comments in this thread), and added a vowel to
> >> "propmsk".
> >
> > It might be helpful to make a small toy class in python so that people
> > can play around with NA and IGNORE from the alterNEP.
>
> Thanks for doing this.
>
> I don't know about you, but I don't know where to work on the
> discussion or draft implementation, because I am not sure where the
> disagreement is.  Lluis has helpfully pointed out a specific case of
> interest.   Pierre has fed back with some points of clarification.
> However, other than that, I'm not sure what we should be discussing.
>
> @Mark
> @Chuck
> @anyone
>
> Do you see problems with the alterNEP proposal?


Yes, I really like my design as it stands now, and the alterNEP removes a
lot of the abstraction and interoperability that are in my opinion the best
parts. I've made more updates to the NEP based on continuing feedback, which
are part of the pull request I want reviews for.


> If so, what are they?
>

Mainly: Reduced interoperability, more complex implementation (leading to
more bugs), and an unclear theoretical model for the masked part of it.


> Do you agree that the alterNEP proposal is easier to understand?


No.

> If not, can you explain why?
>

My answers to that are already scattered in the emails in various places,
and in the various rationales and justifications provided in the NEP.


> What do you see as the important points of difference between the NEP
> and the alterNEP?
>

The biggest thing is the NEP supports more use cases in a clean way by
composition of different simpler components. It defines one clear missing
data abstraction, and proposes two implementations that are interchangeable
and can interoperate. The alterNEP proposes two independent APIs, reducing
interoperability and so significantly increasing the amount of learning
required to work with both of them. This also precludes switching between
the two approaches without a lot of work.

The current pull request that's sitting there waiting for review does not
have an impact on which approach goes ahead, but the code I'm doing now
does. This is a fairly large project, and I don't have a great length of
time to do it in, so I'm not going to participate extensively in the
alterNEP discussion. If you want to help me, please review my code and
provide specific feedback on my NEP (the code review system in github is
great for this too, I've received some excellent feedback on the NEP that
way). If you want to change my mind about things, please address the
specific design decisions you think are problematic by specifically
responding to lines in the NEP, as part of code-reviewing my pull request in
github.

Thanks,
-Mark

> @Pierre - what do you think?
>
> Best,
>
> Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-01 Thread Matthew Brett
Hi,

On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman  wrote:
> On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith  wrote:
>> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett  
>> wrote:
>>> In the interest of making the discussion as concrete as possible, here
>>> is my draft of an alternative proposal for NAs and masking, based on
>>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
>>> right, that the ideas become much clearer when the NA idea and the
>>> MASK idea are separate.   Please do pitch in for things I may have
>>> missed or misunderstood:
>> [...]
>>
>> Thanks for writing this up! I stuck it up as a gist so we can edit it
>> more easily:
>>  https://gist.github.com/1056379/
>> This is your initial version:
>>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
>> And I made a few changes:
>>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
>> Specifically, I added a rationale section, changed np.MASKED to
>> np.IGNORE (as per comments in this thread), and added a vowel to
>> "propmsk".
>
> It might be helpful to make a small toy class in python so that people
> can play around with NA and IGNORE from the alterNEP.

Thanks for doing this.

I don't know about you, but I don't know where to work on the
discussion or draft implementation, because I am not sure where the
disagreement is.  Lluis has helpfully pointed out a specific case of
interest.   Pierre has fed back with some points of clarification.
However, other than that, I'm not sure what we should be discussing.

@Mark
@Chuck
@anyone

Do you see problems with the alterNEP proposal?  If so, what are they?
Do you agree that the alterNEP proposal is easier to understand?  If
not, can you explain why?
What do you see as the important points of difference between the NEP
and the alterNEP?

@Pierre - what do you think?

Best,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Keith Goodman
On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith  wrote:
> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett  
> wrote:
>> In the interest of making the discussion as concrete as possible, here
>> is my draft of an alternative proposal for NAs and masking, based on
>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
>> right, that the ideas become much clearer when the NA idea and the
>> MASK idea are separate.   Please do pitch in for things I may have
>> missed or misunderstood:
> [...]
>
> Thanks for writing this up! I stuck it up as a gist so we can edit it
> more easily:
>  https://gist.github.com/1056379/
> This is your initial version:
>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
> And I made a few changes:
>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
> Specifically, I added a rationale section, changed np.MASKED to
> np.IGNORE (as per comments in this thread), and added a vowel to
> "propmsk".

It might be helpful to make a small toy class in python so that people
can play around with NA and IGNORE from the alterNEP.

I only had a few minutes, so I only took it this far (1d arrays only):

>> from nary import nary, NA, IGNORE
>> arr = np.array([1,2,3,4,5,6])
>> nar = nary(arr)
>> nar
1., 2., 3., 4., 5., 6.,
>> nar[2] = NA
>> nar
1., 2., NA, 4., 5., 6.,
>> nar[4] = IGNORE
>> nar
1., 2., NA, 4., IGNORE, 6.,
>> nar[4]
IGNORE
>> nar[3]
4
>> nar[2]
NA

The gist is here: https://gist.github.com/1057686

It probably just needs an __add__ and a reducing function such as sum,
but I'm out of time, or so my family tells me.

Implementation? Yes, with masks.
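
A minimal sketch of how the missing `__add__` and `sum` might look, assuming a
toy 1d class backed by two boolean masks. This is not the code from the gist:
the class name `ToyNary`, the marker class, and the defaults chosen for
`skipna` and `propmask` are assumptions for illustration, loosely following
the alterNEP idea that NA propagates unless skipped and IGNORE is skipped
unless propagated.

```python
import numpy as np


class _Marker:
    """Named singleton used for the NA and IGNORE placeholders."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


NA = _Marker("NA")
IGNORE = _Marker("IGNORE")


class ToyNary:
    """Hypothetical 1d array with separate NA and IGNORE masks."""

    def __init__(self, data, na=None, ignore=None):
        self.data = np.array(data, dtype=float)
        self.na = np.zeros(self.data.shape, bool) if na is None else na
        self.ignore = np.zeros(self.data.shape, bool) if ignore is None else ignore

    def __setitem__(self, i, value):
        if value is NA:
            self.na[i], self.ignore[i] = True, False
        elif value is IGNORE:
            self.ignore[i] = True          # the data underneath is kept
        else:
            self.data[i] = value
            self.na[i] = self.ignore[i] = False

    def __getitem__(self, i):
        if self.ignore[i]:
            return IGNORE
        return NA if self.na[i] else self.data[i]

    def __add__(self, other):
        # Element-wise: a marker on either side marks the result.
        return ToyNary(self.data + other.data,
                       self.na | other.na,
                       self.ignore | other.ignore)

    def sum(self, skipna=False, propmask=False):
        if propmask and self.ignore.any():
            return IGNORE
        visible = ~self.ignore
        if not skipna and (self.na & visible).any():
            return NA
        return self.data[visible & ~self.na].sum()


nar = ToyNary([1, 2, 3, 4, 5, 6])
nar[2] = NA
nar[4] = IGNORE
print(nar[2], nar[4], nar[3])   # NA IGNORE 4.0
print(nar.sum())                # NA
print(nar.sum(skipna=True))     # 13.0  (1 + 2 + 4 + 6)
```

Keeping the two masks separate is what lets `sum` treat the two markers
differently; a fused design would need only one mask and one keyword.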


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Charles R Harris
On Thu, Jun 30, 2011 at 6:02 PM, Matthew Brett wrote:

> Hi,
>
> On Thu, Jun 30, 2011 at 9:01 PM, Lluís  wrote:
> > Matthew Brett writes:
> >
> >> Hi,
> >> On Thu, Jun 30, 2011 at 7:27 PM, Lluís  wrote:
> >>> Matthew Brett writes:
> >>> [...]
> >>>> I'm afraid, like you, I'm a little lost in the world of masking,
> >>>> because I only need the NAs.  I was trying to see if I could come up
> >>>> with an API that picked up some of the syntactic convenience of NAs,
> >>>> without conflating NAs with IGNOREs.   I guess we need some feedback
> >>>> from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
> >>>> of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
> >>>> separating the APIs?
> >>>
> >>> As I tried to convey on my other mail, separating both will force you to
> >>> either:
> >>>
> >>> * Make a copy of the array before passing it to another routine (because
> >>>  the routine will assign np.NA but you still want the original data)
> >
> >> You have an array 'arr'.   The array does support NAs, but it doesn't
> >> have a mask.  You want to pass ``arr`` to another routine ``func``.
> >> You expect ``func`` to set NAs into the data but you don't want
> >> ``func`` to modify ``arr`` and you don't want to copy ``arr`` either.
> >> You are saying the following:
> >
> >> "with the fused API, I can make ``arr`` be a masked array, and pass it
> >> into ``func``, and know that, when func sets elements of arr to NA, it
> >> will only modify the mask and not the underlying data in ``arr``."
> >
> > Yes.
> >
> >
> >> It does seem to me this is a very obscure case.  First, ``func`` is
> >> modifying the array but you want an unmodified array back.  Second,
> >> you'll have to do some view trick to recover the not-NA case to arr,
> >> when it comes back.
> >
> > I know, the example is just silly and convoluted.
> >
> >
> >> It seems to me, that what ``func`` should do, if it wants you to be
> >> able to unmask the NAs, is to make a masked array view of ``arr``, and
> >> return that.   And indeed the simplicity of the separated API
> >> immediately makes that clear - in my view at least.
> >
> > I agree on this example. My only concern is on the API's ability to
> > foresee as many future use-cases as possible, without impacting
> > performance.
>
> But, of course, there's a great danger in trying to cover every
> possible use-case.
>
> My argument is that the kind of cases that you are describing are - I
> believe - very rare and are even a little difficult to make up.  Is
> that fair?
>
> To my mind, the separate NA and IGNORE API is easier to understand and
> explain.   If that isn't true, please do say, and say why - because
> that point is key.
>
>
I think the main problem is that they aren't separate, one takes place in a
view of an unmasked array, the other starts with a masked array. These
aren't 'different' in mechanism, they are just different in work flow. And I
think they fit in well with the view idea.


> If it is true that the separate API is clearer, then the benefit in
> terms of power and extensibility has to be large, in order to go for
> the fused API.
>
>
Chuck


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Gary Strangman



>>> It seems to me, that what ``func`` should do, if it wants you to be
>>> able to unmask the NAs, is to make a masked array view of ``arr``, and
>>> return that.   And indeed the simplicity of the separated API
>>> immediately makes that clear - in my view at least.
>>
>> I agree on this example. My only concern is on the API's ability to
>> foresee as many future use-cases as possible, without impacting
>> performance.
>
> But, of course, there's a great danger in trying to cover every
> possible use-case.
>
> My argument is that the kind of cases that you are describing are - I
> believe - very rare and are even a little difficult to make up.  Is
> that fair?
>
> To my mind, the separate NA and IGNORE API is easier to understand and
> explain.   If that isn't true, please do say, and say why - because
> that point is key.
>
> If it is true that the separate API is clearer, then the benefit in
> terms of power and extensibility has to be large, in order to go for
> the fused API.


For what it's worth, I wholeheartedly agree with Matthew here. Being able 
to designate NA separately from IGNORE has tremendous conceptual clarity, 
at least for me. Not only are these completely separate mental
constructs in my head, but they even arise from completely different
sources: NAs arise from my subjects' whims, my experimental procedures, my
research personnel, or bad equipment days, whereas IGNORE generally comes 
from me and my analysis or visualization needs.


While I bet it's possible for an exceedingly clever person to fuse the two 
(I doubt my brain could pull that off), I fear that in the end I would 
have to go to the documentation every time in order to use either one. 
Thus, I agree that fusing into a single API needs to have a very large 
benefit. I admit I haven't followed all steps here, but I sense there is 
indeed numpy-coder-level benefit to fusing. However I, like Matthew (I 
believe), don't see appreciable benefits at the user level, /plus/ the 
risk of user confusion ...


-best
Gary




Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Matthew Brett
Hi,

On Thu, Jun 30, 2011 at 9:01 PM, Lluís  wrote:
> Matthew Brett writes:
>
>> Hi,
>> On Thu, Jun 30, 2011 at 7:27 PM, Lluís  wrote:
>>> Matthew Brett writes:
>>> [...]
>>>> I'm afraid, like you, I'm a little lost in the world of masking,
>>>> because I only need the NAs.  I was trying to see if I could come up
>>>> with an API that picked up some of the syntactic convenience of NAs,
>>>> without conflating NAs with IGNOREs.   I guess we need some feedback
>>>> from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
>>>> of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
>>>> separating the APIs?
>>>
>>> As I tried to convey on my other mail, separating both will force you to
>>> either:
>>>
>>> * Make a copy of the array before passing it to another routine (because
>>>  the routine will assign np.NA but you still want the original data)
>
>> You have an array 'arr'.   The array does support NAs, but it doesn't
>> have a mask.  You want to pass ``arr`` to another routine ``func``.
>> You expect ``func`` to set NAs into the data but you don't want
>> ``func`` to modify ``arr`` and you don't want to copy ``arr`` either.
>> You are saying the following:
>
>> "with the fused API, I can make ``arr`` be a masked array, and pass it
>> into ``func``, and know that, when func sets elements of arr to NA, it
>> will only modify the mask and not the underlying data in ``arr``."
>
> Yes.
>
>
>> It does seem to me this is a very obscure case.  First, ``func`` is
>> modifying the array but you want an unmodified array back.  Second,
>> you'll have to do some view trick to recover the not-NA case to arr,
>> when it comes back.
>
> I know, the example is just silly and convoluted.
>
>
>> It seems to me, that what ``func`` should do, if it wants you to be
>> able to unmask the NAs, is to make a masked array view of ``arr``, and
>> return that.   And indeed the simplicity of the separated API
>> immediately makes that clear - in my view at least.
>
> I agree on this example. My only concern is on the API's ability to
> foresee as many future use-cases as possible, without impacting
> performance.

But, of course, there's a great danger in trying to cover every
possible use-case.

My argument is that the kind of cases that you are describing are - I
believe - very rare and are even a little difficult to make up.  Is
that fair?

To my mind, the separate NA and IGNORE API is easier to understand and
explain.   If that isn't true, please do say, and say why - because
that point is key.

If it is true that the separate API is clearer, then the benefit in
terms of power and extensibility has to be large, in order to go for
the fused API.

Cheers,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Lluís
Matthew Brett writes:

> Hi,
> On Thu, Jun 30, 2011 at 7:27 PM, Lluís  wrote:
>> Matthew Brett writes:
>> [...]
>>> I'm afraid, like you, I'm a little lost in the world of masking,
>>> because I only need the NAs.  I was trying to see if I could come up
>>> with an API that picked up some of the syntactic convenience of NAs,
>>> without conflating NAs with IGNOREs.   I guess we need some feedback
>>> from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
>>> of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
>>> separating the APIs?
>> 
>> As I tried to convey on my other mail, separating both will force you to
>> either:
>> 
>> * Make a copy of the array before passing it to another routine (because
>>  the routine will assign np.NA but you still want the original data)

> You have an array 'arr'.   The array does support NAs, but it doesn't
> have a mask.  You want to pass ``arr`` to another routine ``func``.
> You expect ``func`` to set NAs into the data but you don't want
> ``func`` to modify ``arr`` and you don't want to copy ``arr`` either.
> You are saying the following:

> "with the fused API, I can make ``arr`` be a masked array, and pass it
> into ``func``, and know that, when func sets elements of arr to NA, it
> will only modify the mask and not the underlying data in ``arr``."

Yes.


> It does seem to me this is a very obscure case.  First, ``func`` is
> modifying the array but you want an unmodified array back.  Second,
> you'll have to do some view trick to recover the not-NA case to arr,
> when it comes back.

I know, the example is just silly and convoluted.


> It seems to me, that what ``func`` should do, if it wants you to be
> able to unmask the NAs, is to make a masked array view of ``arr``, and
> return that.   And indeed the simplicity of the separated API
> immediately makes that clear - in my view at least.

I agree on this example. My only concern is on the API's ability to
foresee as many future use-cases as possible, without impacting
performance.

1) On one hand, we have that functions must be specially crafted to
   handle transient NA (i.e., create a masked array to store the output,
   which will be possibly optional, so it needs another function
   argument). And not everybody will foresee such usage, resulting in an
   inconsistent API w.r.t. np.NA vs np.IGNORE. We could alternatively
   see this as a knob to say, whenever you store np.NA, please use
   np.IGNORE. It all needs collaboration from the callee.

2) On the other hand, we have that it can all be controlled by the
   caller, who is really the only one that knows its needs. This, at the
   risk of confusing the user (I still believe the user should not be
   confused because the mask must be explicitly activated).

If you're telling me "2 is not necessary because functions written as 1
are few and clearly identified", then I'll just say I don't know.


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Lluís
Nathaniel Smith writes:

> On Thu, Jun 30, 2011 at 11:27 AM, Lluís  wrote:
>> As I tried to convey on my other mail, separating both will force you to
>> either:
>> 
>> * Make a copy of the array before passing it to another routine (because
>>  the routine will assign np.NA but you still want the original data)

> To help me understand, do you have an example in mind of a routine
> that would do that? I can't think of any cases where I had some
> original data that some routine wanted to throw out and replace with
> NAs; it just seems... weird. Maybe I'm missing something though...

Well, I had some silly example on another thread. A function that
computes the mean of all non-NA values, and assigns NA to all cells that
are beyond a certain threshold of that mean value.
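
A sketch of such a routine, using the existing `numpy.ma` as a stand-in for
the proposed missing-data support. The function name `mask_beyond_mean` and
the `threshold` parameter are made up for illustration; masking here plays the
role that assigning NA would play in the proposals.

```python
import numpy as np


def mask_beyond_mean(values, threshold):
    """Mask every element farther than `threshold` from the mean.

    The mean is taken over the elements that are not already masked.
    Only the mask is changed; the stored data values are left alone.
    """
    center = values.mean()                       # skips masked elements
    too_far = np.abs(values.data - center) > threshold
    values[too_far] = np.ma.masked
    return values


data = np.array([1.0, 2.0, 3.0, 100.0])
marr = np.ma.array(data, mask=[False, False, False, False])
mask_beyond_mean(marr, threshold=50.0)

print(marr.mask.tolist())   # [False, False, False, True]
print(marr.mean())          # 2.0
```

Because only the mask is touched, the value 100.0 is still stored underneath
and can be recovered by clearing the mask.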


> (I can imagine that it would make sense for what we're calling a
> masked array, where you have some routine which computes which values
> should be ignored for a particular purpose. But if it only makes sense
> for masked arrays then you can just write your routine to work with
> masked arrays only, and it doesn't matter how similar the masking and
> missing APIs are.)

The routine makes sense by itself as a beyond-mean detector.

The routine must not care whether your NAs are transient or not (in your
aNEP, whether you want it to assign np.NA or np.IGNORE, which must be
indicated by the caller through yet another function argument).

Note that callers will not only have to indicate which "type" of missing
data the callee should use (np.NA or np.IGNORE), but they also have to
indicate whether np.NAs must be ignored (i.e., skipna=bool), as well as
np.IGNORE (i.e., propmask=bool).

Of course it is doable, but adding 3 more arguments to *all* functions
(including ufuncs, and higher-level functions) does not seem as
desirable to me.


Lluis



Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Matthew Brett
Hi,

On Thu, Jun 30, 2011 at 7:27 PM, Lluís  wrote:
> Matthew Brett writes:
> [...]
>> I'm afraid, like you, I'm a little lost in the world of masking,
>> because I only need the NAs.  I was trying to see if I could come up
>> with an API that picked up some of the syntactic convenience of NAs,
>> without conflating NAs with IGNOREs.   I guess we need some feedback
>> from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
>> of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
>> separating the APIs?
>
> As I tried to convey on my other mail, separating both will force you to
> either:
>
> * Make a copy of the array before passing it to another routine (because
>  the routine will assign np.NA but you still want the original data)

You have an array 'arr'.   The array does support NAs, but it doesn't
have a mask.  You want to pass ``arr`` to another routine ``func``.
You expect ``func`` to set NAs into the data but you don't want
``func`` to modify ``arr`` and you don't want to copy ``arr`` either.
You are saying the following:

"with the fused API, I can make ``arr`` be a masked array, and pass it
into ``func``, and know that, when func sets elements of arr to NA, it
will only modify the mask and not the underlying data in ``arr``."

It does seem to me this is a very obscure case.  First, ``func`` is
modifying the array but you want an unmodified array back.  Second,
you'll have to do some view trick to recover the not-NA case to arr,
when it comes back.

It seems to me, that what ``func`` should do, if it wants you to be
able to unmask the NAs, is to make a masked array view of ``arr``, and
return that.   And indeed the simplicity of the separated API
immediately makes that clear - in my view at least.
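
With today's `numpy.ma` standing in for the proposed masked arrays, that
pattern might look like the following sketch. The function `func` and its
rule for what to mask (negative values) are hypothetical.

```python
import numpy as np


def func(arr):
    """Return a masked view of `arr` with negative values masked."""
    view = arr.view(np.ma.MaskedArray)   # shares the data buffer, no copy
    view[arr < 0] = np.ma.masked         # only touches the mask
    return view


arr = np.array([1.0, -2.0, 3.0])
out = func(arr)

print(arr.tolist())        # [1.0, -2.0, 3.0]  (caller's data is unchanged)
print(out.mask.tolist())   # [False, True, False]
print(out.sum())           # 4.0

out.mask = np.ma.nomask    # the caller can unmask again
print(out.sum())           # 2.0
```

The caller passes a plain array and decides afterwards whether to keep or
drop the mask, so no copy is needed and no data is destroyed.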

Best,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Nathaniel Smith
On Thu, Jun 30, 2011 at 11:27 AM, Lluís  wrote:
> As I tried to convey on my other mail, separating both will force you to
> either:
>
> * Make a copy of the array before passing it to another routine (because
>  the routine will assign np.NA but you still want the original data)

To help me understand, do you have an example in mind of a routine
that would do that? I can't think of any cases where I had some
original data that some routine wanted to throw out and replace with
NAs; it just seems... weird. Maybe I'm missing something though...

(I can imagine that it would make sense for what we're calling a
masked array, where you have some routine which computes which values
should be ignored for a particular purpose. But if it only makes sense
for masked arrays then you can just write your routine to work with
masked arrays only, and it doesn't matter how similar the masking and
missing APIs are.)

-- Nathaniel


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Lluís
Matthew Brett writes:
[...]
> I'm afraid, like you, I'm a little lost in the world of masking,
> because I only need the NAs.  I was trying to see if I could come up
> with an API that picked up some of the syntactic convenience of NAs,
> without conflating NAs with IGNOREs.   I guess we need some feedback
> from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
> of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
> separating the APIs?

As I tried to convey on my other mail, separating both will force you to
either:

* Make a copy of the array before passing it to another routine (because
  the routine will assign np.NA but you still want the original data)

or

* Tell the other routine whether it should use np.NA or np.IGNORE
  *and* whether it should use "skipna" and/or "propmask".


To me, that's the whole point about a unified API:

* Avoid making array copies.

* Do not add more arguments to *all* routines (to tell them which kind
  of missing data they should produce, and which kind of missing data
  they should ignore/propagate).


Lluis



Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Matthew Brett
Hi,

On Thu, Jun 30, 2011 at 6:51 PM, Nathaniel Smith  wrote:
> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett  
> wrote:
>> In the interest of making the discussion as concrete as possible, here
>> is my draft of an alternative proposal for NAs and masking, based on
>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
>> right, that the ideas become much clearer when the NA idea and the
>> MASK idea are separate.   Please do pitch in for things I may have
>> missed or misunderstood:
> [...]
>
> Thanks for writing this up! I stuck it up as a gist so we can edit it
> more easily:
>  https://gist.github.com/1056379/
> This is your initial version:
>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
> And I made a few changes:
>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
> Specifically, I added a rationale section, changed np.MASKED to
> np.IGNORE (as per comments in this thread), and added a vowel to
> "propmsk".

Thanks for doing that.

> One thing I wonder about the design is whether having an
> np.MASKED/np.IGNORE value at all helps or hurts. (Occam tells us never
> to multiply entities without necessity! And it's a bit of an odd fit
> to the masking concept, since the whole idea is that masking is a
> property of the array, not the individual datums.)
>
> Currently, I see the following uses for it:
>  -- As a return value when someone tries to scalar-index a masked value
>  -- As a placeholder to specify masked values when creating an array
> from a list (but not when assigning to an array later)
>  -- As a return value when using propmask=True
>  -- As something to display when printing a masked array
>
> Another way of doing things would be:
>  -- Scalar-indexing a masked value returns an error, like trying to
> index past the end of an array. (Slicing etc. would still return a new
> masked array.)
>  -- Having some sort of placeholder does seem nice, but I'm not sure
> how often you need to type out a masked array. And I notice that
> numpy.ma does support this (like so: ma.array([1, ma.masked, 3])) but
> the examples in the docs never use it. The replacement idiom would be
> something like: my_data = np.array([1, 999, 3], masked=True);
> my_data.visible = (my_data != 999). So maybe just leave out the
> placeholder value, at least for version 1?
>  -- I don't really see the logic for supporting 'propmask' at all.
> AFAICT no-one has ever even considered this as a useful feature for
> numpy.ma, never mind implemented it?
>  -- When printing, the numpy.ma approach of using "--" seems much
> more readable to me than having "IGNORE" all over my screen.
>
> So overall, making these changes would let us simplify the design. But
> maybe propmask is really critical for some use case, or there's some
> good reason to want to scalar-index missing values without getting an
> error?

I'm afraid, like you, I'm a little lost in the world of masking,
because I only need the NAs.  I was trying to see if I could come up
with an API that picked up some of the syntactic convenience of NAs,
without conflating NAs with IGNOREs.   I guess we need some feedback
from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
separating the APIs?

See you,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Nathaniel Smith
On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett  wrote:
> In the interest of making the discussion as concrete as possible, here
> is my draft of an alternative proposal for NAs and masking, based on
> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
> right, that the ideas become much clearer when the NA idea and the
> MASK idea are separate.   Please do pitch in for things I may have
> missed or misunderstood:
[...]

Thanks for writing this up! I stuck it up as a gist so we can edit it
more easily:
  https://gist.github.com/1056379/
This is your initial version:
  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
And I made a few changes:
  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
Specifically, I added a rationale section, changed np.MASKED to
np.IGNORE (as per comments in this thread), and added a vowel to
"propmsk".

One thing I wonder about the design is whether having an
np.MASKED/np.IGNORE value at all helps or hurts. (Occam tells us never
to multiply entities without necessity! And it's a bit of an odd fit
to the masking concept, since the whole idea is that masking is a
property of the array, not the individual datums.)

Currently, I see the following uses for it:
  -- As a return value when someone tries to scalar-index a masked value
  -- As a placeholder to specify masked values when creating an array
from a list (but not when assigning to an array later)
  -- As a return value when using propmask=True
  -- As something to display when printing a masked array

Another way of doing things would be:
  -- Scalar-indexing a masked value returns an error, like trying to
index past the end of an array. (Slicing etc. would still return a new
masked array.)
  -- Having some sort of placeholder does seem nice, but I'm not sure
how often you need to type out a masked array. And I notice that
numpy.ma does support this (like so: ma.array([1, ma.masked, 3])) but
the examples in the docs never use it. The replacement idiom would be
something like: my_data = np.array([1, 999, 3], masked=True);
my_data.visible = (my_data != 999). So maybe just leave out the
placeholder value, at least for version 1?
  -- I don't really see the logic for supporting 'propmask' at all.
AFAICT no-one has ever even considered this as a useful feature for
numpy.ma, never mind implemented it?
  -- When printing, the numpy.ma approach of using "--" seems much
more readable to me than having "IGNORE" all over my screen.

So overall, making these changes would let us simplify the design. But
maybe propmask is really critical for some use case, or there's some
good reason to want to scalar-index missing values without getting an
error?
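
For reference, a minimal sketch of the two numpy.ma idioms mentioned above.
It uses only the existing numpy.ma API; np.IGNORE, masked=True and .visible
are proposals in this thread and are not used. The variable names and the
999 sentinel are illustrative only:

```python
import numpy as np
import numpy.ma as ma

# Placeholder style: mark the masked element directly in the list.
a = ma.array([1, ma.masked, 3])

# Sentinel style: build a plain array, then mask wherever the sentinel occurs.
raw = np.array([1, 999, 3])
b = ma.array(raw, mask=(raw == 999))

# Scalar-indexing a masked element returns the ma.masked singleton
# instead of raising an error.
scalar_is_masked = a[1] is ma.masked

# Masked elements are displayed as "--" when the array is printed.
print(a)
print(b)

# Reductions skip the masked element.
total = b.sum()
```

Both spellings give the same mask; the sentinel form avoids needing a
placeholder value at construction time.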

-- Nathaniel


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Matthew Brett
Hi,

On Thu, Jun 30, 2011 at 5:03 PM, Pierre GM  wrote:
>
> On Jun 30, 2011, at 5:38 PM, Matthew Brett wrote:
>
>> Hi,
>>
>> On Thu, Jun 30, 2011 at 2:58 PM, Pierre GM  wrote:
>>>
>>> On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
>>>> ###
>>>> A alternative-NEP on masking and missing values
>>>> ###
>>>
>>> I like the idea of two different special values, np.NA for missing values, 
>>> np.IGNORE for masked values. np.NA values in an array define what was 
>>> implemented in numpy.ma as a 'hard mask' (where you can't unmask data), 
>>> while np.IGNOREs correspond to the .mask in numpy.ma. Looks fairly non 
>>> ambiguous that way.
>>>
>>>
>>>> **
>>>> Initialization
>>>> **
>>>>
>>>> First, missing values can be set and be displayed as ``np.NA, NA``::
>>>>
>>>>     >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>>>>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>>>>
>>>> As the initialization is not ambiguous, this can be written without the NA
>>>> dtype::
>>>>
>>>>     >>> np.array([1.0, 2.0, np.NA, 7.0])
>>>>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>>>>
>>>> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>>>>
>>>>     >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>>>>     array([1., 2., MASKED, 7.], masked=True)
>>>>
>>>> As the initialization is not ambiguous, this can be written without
>>>> ``masked=True``::
>>>>
>>>>     >>> np.array([1.0, 2.0, np.MASKED, 7.0])
>>>>     array([1., 2., MASKED, 7.], masked=True)
>>>
>>> I'm not happy with this 'masked' parameter, at all. What's the point? 
>>> Either you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing 
>>> something here.
>>
>> If I put np.MASKED (I agree I prefer np.IGNORE) in the init, then
>> obviously I mean it should be masked, so the 'masked=True' here is
>> completely redundant, yes, I agree.  And in fact:
>>
>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=False)
>>
>> should raise an error.  On the other hand, if I make a normal array:
>>
>> arr = np.array([1.0, 2.0, 7.0])
>>
>> and then do this:
>>
>> arr.visible[2] = False
>>
>> then either I should raise an error (it's not a masked array), or,
>> more magically, construct a mask on the fly.   This somewhat breaks
>> expectations though, because you might just have made a largish mask
>> array without having any clue that that had happened.
>
> Well, I'd expect an error to be raised when assigning a NA if the initial 
> array is not NA friendly. The 'magical' creation of a mask would be nice, but 
> is probably too magic and best left alone.

I agree :)

>>>

>>>> Direct assignment in the masked case is magic and confusing, and so
>>>> happens only via the mask::
>>>>
>>>>     >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>>>     >>> masked_arr[2] = np.NA
>>>>     TypeError('dtype does not support NA')
>>>>     >>> masked_arr[2] = np.MASKED
>>>>     TypeError('float() argument must be a string or a number')
>>>>     >>> masked_arr.visible[2] = False
>>>>     >>> masked_arr
>>>>     array([1., 2., MASKED], masked=True)
>>>
>>> What about the reverse case ? When you assign a regular value to a 
>>> np.NA/np.IGNORE item ?
>>
>> Well, for the np.NA case, this is straightforward:
>>
>> na_arr[2] = 3
>>
>> It's just assignment. For ``masked_array[2] = 3`` - I don't know, I
>> guess whatever we are used to.  What do you think?
>
> Ahah, that depends.
> With a = np.array([1., np.NA, 3.]), then a[1]=2. should raise an error, as 
> Mark suggests: you can't "unmask" a missing value, you need to create a view 
> of the initial array then "unmask". It's the equivalent of a hard mask.

In this alterNEP, the NAs and the masked values are completely
different.  So, if you do this:

a = np.array([1., np.NA, 3.])

then you've unambiguously asked for an array that can handle floats
and NAs, and that will be the NA[<f8] [...]

> With a = np.array([1., np.IGNORE, 3.]), then a[1]=2. should give
> np.array([1.,2.,3.]) and a.mask=[False,False,False]. That's a soft mask.

Sounds reasonable to me...

Cheers,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Pierre GM

On Jun 30, 2011, at 5:38 PM, Matthew Brett wrote:

> Hi,
> 
> On Thu, Jun 30, 2011 at 2:58 PM, Pierre GM  wrote:
>> 
>> On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
>>> ###
>>> A alternative-NEP on masking and missing values
>>> ###
>> 
>> I like the idea of two different special values, np.NA for missing values, 
>> np.IGNORE for masked values. np.NA values in an array define what was 
>> implemented in numpy.ma as a 'hard mask' (where you can't unmask data), 
>> while np.IGNOREs correspond to the .mask in numpy.ma. Looks fairly non 
>> ambiguous that way.
>> 
>> 
>>> **
>>> Initialization
>>> **
>>> 
>>> First, missing values can be set and be displayed as ``np.NA, NA``::
>>>
>>>     >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>>>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>>>
>>> As the initialization is not ambiguous, this can be written without the NA
>>> dtype::
>>>
>>>     >>> np.array([1.0, 2.0, np.NA, 7.0])
>>>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>>>
>>> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>>>
>>>     >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>>>     array([1., 2., MASKED, 7.], masked=True)
>>>
>>> As the initialization is not ambiguous, this can be written without
>>> ``masked=True``::
>>>
>>>     >>> np.array([1.0, 2.0, np.MASKED, 7.0])
>>>     array([1., 2., MASKED, 7.], masked=True)
>> 
>> I'm not happy with this 'masked' parameter, at all. What's the point? Either 
>> you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing 
>> something here.
> 
> If I put np.MASKED (I agree I prefer np.IGNORE) in the init, then
> obviously I mean it should be masked, so the 'masked=True' here is
> completely redundant, yes, I agree.  And in fact:
> 
> np.array([1.0, 2.0, np.MASKED, 7.0], masked=False)
> 
> should raise an error.  On the other hand, if I make a normal array:
> 
> arr = np.array([1.0, 2.0, 7.0])
> 
> and then do this:
> 
> arr.visible[2] = False
> 
> then either I should raise an error (it's not a masked array), or,
> more magically, construct a mask on the fly.   This somewhat breaks
> expectations though, because you might just have made a largish mask
> array without having any clue that that had happened.

Well, I'd expect an error to be raised when assigning a NA if the initial array 
is not NA friendly. The 'magical' creation of a mask would be nice, but is 
probably too magic and best left alone.


>> 
>>> 
>>> Direct assignment in the masked case is magic and confusing, and so happens
>>> only via the mask::
>>>
>>>     >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>>     >>> masked_arr[2] = np.NA
>>>     TypeError('dtype does not support NA')
>>>     >>> masked_arr[2] = np.MASKED
>>>     TypeError('float() argument must be a string or a number')
>>>     >>> masked_arr.visible[2] = False
>>>     >>> masked_arr
>>>     array([1., 2., MASKED], masked=True)
>> 
>> What about the reverse case ? When you assign a regular value to a 
>> np.NA/np.IGNORE item ?
> 
> Well, for the np.NA case, this is straightforward:
> 
> na_arr[2] = 3
> 
> It's just assignment. For ``masked_array[2] = 3`` - I don't know, I
> guess whatever we are used to.  What do you think?

Ahah, that depends.
With a = np.array([1., np.NA, 3.]), then a[1]=2. should raise an error, as Mark 
suggests: you can't "unmask" a missing value, you need to create a view of the 
initial array then "unmask". It's the equivalent of a hard mask.
With a = np.array([1., np.IGNORE, 3.]), then a[1]=2. should give 
np.array([1.,2.,3.]) and a.mask=[False,False,False]. That's a soft mask.
At least, that's how I see it...
P.
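
The hard/soft distinction described above already exists in numpy.ma, so it
can be sketched with the current API. This is only an illustration: np.NA
and np.IGNORE are proposals, and note that numpy.ma silently keeps a
hard-masked element masked rather than raising an error as suggested above.

```python
import numpy.ma as ma

# Soft mask (the numpy.ma default): assigning a value unmasks the element.
soft = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
soft[1] = 2.0

# Hard mask: assignment to a masked element is ignored and it stays masked.
hard = ma.array([1.0, 2.0, 3.0], mask=[False, True, False], hard_mask=True)
hard[1] = 5.0

soft_mask = ma.getmaskarray(soft).tolist()
hard_mask = ma.getmaskarray(hard).tolist()
print(soft_mask)
print(hard_mask)
```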




Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Matthew Brett
Hi,

On Thu, Jun 30, 2011 at 2:58 PM, Pierre GM  wrote:
>
> On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
>> ###
>> A alternative-NEP on masking and missing values
>> ###
>
> I like the idea of two different special values, np.NA for missing values, 
> np.IGNORE for masked values. np.NA values in an array define what was 
> implemented in numpy.ma as a 'hard mask' (where you can't unmask data), while 
> np.IGNOREs correspond to the .mask in numpy.ma. Looks fairly non ambiguous 
> that way.
>
>
>> **
>> Initialization
>> **
>>
>> First, missing values can be set and be displayed as ``np.NA, NA``::
>>
>>     >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>>
>> As the initialization is not ambiguous, this can be written without the NA
>> dtype::
>>
>>     >>> np.array([1.0, 2.0, np.NA, 7.0])
>>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>>
>> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>>
>>     >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>>     array([1., 2., MASKED, 7.], masked=True)
>>
>> As the initialization is not ambiguous, this can be written without
>> ``masked=True``::
>>
>>     >>> np.array([1.0, 2.0, np.MASKED, 7.0])
>>     array([1., 2., MASKED, 7.], masked=True)
>
> I'm not happy with this 'masked' parameter, at all. What's the point? Either 
> you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing 
> something here.

If I put np.MASKED (I agree I prefer np.IGNORE) in the init, then
obviously I mean it should be masked, so the 'masked=True' here is
completely redundant, yes, I agree.  And in fact:

np.array([1.0, 2.0, np.MASKED, 7.0], masked=False)

should raise an error.  On the other hand, if I make a normal array:

arr = np.array([1.0, 2.0, 7.0])

and then do this:

arr.visible[2] = False

then either I should raise an error (it's not a masked array), or,
more magically, construct a mask on the fly.   This somewhat breaks
expectations though, because you might just have made a largish mask
array without having any clue that that had happened.

>
>> **
>> Ufuncs
>> **
>
> All fine.
>>
>> **
>> Assignment
>> **
>>
>> is obvious in the NA case::
>>
>>     >>> arr = np.array([1.0, 2.0, 7.0])
>>     >>> arr[2] = np.NA
>>     TypeError('dtype does not support NA')
>>     >>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>>     >>> na_arr[2] = np.NA
>>     >>> na_arr
>>     array([1., 2., NA], dtype='NA[<f8]')
>
> OK
>
>
>>
>> Direct assignment in the masked case is magic and confusing, and so happens
>> only via the mask::
>>
>>     >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>     >>> masked_arr[2] = np.NA
>>     TypeError('dtype does not support NA')
>>     >>> masked_arr[2] = np.MASKED
>>     TypeError('float() argument must be a string or a number')
>>     >>> masked_arr.visible[2] = False
>>     >>> masked_arr
>>     array([1., 2., MASKED], masked=True)
>
> What about the reverse case ? When you assign a regular value to a 
> np.NA/np.IGNORE item ?

Well, for the np.NA case, this is straightforward:

na_arr[2] = 3

It's just assignment. For ``masked_array[2] = 3`` - I don't know, I
guess whatever we are used to.  What do you think?

Best,

Matthew


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Charles R Harris
On Thu, Jun 30, 2011 at 8:17 AM, Charles R Harris  wrote:

>
>
> On Thu, Jun 30, 2011 at 7:31 AM, Matthew Brett wrote:
>
>> Hi,
>>
>> On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith  wrote:
>> > Anyway, it's pretty clear that in this particular case, there are two
>> > distinct features that different people want: the missing data
>> > feature, and the masked array feature. The more I think about it, the
>> > less I see how they can be combined into one dessert topping + floor
>> > wax solution. Here are three particular points where they seem to
>> > contradict each other:
>> ...
>> [some proposals]
>>
>> In the interest of making the discussion as concrete as possible, here
>> is my draft of an alternative proposal for NAs and masking, based on
>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
>> right, that the ideas become much clearer when the NA idea and the
>> MASK idea are separate.   Please do pitch in for things I may have
>> missed or misunderstood:
>>
>> ###
>> A alternative-NEP on masking and missing values
>> ###
>>
>> The principle of this aNEP is to separate the APIs for masking and for
>> missing
>> values, according to
>>
>> * The current implementation of masked arrays
>> * Nathaniel Smith's proposal.
>>
>> This discussion is only of the API, and not of the implementation.
>>
>> **
>> Initialization
>> **
>>
>> First, missing values can be set and be displayed as ``np.NA, NA``::
>>
>>>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>>array([1., 2., NA, 7.], dtype='NA[<f8]')
>>
>> As the initialization is not ambiguous, this can be written without the NA
>> dtype::
>>
>>>>> np.array([1.0, 2.0, np.NA, 7.0])
>>array([1., 2., NA, 7.], dtype='NA[<f8]')
>>
>> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>>
>>>>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>>array([1., 2., MASKED, 7.], masked=True)
>>
>> As the initialization is not ambiguous, this can be written without
>> ``masked=True``::
>>
>>>>> np.array([1.0, 2.0, np.MASKED, 7.0])
>>array([1., 2., MASKED, 7.], masked=True)
>>
>> **
>> Ufuncs
>> **
>>
>> By default, NA values propagate::
>>
>>>>> na_arr = np.array([1.0, 2.0, np.NA, 7.0])
>>>>> np.sum(na_arr)
>>NA('float64')
>>
>> unless the ``skipna`` flag is set::
>>
>>>>> np.sum(na_arr, skipna=True)
>>10.0
>>
>> By default, masking does not propagate::
>>
>>>>> masked_arr = np.array([1.0, 2.0, np.MASKED, 7.0])
>>>>> np.sum(masked_arr)
>>10.0
>>
>> unless the ``propmsk`` flag is set::
>>
>>>>> np.sum(masked_arr, propmsk=True)
>>MASKED
>>
>> An array can be masked, and contain NA values::
>>
>>>>> both_arr = np.array([1.0, 2.0, np.MASKED, np.NA, 7.0])
>>
>> In the default case, the behavior is obvious::
>>
>>>>> np.sum(both_arr)
>>NA('float64')
>>
>> It's also obvious what to do with ``skipna=True``::
>>
>>>>> np.sum(both_arr, skipna=True)
>>10.0
>>>>> np.sum(both_arr, skipna=True, propmsk=True)
>>MASKED
>>
>> To break the tie between NA and MSK, NAs propagate harder::
>>
>>>>> np.sum(both_arr, propmsk=True)
>>NA('float64')
>>
>> **
>> Assignment
>> **
>>
>> is obvious in the NA case::
>>
>>>>> arr = np.array([1.0, 2.0, 7.0])
>>>>> arr[2] = np.NA
>>TypeError('dtype does not support NA')
>>>>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>>>>> na_arr[2] = np.NA
>>>>> na_arr
>>array([1., 2., NA], dtype='NA[<f8]')
>>
>> Direct assignment in the masked case is magic and confusing, and so
>> happens only
>> via the mask::
>>
>>>>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>>>> masked_arr[2] = np.NA
>>TypeError('dtype does not support NA')
>>>>> masked_arr[2] = np.MASKED
>>TypeError('float() argument must be a string or a number')
>>>>> masked_arr.visible[2] = False
>>>>> masked_arr
>>array([1., 2., MASKED], masked=True)
>>
>> See y'all,
>>
>>
> I honestly don't see the problem here. The difference isn't between
> masked_values/missing_values, it is between masked arrays and masked views
> of unmasked arrays. I think the view concept is central to what is going on.
> It may not be what folks are used to, but it strikes me as a clarifying
> advance rather than a mixed up confusion. Admittedly, it depends on the
> numpy centric ability to have views, but views are a wonderful thing.
>
>
OK, I can see a problem in that currently the only way to unmask a value is
by assignment of a valid value to the underlying data array, that is the
missing data idea. For masked data, it might be convenient to have something
that only affected the mask instead of having to take another view of the
unmasked data and reconstructing the mask with some modifications. So that
could maybe be done with a "soft" np.CLEAR that only worked on views of
unmasked arrays.
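
As a point of comparison only, numpy.ma already allows the mask to be
changed without touching the data. The sketch below uses the existing
numpy.ma API; np.CLEAR is a proposal from this message and is not used.

```python
import numpy.ma as ma

x = ma.array([1.0, 2.0, 7.0])

# Mask one entry; the value stored underneath is left untouched.
x[2] = ma.masked
hidden = ma.getmaskarray(x).tolist()

# Clear the mask only; no new data value is assigned.
x.mask = ma.nomask
restored = x.tolist()

print(hidden)
print(restored)
```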

Chuck

Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Dag Sverre Seljebotn
On 06/30/2011 04:17 PM, Charles R Harris wrote:
>
>
> On Thu, Jun 30, 2011 at 7:31 AM, Matthew Brett wrote:
>
> Hi,
>
> On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith wrote:
>  > Anyway, it's pretty clear that in this particular case, there are two
>  > distinct features that different people want: the missing data
>  > feature, and the masked array feature. The more I think about it, the
>  > less I see how they can be combined into one dessert topping + floor
>  > wax solution. Here are three particular points where they seem to
>  > contradict each other:
> ...
> [some proposals]
>
> In the interest of making the discussion as concrete as possible, here
> is my draft of an alternative proposal for NAs and masking, based on
> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
> right, that the ideas become much clearer when the NA idea and the
> MASK idea are separate.   Please do pitch in for things I may have
> missed or misunderstood:
>
> ###
> A alternative-NEP on masking and missing values
> ###
>
> The principle of this aNEP is to separate the APIs for masking and
> for missing
> values, according to
>
> * The current implementation of masked arrays
> * Nathaniel Smith's proposal.
>
> This discussion is only of the API, and not of the implementation.
>
> **
> Initialization
> **
>
> First, missing values can be set and be displayed as ``np.NA, NA``::
>
>  >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
> array([1., 2., NA, 7.], dtype='NA[<f8]')
>
> As the initialization is not ambiguous, this can be written without
> the NA
> dtype::
>
>  >>> np.array([1.0, 2.0, np.NA, 7.0])
> array([1., 2., NA, 7.], dtype='NA[<f8]')
>
> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>
>  >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
> array([1., 2., MASKED, 7.], masked=True)
>
> As the initialization is not ambiguous, this can be written without
> ``masked=True``::
>
>  >>> np.array([1.0, 2.0, np.MASKED, 7.0])
> array([1., 2., MASKED, 7.], masked=True)
>
> **
> Ufuncs
> **
>
> By default, NA values propagate::
>
>  >>> na_arr = np.array([1.0, 2.0, np.NA, 7.0])
>  >>> np.sum(na_arr)
> NA('float64')
>
> unless the ``skipna`` flag is set::
>
>  >>> np.sum(na_arr, skipna=True)
> 10.0
>
> By default, masking does not propagate::
>
>  >>> masked_arr = np.array([1.0, 2.0, np.MASKED, 7.0])
>  >>> np.sum(masked_arr)
> 10.0
>
> unless the ``propmsk`` flag is set::
>
>  >>> np.sum(masked_arr, propmsk=True)
> MASKED
>
> An array can be masked, and contain NA values::
>
>  >>> both_arr = np.array([1.0, 2.0, np.MASKED, np.NA, 7.0])
>
> In the default case, the behavior is obvious::
>
>  >>> np.sum(both_arr)
> NA('float64')
>
> It's also obvious what to do with ``skipna=True``::
>
>  >>> np.sum(both_arr, skipna=True)
> 10.0
>  >>> np.sum(both_arr, skipna=True, propmsk=True)
> MASKED
>
> To break the tie between NA and MSK, NAs propagate harder::
>
>  >>> np.sum(both_arr, propmsk=True)
> NA('float64')
>
> **
> Assignment
> **
>
> is obvious in the NA case::
>
>  >>> arr = np.array([1.0, 2.0, 7.0])
>  >>> arr[2] = np.NA
> TypeError('dtype does not support NA')
>  >>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>  >>> na_arr[2] = np.NA
>  >>> na_arr
> array([1., 2., NA], dtype='NA[<f8]')
>
> Direct assignment in the masked case is magic and confusing, and so
> happens only
> via the mask::
>
>  >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>  >>> masked_arr[2] = np.NA
> TypeError('dtype does not support NA')
>  >>> masked_arr[2] = np.MASKED
> TypeError('float() argument must be a string or a number')
>  >>> masked_arr.visible[2] = False
>  >>> masked_arr
> array([1., 2., MASKED], masked=True)
>
> See y'all,
>
>
> I honestly don't see the problem here. The difference isn't between
> masked_values/missing_values, it is between masked arrays and masked
> views of unmasked arrays. I think the view concept is central to what is
> going on. It may not be what folks are used to, but it strikes me as a
> clarifying advance rather than a mixed up confusion. Admittedly, it
> depends on the numpy centric ability to have views, but views are a
> wonderful thing.

So a) how do you propose that reductions behave?, b) what semantics for 
the []= operator do you propose?

That would clarify why you don't see a problem..
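
For comparison, this is how reductions behave with tools that exist today.
It is a sketch under the assumption that NaN stands in for the proposed
np.NA and a numpy.ma mask stands in for the proposed masked/IGNORE values;
skipna and propmask are proposals from this thread and are not used.

```python
import numpy as np
import numpy.ma as ma

# NaN propagates through an ordinary reduction...
nan_arr = np.array([1.0, 2.0, np.nan, 7.0])
propagated = np.sum(nan_arr)

# ...unless a NaN-skipping reduction is requested explicitly.
skipped = np.nansum(nan_arr)

# A masked element is skipped by default in numpy.ma reductions.
masked_arr = ma.array([1.0, 2.0, 99.0, 7.0],
                      mask=[False, False, True, False])
masked_total = masked_arr.sum()

print(propagated, skipped, masked_total)
```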

Dag Sverre


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Charles R Harris
On Thu, Jun 30, 2011 at 7:31 AM, Matthew Brett wrote:

> Hi,
>
> On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith  wrote:
> > Anyway, it's pretty clear that in this particular case, there are two
> > distinct features that different people want: the missing data
> > feature, and the masked array feature. The more I think about it, the
> > less I see how they can be combined into one dessert topping + floor
> > wax solution. Here are three particular points where they seem to
> > contradict each other:
> ...
> [some proposals]
>
> In the interest of making the discussion as concrete as possible, here
> is my draft of an alternative proposal for NAs and masking, based on
> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
> right, that the ideas become much clearer when the NA idea and the
> MASK idea are separate.   Please do pitch in for things I may have
> missed or misunderstood:
>
> ###
> A alternative-NEP on masking and missing values
> ###
>
> The principle of this aNEP is to separate the APIs for masking and for
> missing
> values, according to
>
> * The current implementation of masked arrays
> * Nathaniel Smith's proposal.
>
> This discussion is only of the API, and not of the implementation.
>
> **
> Initialization
> **
>
> First, missing values can be set and be displayed as ``np.NA, NA``::
>
>>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>array([1., 2., NA, 7.], dtype='NA[<f8]')
>
> As the initialization is not ambiguous, this can be written without the NA
> dtype::
>
>>>> np.array([1.0, 2.0, np.NA, 7.0])
>array([1., 2., NA, 7.], dtype='NA[<f8]')
>
> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>
>>>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>array([1., 2., MASKED, 7.], masked=True)
>
> As the initialization is not ambiguous, this can be written without
> ``masked=True``::
>
>>>> np.array([1.0, 2.0, np.MASKED, 7.0])
>array([1., 2., MASKED, 7.], masked=True)
>
> **
> Ufuncs
> **
>
> By default, NA values propagate::
>
>>>> na_arr = np.array([1.0, 2.0, np.NA, 7.0])
>>>> np.sum(na_arr)
>NA('float64')
>
> unless the ``skipna`` flag is set::
>
>>>> np.sum(na_arr, skipna=True)
>10.0
>
> By default, masking does not propagate::
>
>>>> masked_arr = np.array([1.0, 2.0, np.MASKED, 7.0])
>>>> np.sum(masked_arr)
>10.0
>
> unless the ``propmsk`` flag is set::
>
>>>> np.sum(masked_arr, propmsk=True)
>MASKED
>
> An array can be masked, and contain NA values::
>
>>>> both_arr = np.array([1.0, 2.0, np.MASKED, np.NA, 7.0])
>
> In the default case, the behavior is obvious::
>
>>>> np.sum(both_arr)
>NA('float64')
>
> It's also obvious what to do with ``skipna=True``::
>
>>>> np.sum(both_arr, skipna=True)
>10.0
>>>> np.sum(both_arr, skipna=True, propmsk=True)
>MASKED
>
> To break the tie between NA and MSK, NAs propagate harder::
>
>>>> np.sum(both_arr, propmsk=True)
>NA('float64')
>
> **
> Assignment
> **
>
> is obvious in the NA case::
>
>>>> arr = np.array([1.0, 2.0, 7.0])
>>>> arr[2] = np.NA
>TypeError('dtype does not support NA')
>>>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>>>> na_arr[2] = np.NA
>>>> na_arr
>array([1., 2., NA], dtype='NA[<f8]')
>
> Direct assignment in the masked case is magic and confusing, and so happens
> only
> via the mask::
>
>>>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>>> masked_arr[2] = np.NA
>TypeError('dtype does not support NA')
>>>> masked_arr[2] = np.MASKED
>TypeError('float() argument must be a string or a number')
>>>> masked_arr.visible[2] = False
>>>> masked_arr
>array([1., 2., MASKED], masked=True)
>
> See y'all,
>
>
I honestly don't see the problem here. The difference isn't between
masked_values/missing_values, it is between masked arrays and masked views
of unmasked arrays. I think the view concept is central to what is going on.
It may not be what folks are used to, but it strikes me as a clarifying
advance rather than a mixed up confusion. Admittedly, it depends on the
numpy centric ability to have views, but views are a wonderful thing.
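
The "masked view of an unmasked array" idea can be sketched with the
existing numpy.ma API. This is only an illustration of the view concept,
not the design under discussion: the mask lives on the view, while the data
is shared with the original array.

```python
import numpy as np
import numpy.ma as ma

arr = np.array([1.0, 2.0, 7.0])

# A masked view: same data buffer, with a mask attached to the view only.
view = arr.view(ma.MaskedArray)
view[2] = ma.masked
shares = np.shares_memory(arr, view.data)

# Reductions on the view skip the masked element.
total = view.sum()

# Writing a valid value through the view changes the original array,
# because the data is shared.
view[0] = 10.0

print(arr, view, total)
```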

Chuck


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Pierre GM

On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
> ###
> A alternative-NEP on masking and missing values
> ###

I like the idea of two different special values, np.NA for missing values, 
np.IGNORE for masked values. np.NA values in an array define what was 
implemented in numpy.ma as a 'hard mask' (where you can't unmask data), while 
np.IGNOREs correspond to the .mask in numpy.ma. Looks fairly non ambiguous that 
way.


> **
> Initialization
> **
> 
> First, missing values can be set and be displayed as ``np.NA, NA``::
>
>     >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>
> As the initialization is not ambiguous, this can be written without the NA
> dtype::
>
>     >>> np.array([1.0, 2.0, np.NA, 7.0])
>     array([1., 2., NA, 7.], dtype='NA[<f8]')
>
> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>
>     >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>     array([1., 2., MASKED, 7.], masked=True)
>
> As the initialization is not ambiguous, this can be written without
> ``masked=True``::
>
>     >>> np.array([1.0, 2.0, np.MASKED, 7.0])
>     array([1., 2., MASKED, 7.], masked=True)

I'm not happy with this 'masked' parameter, at all. What's the point? Either 
you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing something 
here.


> **
> Ufuncs
> **

All fine.
> 
> **
> Assignment
> **
> 
> is obvious in the NA case::
>
>     >>> arr = np.array([1.0, 2.0, 7.0])
>     >>> arr[2] = np.NA
>     TypeError('dtype does not support NA')
>     >>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>     >>> na_arr[2] = np.NA
>     >>> na_arr
>     array([1., 2., NA], dtype='NA[<f8]')
>
> Direct assignment in the masked case is magic and confusing, and so happens
> only via the mask::
>
>     >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>     >>> masked_arr[2] = np.NA
>     TypeError('dtype does not support NA')
>     >>> masked_arr[2] = np.MASKED
>     TypeError('float() argument must be a string or a number')
>     >>> masked_arr.visible[2] = False
>     >>> masked_arr
>     array([1., 2., MASKED], masked=True)

What about the reverse case ? When you assign a regular value to a 
np.NA/np.IGNORE item ?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-06-30 Thread Matthew Brett
Hi,

On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith  wrote:
> Anyway, it's pretty clear that in this particular case, there are two
> distinct features that different people want: the missing data
> feature, and the masked array feature. The more I think about it, the
> less I see how they can be combined into one dessert topping + floor
> wax solution. Here are three particular points where they seem to
> contradict each other:
...
[some proposals]

In the interest of making the discussion as concrete as possible, here
is my draft of an alternative proposal for NAs and masking, based on
Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
right, that the ideas become much clearer when the NA idea and the
MASK idea are separate.   Please do pitch in for things I may have
missed or misunderstood:

################################################
An alternative-NEP on masking and missing values
################################################

The principle of this aNEP is to separate the APIs for masking and for missing
values, according to

* The current implementation of masked arrays
* Nathaniel Smith's proposal.

This discussion is only of the API, and not of the implementation.

**************
Initialization
**************

First, missing values can be set and be displayed as ``np.NA, NA``::

>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
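array([1., 2., NA, 7.], dtype='NA[<f8]')

As the initialization is not ambiguous, this can be written without the NA
dtype::

>>> np.array([1.0, 2.0, np.NA, 7.0])
array([1., 2., NA, 7.], dtype='NA[<f8]')

Masked values can be set and be displayed as ``np.MASKED, MASKED``::

>>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
array([1., 2., MASKED, 7.], masked=True)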

As the initialization is not ambiguous, this can be written without
``masked=True``::

>>> np.array([1.0, 2.0, np.MASKED, 7.0])
array([1., 2., MASKED, 7.], masked=True)

******
Ufuncs
******

By default, NA values propagate::

>>> na_arr = np.array([1.0, 2.0, np.NA, 7.0])
>>> np.sum(na_arr)
NA('float64')

unless the ``skipna`` flag is set::

>>> np.sum(na_arr, skipna=True)
10.0

By default, masking does not propagate::

>>> masked_arr = np.array([1.0, 2.0, np.MASKED, 7.0])
>>> np.sum(masked_arr)
10.0

unless the ``propmsk`` flag is set::

>>> np.sum(masked_arr, propmsk=True)
MASKED
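
As a rough point of comparison (an analogy, not the proposed API), released NumPy already shows these two defaults. In the sketch below NaN stands in for NA and a numpy.ma mask stands in for MASKED; both substitutions are assumptions made for illustration only:

```python
import numpy as np

# NaN as a stand-in for NA: it propagates through np.sum by default
# and is skipped only when np.nansum is used explicitly.
nan_arr = np.array([1.0, 2.0, np.nan, 7.0])
propagated = np.sum(nan_arr)
skipped = np.nansum(nan_arr)

# numpy.ma as a stand-in for MASKED: masked items are skipped by default.
ma_arr = np.ma.array([1.0, 2.0, 0.0, 7.0], mask=[False, False, True, False])
masked_sum = ma_arr.sum()

print(propagated, skipped, masked_sum)
```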

An array can be masked, and contain NA values::

>>> both_arr = np.array([1.0, 2.0, np.MASKED, np.NA, 7.0])

In the default case, the behavior is obvious::

>>> np.sum(both_arr)
NA('float64')

It's also obvious what to do with ``skipna=True``::

>>> np.sum(both_arr, skipna=True)
10.0
>>> np.sum(both_arr, skipna=True, propmsk=True)
MASKED

To break the tie between NA and MASKED, NAs propagate harder::

>>> np.sum(both_arr, propmsk=True)
NA('float64')
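
The reduction rules above can be summarised in a small pure-Python toy model. This is only a sketch of the decision order implied by the examples; ``NA``, ``MASKED`` and ``toy_sum`` here are hypothetical stand-ins defined in the snippet, not NumPy objects:

```python
class _Sentinel:
    """Named placeholder object standing in for np.NA / np.MASKED."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


NA = _Sentinel("NA")          # hypothetical stand-in for np.NA
MASKED = _Sentinel("MASKED")  # hypothetical stand-in for np.MASKED


def toy_sum(values, skipna=False, propmsk=False):
    """Sum a list following the reduction rules sketched in the aNEP."""
    has_na = any(v is NA for v in values)
    has_masked = any(v is MASKED for v in values)
    # NA is checked first: it propagates unless explicitly skipped.
    if has_na and not skipna:
        return NA
    # Masking propagates only on request.
    if has_masked and propmsk:
        return MASKED
    # Otherwise sum the ordinary values only.
    return sum(v for v in values if v is not NA and v is not MASKED)


both = [1.0, 2.0, MASKED, NA, 7.0]
print(toy_sum(both))                             # NA
print(toy_sum(both, skipna=True))                # 10.0
print(toy_sum(both, skipna=True, propmsk=True))  # MASKED
print(toy_sum(both, propmsk=True))               # NA
```

Checking for NA before MASKED is what makes NA win the tie when only ``propmsk`` is set.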

**********
Assignment
**********

is obvious in the NA case::

>>> arr = np.array([1.0, 2.0, 7.0])
>>> arr[2] = np.NA
TypeError('dtype does not support NA')
>>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>>> na_arr[2] = np.NA
>>> na_arr
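array([1., 2., NA], dtype='NA[<f8]')

Direct assignment in the masked case is magic and confusing, and so happens
only via the mask::

>>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)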
>>> masked_arr[2] = np.NA
TypeError('dtype does not support NA')
>>> masked_arr[2] = np.MASKED
TypeError('float() argument must be a string or a number')
>>> masked_arr.visible[2] = False
>>> masked_arr
array([1., 2., MASKED], masked=True)

See y'all,

Matthew