Re: [Numpy-discussion] Nansum function behavior

2015-10-23 Thread Juan Nunez-Iglesias
Hi Charles,


Just providing an outsider's perspective...




Your specific use-case doesn't address the general definition of nansum: 
perform a sum while ignoring nans. As others have pointed out, (especially in 
the linked thread) the sum of nothing is 0. Although the current behaviour of 
nansum doesn't quite match your use-case, there is no doubt at all that it 
follows a consistent convention. "Wrong" is certainly not the correct way to 
describe it.




You can easily cater to your use case as follows:




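import numpy as np

def rilhac_nansum(ar, axis=None):
    # nanmean divides by the number of non-nan entries, so multiply by that
    # count (not by ar.shape[axis]) to recover the sum. An all-nan slice
    # gives nan * 0 = nan.
    count = np.sum(~np.isnan(ar), axis=axis)
    return np.nanmean(ar, axis=axis) * count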




nanmean _consistently_ returns nan for all-nan input because the mean of
nothing is nan (the sum of nothing divided by the length of nothing, i.e.
0/0).
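
To make the two conventions concrete, here is a minimal sketch (the variable names are just for illustration). It only shows that an all-nan input gives 0 from nansum and nan from nanmean; nanmean also emits a RuntimeWarning in that case, which the sketch silences.

```python
import warnings

import numpy as np

all_nan = np.array([np.nan, np.nan])

# The empty sum is 0, and nansum drops the nans before summing.
empty_sum = np.sum([])
nan_sum = np.nansum(all_nan)

with warnings.catch_warnings():
    # nanmean warns about an empty slice before returning nan.
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_mean = np.nanmean(all_nan)

print(empty_sum, nan_sum, empty_mean)
```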




Hope this helps...




Juan.

On Sat, Oct 24, 2015 at 12:44 PM, Charles Rilhac 
wrote:

> I saw this thread and I totally disagree with thouis's argument…
> Of course, you can get NaN if there are only NaNs. Thank goodness, there are
> a lot of ways to do that.
> But it’s not convenient or consistent and, above all, it is logically wrong
> to do that. NaN does not mean zero, and an operation on only NaNs cannot
> return a number…
> You lose information about your array. It is easier to fill the result of
> nansum with zeros than to keep a mask of your original array or whatever you
> do.
> Why is it misleading?
> For example, say you want to sum the rows of an array and take the mean of
> the result:
> a = np.array([[2,np.nan,4], [np.nan,np.nan, np.nan]])
> b = np.nansum(a, axis=1) # array([ 6.,  0.])
> m = np.nanmean(b) # 3.0 WRONG because you wanted to get 6
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nansum function behavior

2015-10-23 Thread Charles Rilhac
I saw this thread and I totally disagree with thouis's argument…
Of course, you can get NaN if there are only NaNs. Thank goodness, there are a
lot of ways to do that.
But it’s not convenient or consistent and, above all, it is logically wrong to
do that. NaN does not mean zero, and an operation on only NaNs cannot return a
number…
You lose information about your array. It is easier to fill the result of
nansum with zeros than to keep a mask of your original array or whatever you do.

Why is it misleading?
For example, say you want to sum the rows of an array and take the mean of the
result:

a = np.array([[2,np.nan,4], [np.nan,np.nan, np.nan]])
b = np.nansum(a, axis=1) # array([ 6.,  0.])
m = np.nanmean(b) # 3.0 WRONG because you wanted to get 6
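
For comparison, one possible way to get 6 with the current nansum is to exclude the all-NaN rows explicitly. This is only a sketch: the name `has_data` is made up here, and it assumes the original array `a` is still available when the mean is taken.

```python
import numpy as np

a = np.array([[2, np.nan, 4], [np.nan, np.nan, np.nan]])

# True for rows that contain at least one non-NaN value.
has_data = ~np.isnan(a).all(axis=1)

row_sums = np.nansum(a, axis=1)   # the all-NaN row contributes 0.0
m = row_sums[has_data].mean()     # average only over rows that had data
print(m)
```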

> On 24 Oct 2015, at 09:28, Stephan Hoyer  wrote:
> 
> Hi Charles,
> 
> You should read the previous discussion about this issue on GitHub:
> https://github.com/numpy/numpy/issues/1721
> 
> For what it's worth, I do think the new definition of nansum is more 
> consistent.
> 
> If you want to preserve NaN if there are no non-NaN values, you can often 
> calculate this desired quantity from nanmean, which does return NaN if there 
> are only NaNs.
> 
> Stephan


Re: [Numpy-discussion] Nansum function behavior

2015-10-23 Thread Stephan Hoyer
Hi Charles,

You should read the previous discussion about this issue on GitHub:
https://github.com/numpy/numpy/issues/1721

For what it's worth, I do think the new definition of nansum is more
consistent.

If you want to preserve NaN if there are no non-NaN values, you can often
calculate this desired quantity from nanmean, which does return NaN if
there are only NaNs.

Stephan


Re: [Numpy-discussion] Nansum function behavior

2015-10-23 Thread Charles Rilhac
Why do we keep this behaviour?
np.nansum([np.nan]) # zero

Firstly, you lose information.
You can easily fill nan with zero after applying nansum, but you cannot keep nan
for all-nan rows unless you have a mask or have recorded beforehand which rows
were all nan.
That is neither convenient nor useful.
Secondly, it is illogical. An arithmetic operation, or anything else, between
Nothing and Nothing cannot return Something.
We can accept that Nothing + Object = Object, but we cannot get a number from
nothing. It is counterintuitive. I really disagree with this change, which
happened a few years ago.


> On 24 Oct 2015, at 01:11, Benjamin Root  wrote:
> 
> The change to nansum() happened several years ago. The main thrust of it was 
> to make the following consistent:
> 
> np.sum([])  # zero
> np.nansum([np.nan])  # zero
> np.sum([1])  # one
> np.nansum([np.nan, 1])  # one
> 
> If you want to propagate masks and such, use masked arrays.
> Ben Root


Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-10-23 Thread Chris Barker - NOAA Federal
Grabbing the pandas csv reader would be great, and I hope it happens sooner
rather than later, though alas, I haven't the spare cycles for it either.

In the meantime though, can we put a deprecation warning in when using
fromstring() on text files? It's really pretty broken.
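
As a rough illustration of what such a warning could look like, here is a minimal sketch. The wrapper name `fromstring_compat` and the warning text are made up for this example and are not NumPy API. The text branch uses a plain-Python split instead of fromstring()'s own parser, and the binary branch hands the bytes to np.frombuffer.

```python
import warnings

import numpy as np


def fromstring_compat(data, dtype=float, sep=""):
    """Hypothetical wrapper: warn on text parsing, keep binary reading."""
    if sep != "":
        warnings.warn(
            "reading text with fromstring() is deprecated; "
            "use a dedicated text reader instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Plain-Python parse so the sketch does not rely on fromstring()'s
        # text handling; empty tokens are skipped.
        tokens = [tok.strip() for tok in data.split(sep) if tok.strip()]
        return np.array(tokens, dtype=dtype)
    # Binary mode: interpret the raw bytes directly.
    return np.frombuffer(data, dtype=dtype)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text_result = fromstring_compat("1, 2, 3", dtype=int, sep=",")

binary_result = fromstring_compat(np.arange(3, dtype=np.int32).tobytes(),
                                  dtype=np.int32)
print(text_result, binary_result, len(caught))
```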

-Chris

On Oct 23, 2015, at 4:02 PM, Jeff Reback  wrote:

> I can certainly provide guidance on how/what to extract but don't have
> spare cycles myself for this :(


Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-10-23 Thread Jeff Reback


> On Oct 23, 2015, at 6:49 PM, Nathaniel Smith  wrote:
> 
> On Oct 23, 2015 3:30 PM, "Jeff Reback"  wrote:
> > IIRC Thomas Caswell was interested in doing this :)
> 
> When he was in Berkeley a few weeks ago he assured me that every night since 
> SciPy he has dutifully been feeling guilty about not having done it yet. I 
> think this week his paltry excuse is that he's "on his honeymoon" or 
> something.
> 
> ...which is to say that if someone has some spare cycles to take this over 
> then I think that might be a nice wedding present for him :-).
> 
> (The basic idea is to take the text reading backend behind pandas.read_csv
> and extract it into a standalone package that pandas could depend on, and
> that could also be used by other packages like numpy (among others -- I think
> Dato's SFrame package has a fork of this code as well?))
> 
> -n
> 

I can certainly provide guidance on how/what to extract but don't have spare 
cycles myself for this :(


Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-10-23 Thread Nathaniel Smith
On Oct 23, 2015 3:30 PM, "Jeff Reback"  wrote:
> IIRC Thomas Caswell was interested in doing this :)

When he was in Berkeley a few weeks ago he assured me that every night
since SciPy he has dutifully been feeling guilty about not having done it
yet. I think this week his paltry excuse is that he's "on his honeymoon" or
something.

...which is to say that if someone has some spare cycles to take this over
then I think that might be a nice wedding present for him :-).

(The basic idea is to take the text reading backend behind pandas.read_csv
and extract it into a standalone package that pandas could depend on, and
that could also be used by other packages like numpy (among others -- I
think Dato's SFrame package has a fork of this code as well?))

-n


Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-10-23 Thread Jeff Reback



> On Oct 23, 2015, at 6:13 PM, Charles R Harris  
> wrote:
> There was discussion at SciPy 2015 of separating out the text reading
> abilities of Pandas so that numpy could include it. We should contact Jeff
> Reback and see about moving that forward.
> 
> Chuck 

IIRC Thomas Caswell was interested in doing this :)

Jeff


Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-10-23 Thread Charles R Harris
On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:

>
> I think it would be good to keep the usage to read binary data at least.
>
>
> Agreed -- it's only the text file reading I'm proposing to deprecate. It
> was kind of weird to cram it in there in the first place.
>
> Oh, fromfile() has the same issues.
>
> Chris
>
>
> Or is there a good alternative to `np.fromstring(, dtype=...)`?  --
> Marten
>
> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker 
> wrote:
>
>> There was just a question about a bug/issue with scipy.fromstring (which
>> is numpy.fromstring) when used to read integers from a text file.
>>
>> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html
>>
>> fromstring() is buggy and inflexible for reading text files -- and it
>> is a very, very ugly mess of code. I dug into it a while back, and gave up
>> -- just too much of a mess!
>>
>> So we really should completely re-implement it, or deprecate it. I doubt
>> anyone is going to do a big refactor, so that means deprecating it.
>>
>> Also -- if we do want a fast "read numbers from text files" function (which
>> would be nice, actually), it really should get a new name anyway.
>>
>> (and the hopefully coming new dtype system would make it easier to write
>> cleanly)
>>
>> I'm not sure what deprecating something means, though -- have it raise a
>> deprecation warning in the next version?
>>
>>
There was discussion at SciPy 2015 of separating out the text reading
abilities of Pandas so that numpy could include it. We should contact Jeff
Reback and see about moving that forward.

Chuck


Re: [Numpy-discussion] Nansum function behavior

2015-10-23 Thread Benjamin Root
The change to nansum() happened several years ago. The main thrust of it
was to make the following consistent:

np.sum([])  # zero
np.nansum([np.nan])  # zero
np.sum([1])  # one
np.nansum([np.nan, 1])  # one

If you want to propagate masks and such, use masked arrays.
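
As a sketch of what that could look like for the array from the original post (the variable names are illustrative only): np.ma.masked_invalid masks every NaN, and a row that is fully masked stays masked after the sum, so it can be turned back into NaN with filled().

```python
import numpy as np

a = np.array([[np.nan, np.nan], [1, np.nan]])

masked = np.ma.masked_invalid(a)   # mask every NaN entry
row_sums = masked.sum(axis=1)      # a fully masked row stays masked

# Convert back to a plain array, using NaN for the masked results.
as_nan = row_sums.filled(np.nan)
print(row_sums.mask, as_nan)
```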
Ben Root


On Fri, Oct 23, 2015 at 12:45 PM, Charles Rilhac 
wrote:

> Hello,
>
> I noticed the change to the nan functions, and especially to nansum. I
> think this choice is a big mistake. I know that Matlab and R have made the
> same choice, but it is illogical and counterintuitive.
>
> The first argument is about logic. An arithmetic operation between Nothing
> and Nothing cannot produce a number or an object. Nothing + Object can be an
> object or something else, but nothing other than nothing can come from
> nothing. I hope you see what I mean.
>
> Secondly, it's counterintuitive and inconvenient. If the nan functions
> return NaN for all-NaN rows, filling the result with zeros is easy:
>
> a = np.array([[np.nan, np.nan], [1,np.nan]])
> a = np.nansum(a, axis=1)
> print(a)
> array([np.nan,  1.])
> a[np.isnan(a)] = 0
>
> Whereas if the result is already filled with zeros for all-NaN rows, you
> cannot easily replace those results with NaN. In the case above you cannot,
> because the information about which rows were all NaN has been lost.
>
> I know it is tough to go back to the previous behaviour, but I really think
> it is wrong to unconditionally fill with zeros the result of an arithmetic
> operation containing NaN.
> Thanks for your work, guys ;-)
>


Re: [Numpy-discussion] Nansum function behavior

2015-10-23 Thread Robert Kern
On Fri, Oct 23, 2015 at 5:45 PM, Charles Rilhac 
wrote:
>
> Hello,
>
> I noticed the change to the nan functions, and especially to nansum. I
> think this choice is a big mistake. I know that Matlab and R have made the
> same choice, but it is illogical and counterintuitive.

What change are you referring to?

--
Robert Kern


[Numpy-discussion] Nansum function behavior

2015-10-23 Thread Charles Rilhac
Hello,

I noticed the change to the nan functions, and especially to nansum. I think
this choice is a big mistake. I know that Matlab and R have made the same
choice, but it is illogical and counterintuitive.

The first argument is about logic. An arithmetic operation between Nothing and
Nothing cannot produce a number or an object. Nothing + Object can be an object
or something else, but nothing other than nothing can come from nothing.
I hope you see what I mean.

Secondly, it's counterintuitive and inconvenient. If the nan functions return
NaN for all-NaN rows, filling the result with zeros is easy:

a = np.array([[np.nan, np.nan], [1,np.nan]])
a = np.nansum(a, axis=1)
print(a)
array([np.nan,  1.])
a[np.isnan(a)] = 0

Whereas if the result is already filled with zeros for all-NaN rows, you cannot
easily replace those results with NaN. In the case above you cannot, because
the information about which rows were all NaN has been lost.
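
For reference, the information is only recoverable if it is computed from the input before the sum overwrites it. A minimal sketch under that assumption (the names `all_nan_rows` and `sums` are made up here):

```python
import numpy as np

a = np.array([[np.nan, np.nan], [1, np.nan]])

# Record which rows are all NaN while the original array is still around.
all_nan_rows = np.isnan(a).all(axis=1)

# nansum gives 0.0 for the all-NaN row; put NaN back where the mask says so.
sums = np.where(all_nan_rows, np.nan, np.nansum(a, axis=1))
print(sums)
```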

I know it is tough to go back to the previous behaviour, but I really think
that it is wrong to unconditionally fill with zeros the result of an arithmetic
operation containing NaN.

Thanks for your work, guys ;-)