Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 8:50 AM, Daπid  wrote:
>
> On 2 April 2014 16:06, Sturla Molden  wrote:
>>
>>  wrote:
>>
>> > pandas came later and thought ddof=1 is worth more than consistency.
>>
>> Pandas is a data analysis package. NumPy is a numerical array package.
>>
>> I think ddof=1 is justified for Pandas, for consistency with statistical
>> software (SPSS et al.)
>>
>> For NumPy, there are many computational tasks where the Bessel correction
>> is not wanted, so providing an uncorrected result is the correct thing to
>> do. NumPy should be a low-level array library that does very little magic.
>
>
> All this discussion reminds me of the book "Numerical Recipes":
>
> "if the difference between N and N − 1 ever matters to you, then you
> are probably up to no good anyway — e.g., trying to substantiate a
> questionable
> hypothesis with marginal data."
>
> For any reasonably sized data set, it is a correction in the second
> significant figure.

I fully agree, but sometimes you don't have much choice.

`big data` == `statistics with negative degrees of freedom` ?

or maybe

`machine learning` == `statistics with negative degrees of freedom` ?

Josef

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-04 Thread Daπid
On 2 April 2014 16:06, Sturla Molden  wrote:

>  wrote:
>
> > pandas came later and thought ddof=1 is worth more than consistency.
>
> Pandas is a data analysis package. NumPy is a numerical array package.
>
> I think ddof=1 is justified for Pandas, for consistency with statistical
> software (SPSS et al.)
>
> For NumPy, there are many computational tasks where the Bessel correction
> is not wanted, so providing an uncorrected result is the correct thing to
> do. NumPy should be a low-level array library that does very little magic.


All this discussion reminds me of the book "Numerical Recipes":

"if the difference between N and N - 1 ever matters to you, then you
are probably up to no good anyway -- e.g., trying to substantiate a
questionable
hypothesis with marginal data."

For any reasonably sized data set, it is a correction in the second
significant figure.
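[Editor's note: a quick NumPy sketch of how small the correction is; the sample sizes below are arbitrary illustrations.]

```python
import numpy as np

# Relative change in the standard deviation from the Bessel correction:
# sqrt(N / (N - 1)) - 1, which shrinks roughly like 1 / (2 N).
for n in [10, 100, 1000]:
    rel = np.sqrt(n / (n - 1)) - 1
    print(f"N = {n:5d}: relative correction = {rel:.2%}")
```

At N = 100 the correction is already only about half a percent, i.e. in the second significant figure.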


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-03 Thread josef . pktd
On Thu, Apr 3, 2014 at 2:21 PM, Bago  wrote:
>
>
>
>>
>> Sturla
>>
>> P.S. Personally I am not convinced "unbiased" is ever a valid argument, as
>> the biased estimator has smaller error. This is from experience in
>> marksmanship: I'd rather shoot a tight series with small systematic error
>> than scatter my bullets wildly but "unbiased" on the target. It is the
>> total error that counts. The series with smallest total error gets the
>> best
>> score. It is better to shoot two series and calibrate the sight in between
>> than use a calibration-free sight that doesn't allow us to aim. That's why I
>> think classical statistics got this one wrong. Unbiased is never a virtue,
>> but the smallest error is. Thus, if we are to repeat an experiment, we
>> should calibrate our estimator just like a marksman calibrates his sight.
>> But the aim should always be calibrated to give the smallest error, not an
>> unbiased scatter. No one in their right mind would claim a shotgun is more
>> precise than a rifle because it has smaller bias. But that is what
>> applying
>> the Bessel correction implies.
>>
>
> I agree with the point, and what makes it even worse is that ddof=1 does not
> even produce an unbiased standard deviation estimate. It produces an unbiased
> variance estimate but the sqrt of this variance estimate is a biased
> standard deviation estimate,
> http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation.

But ddof=1 still produces a smaller bias than ddof=0.

I think the main point in stats is that without the ddof correction the
variance estimate will be too small, and t-tests or similar will be
liberal in small samples, or confidence intervals will be too short.
(for statisticians that prefer to have tests that maintain their level
and prefer to err on the "conservative" side.)

Josef


>
> Bago
>


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-03 Thread Bago
> Sturla
>
> P.S. Personally I am not convinced "unbiased" is ever a valid argument, as
> the biased estimator has smaller error. This is from experience in
> marksmanship: I'd rather shoot a tight series with small systematic error
> than scatter my bullets wildly but "unbiased" on the target. It is the
> total error that counts. The series with smallest total error gets the best
> score. It is better to shoot two series and calibrate the sight in between
> than use a calibration-free sight that doesn't allow us to aim. That's why I
> think classical statistics got this one wrong. Unbiased is never a virtue,
> but the smallest error is. Thus, if we are to repeat an experiment, we
> should calibrate our estimator just like a marksman calibrates his sight.
> But the aim should always be calibrated to give the smallest error, not an
> unbiased scatter. No one in their right mind would claim a shotgun is more
> precise than a rifle because it has smaller bias. But that is what applying
> the Bessel correction implies.
>
>
I agree with the point, and what makes it even worse is that ddof=1 does
not even produce an unbiased standard deviation estimate. It produces an
unbiased variance estimate but the sqrt of this variance estimate is a
biased standard deviation estimate,
http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation.
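[Editor's note: a quick simulation illustrating this point; the sample size, seed, and number of replications are arbitrary.]

```python
import numpy as np

# 200k samples of size n=5 from N(0, 1), so the true std is exactly 1.0.
rng = np.random.default_rng(0)
n = 5
draws = rng.normal(0.0, 1.0, size=(200_000, n))

# ddof=1 is unbiased for the *variance* ...
mean_var = draws.var(axis=1, ddof=1).mean()

# ... but the sqrt of that estimate is biased low as a *std* estimate
# (Jensen's inequality; for n=5 the expected value is c4 ~ 0.94, not 1).
mean_std = draws.std(axis=1, ddof=1).mean()

print(mean_var)  # close to 1.00
print(mean_std)  # close to 0.94, systematically below 1.0
```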

Bago


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-03 Thread josef . pktd
On Wed, Apr 2, 2014 at 10:06 AM, Sturla Molden  wrote:
>  wrote:
>
>> pandas came later and thought ddof=1 is worth more than consistency.
>
> Pandas is a data analysis package. NumPy is a numerical array package.
>
> I think ddof=1 is justified for Pandas, for consistency with statistical
> software (SPSS et al.)
>
> For NumPy, there are many computational tasks where the Bessel correction
> is not wanted, so providing an uncorrected result is the correct thing to
> do. NumPy should be a low-level array library that does very little magic.
>
> Those who need the Bessel correction can multiply with sqrt(n/float(n-1))
> or specify ddof. But that belongs in the docs.
>
>
> Sturla
>
> P.S. Personally I am not convinced "unbiased" is ever a valid argument, as
> the biased estimator has smaller error. This is from experience in
> marksmanship: I'd rather shoot a tight series with small systematic error
> than scatter my bullets wildly but "unbiased" on the target. It is the
> total error that counts. The series with smallest total error gets the best
> score. It is better to shoot two series and calibrate the sight in between
> than use a calibration-free sight that doesn't allow us to aim.

calibration == bias correction ?

That's why I
> think classical statistics got this one wrong. Unbiased is never a virtue,
> but the smallest error is. Thus, if we are to repeat an experiment, we
> should calibrate our estimator just like a marksman calibrates his sight.
> But the aim should always be calibrated to give the smallest error, not an
> unbiased scatter. No one in their right mind would claim a shotgun is more
> precise than a rifle because it has smaller bias. But that is what applying
> the Bessel correction implies.

https://www.youtube.com/watch?v=i4xcEZZDW_I


I spent several days trying to figure out what Stata is doing for
small sample corrections to reduce the bias of the rejection interval
with "uncorrected" variance estimates.

Josef



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-03 Thread Sturla Molden
 wrote:

> pandas came later and thought ddof=1 is worth more than consistency.

Pandas is a data analysis package. NumPy is a numerical array package.

I think ddof=1 is justified for Pandas, for consistency with statistical
software (SPSS et al.)

For NumPy, there are many computational tasks where the Bessel correction
is not wanted, so providing an uncorrected result is the correct thing to
do. NumPy should be a low-level array library that does very little magic.

Those who need the Bessel correction can multiply with sqrt(n/float(n-1))
or specify ddof. But that belongs in the docs.
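[Editor's note: the equivalence mentioned above, sketched with an arbitrary example array.]

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

s0 = x.std()         # NumPy's default, ddof=0 (uncorrected / ML estimate)
s1 = x.std(ddof=1)   # Bessel-corrected sample standard deviation

# Scaling the uncorrected result by sqrt(n / (n - 1)) gives the corrected one:
assert np.isclose(s0 * np.sqrt(n / (n - 1)), s1)
print(s0, s1)  # 2.0 2.138...
```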


Sturla

P.S. Personally I am not convinced "unbiased" is ever a valid argument, as
the biased estimator has smaller error. This is from experience in
marksmanship: I'd rather shoot a tight series with small systematic error
than scatter my bullets wildly but "unbiased" on the target. It is the
total error that counts. The series with smallest total error gets the best
score. It is better to shoot two series and calibrate the sight in between
than use a calibration-free sight that doesn't allow us to aim. That's why I
think classical statistics got this one wrong. Unbiased is never a virtue,
but the smallest error is. Thus, if we are to repeat an experiment, we
should calibrate our estimator just like a marksman calibrates his sight.
But the aim should always be calibrated to give the smallest error, not an
unbiased scatter. No one in their right mind would claim a shotgun is more
precise than a rifle because it has smaller bias. But that is what applying
the Bessel correction implies.



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-03 Thread Sturla Molden
alex  wrote:

> I don't have any opinion about this debate, but I love the
> justification in that thread "Any surprise that is created by the
> different default should be mitigated by the fact that it's an
> opportunity to learn something about what you are doing."  

That is so true. 

Sturla



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread josef . pktd
On Tue, Apr 1, 2014 at 5:11 PM, Nathaniel Smith  wrote:
> On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers  wrote:
>>
>>
>>
>> On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith  wrote:
>>>
>>> On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden 
>>> wrote:
>>> > Haslwanter Thomas  wrote:
>>> >
>>> >> Personally I cannot think of many applications where it would be
>>> >> desired
>>> >> to calculate the standard deviation with ddof=0. In addition, I feel
>>> >> that
>>> >> there should be consistency between standard modules such as numpy,
>>> >> scipy, and pandas.
>>> >
>>> > ddof=0 is the maximum likelihood estimate. It is also needed in
>>> > Bayesian
>>> > estimation.
>>>
>>> It's true, but the counter-arguments are also strong. And regardless
>>> of whether ddof=1 or ddof=0 is better, surely the same one is better
>>> for both numpy and scipy.
>>
>> If we could still choose here without any costs, obviously that's true. This
>> particular ship sailed a long time ago though. By the way, there isn't even
>> a `scipy.stats.std`, so we're comparing with differently named functions
>> (nanstd for example).
>
> Presumably nanstd is a lot less heavily used than std, and presumably
> people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
> think of changing nanstd to ddof=0 to match numpy? (With appropriate
> FutureWarning transition, etc.)

numpy is numpy, a numerical library
scipy.stats is stats and behaves differently.  (axis=0)

nanstd in scipy.stats will hopefully also go away soon, so I don't
think it's worth changing there either.

pandas came later and thought ddof=1 is worth more than consistency.

I don't think ddof defaults are worth jumping through deprecation hoops.

(bias in cov, corrcoef is "non-standard" ddof)

Josef


>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread alex
On Tue, Apr 1, 2014 at 4:54 PM, Charles R Harris
 wrote:
>
>
>
> On Tue, Apr 1, 2014 at 2:08 PM, Nathaniel Smith  wrote:
>>
>> On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden 
>> wrote:
>> > Haslwanter Thomas  wrote:
>> >
>> >> Personally I cannot think of many applications where it would be
>> >> desired
>> >> to calculate the standard deviation with ddof=0. In addition, I feel
>> >> that
>> >> there should be consistency between standard modules such as numpy,
>> >> scipy, and pandas.
>> >
>> > ddof=0 is the maximum likelihood estimate. It is also needed in
>> > Bayesian
>> > estimation.
>>
>> It's true, but the counter-arguments are also strong. And regardless
>> of whether ddof=1 or ddof=0 is better, surely the same one is better
>> for both numpy and scipy.
>>
>> > If you are not estimating from a sample, but rather calculating for the
>> > whole population, you always want ddof=0.
>> >
>> > What does Matlab do by default? (Yes, it is a rhetorical question.)
>>
>> R (which is probably a more relevant comparison) does do ddof=1 by
>> default.
>>
>> >> I am wondering if there is a good reason to stick to "ddof=0" as the
>> >> default for "std", or if others would agree with my suggestion to
>> >> change
>> >> the default to "ddof=1"?
>> >
>> > It is a bad idea to suddenly break everyone's code.
>>
>> It would be a disruptive transition, but OTOH having inconsistencies
>> like this guarantees the ongoing creation of new broken code.
>>
>
> This topic comes up regularly. The original choice was made for numpy 1.0b1
> by Travis, see this later thread. At this point it is probably best to leave
> it alone.

I don't have any opinion about this debate, but I love the
justification in that thread "Any surprise that is created by the
different default should be mitigated by the fact that it's an
opportunity to learn something about what you are doing."  This
masterpiece of rhetoric will surely help me win many internet
arguments in the future!


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers  wrote:
>
>
>
> On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith  wrote:
>>
>> On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden 
>> wrote:
>> > Haslwanter Thomas  wrote:
>> >
>> >> Personally I cannot think of many applications where it would be
>> >> desired
>> >> to calculate the standard deviation with ddof=0. In addition, I feel
>> >> that
>> >> there should be consistency between standard modules such as numpy,
>> >> scipy, and pandas.
>> >
>> > ddof=0 is the maximum likelihood estimate. It is also needed in
>> > Bayesian
>> > estimation.
>>
>> It's true, but the counter-arguments are also strong. And regardless
>> of whether ddof=1 or ddof=0 is better, surely the same one is better
>> for both numpy and scipy.
>
> If we could still choose here without any costs, obviously that's true. This
> particular ship sailed a long time ago though. By the way, there isn't even
> a `scipy.stats.std`, so we're comparing with differently named functions
> (nanstd for example).

Presumably nanstd is a lot less heavily used than std, and presumably
people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
think of changing nanstd to ddof=0 to match numpy? (With appropriate
FutureWarning transition, etc.)

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread Charles R Harris
On Tue, Apr 1, 2014 at 2:08 PM, Nathaniel Smith  wrote:

> On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden 
> wrote:
> > Haslwanter Thomas  wrote:
> >
> >> Personally I cannot think of many applications where it would be desired
> >> to calculate the standard deviation with ddof=0. In addition, I feel
> that
> >> there should be consistency between standard modules such as numpy,
> scipy, and pandas.
> >
> > ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
> > estimation.
>
> It's true, but the counter-arguments are also strong. And regardless
> of whether ddof=1 or ddof=0 is better, surely the same one is better
> for both numpy and scipy.
>
> > If you are not estimating from a sample, but rather calculating for the
> > whole population, you always want ddof=0.
> >
> > What does Matlab do by default? (Yes, it is a rhetorical question.)
>
> R (which is probably a more relevant comparison) does do ddof=1 by default.
>
> >> I am wondering if there is a good reason to stick to "ddof=0" as the
> >> default for "std", or if others would agree with my suggestion to change
> >> the default to "ddof=1"?
> >
> > It is a bad idea to suddenly break everyone's code.
>
> It would be a disruptive transition, but OTOH having inconsistencies
> like this guarantees the ongoing creation of new broken code.
>
>
This topic comes up regularly. The original choice was made for numpy 1.0b1
by Travis, see this later thread. At this point it is probably best to leave
it alone.

Chuck


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread Ralf Gommers
On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith  wrote:

> On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden 
> wrote:
> > Haslwanter Thomas  wrote:
> >
> >> Personally I cannot think of many applications where it would be desired
> >> to calculate the standard deviation with ddof=0. In addition, I feel
> that
> >> there should be consistency between standard modules such as numpy,
> scipy, and pandas.
> >
> > ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
> > estimation.
>
> It's true, but the counter-arguments are also strong. And regardless
> of whether ddof=1 or ddof=0 is better, surely the same one is better
> for both numpy and scipy.
>

If we could still choose here without any costs, obviously that's true.
This particular ship sailed a long time ago though. By the way, there isn't
even a `scipy.stats.std`, so we're comparing with differently named
functions (nanstd for example).


> > If you are not estimating from a sample, but rather calculating for the
> > whole population, you always want ddof=0.
> >
> > What does Matlab do by default? (Yes, it is a rhetorical question.)
>
> R (which is probably a more relevant comparison) does do ddof=1 by default.
>
> >> I am wondering if there is a good reason to stick to "ddof=0" as the
> >> default for "std", or if others would agree with my suggestion to change
> >> the default to "ddof=1"?
> >
> > It is a bad idea to suddenly break everyone's code.
>
> It would be a disruptive transition, but OTOH having inconsistencies
> like this guarantees the ongoing creation of new broken code.
>

Not much of an argument to change return values for such a heavily used
function.

Ralf



> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden  wrote:
> Haslwanter Thomas  wrote:
>
>> Personally I cannot think of many applications where it would be desired
>> to calculate the standard deviation with ddof=0. In addition, I feel that
>> there should be consistency between standard modules such as numpy, scipy, 
>> and pandas.
>
> ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
> estimation.

It's true, but the counter-arguments are also strong. And regardless
of whether ddof=1 or ddof=0 is better, surely the same one is better
for both numpy and scipy.

> If you are not estimating from a sample, but rather calculating for the
> whole population, you always want ddof=0.
>
> What does Matlab do by default? (Yes, it is a rhetorical question.)

R (which is probably a more relevant comparison) does do ddof=1 by default.

>> I am wondering if there is a good reason to stick to "ddof=0" as the
>> default for "std", or if others would agree with my suggestion to change
>> the default to "ddof=1"?
>
> It is a bad idea to suddenly break everyone's code.

It would be a disruptive transition, but OTOH having inconsistencies
like this guarantees the ongoing creation of new broken code.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread Eelco Hoogendoorn
I agree; breaking code over this would be ridiculous. Also, I prefer the
zero default, despite the mean/std combo probably being more common.


On Tue, Apr 1, 2014 at 10:02 PM, Sturla Molden wrote:

> Haslwanter Thomas  wrote:
>
> > Personally I cannot think of many applications where it would be desired
> > to calculate the standard deviation with ddof=0. In addition, I feel that
> > there should be consistency between standard modules such as numpy,
> scipy, and pandas.
>
> ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
> estimation.
>
> If you are not estimating from a sample, but rather calculating for the
> whole population, you always want ddof=0.
>
> What does Matlab do by default? (Yes, it is a rhetorical question.)
>
>
> > I am wondering if there is a good reason to stick to "ddof=0" as the
> > default for "std", or if others would agree with my suggestion to change
> > the default to "ddof=1"?
>
> It is a bad idea to suddenly break everyone's code.
>
>
> Sturla
>


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread Sturla Molden
Haslwanter Thomas  wrote:

> Personally I cannot think of many applications where it would be desired
> to calculate the standard deviation with ddof=0. In addition, I feel that
> there should be consistency between standard modules such as numpy, scipy, 
> and pandas.

ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
estimation.

If you are not estimating from a sample, but rather calculating for the
whole population, you always want ddof=0.

What does Matlab do by default? (Yes, it is a rhetorical question.)


> I am wondering if there is a good reason to stick to "ddof=0" as the
> default for "std", or if others would agree with my suggestion to change
> the default to "ddof=1"?

It is a bad idea to suddenly break everyone's code. 


Sturla



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread Benjamin Root
Because np.mean() is effectively ddof=0? (I mean effectively, not that it
actually has a parameter for that.) There is consistency within the
library, and I certainly wouldn't want to have NaN all of a sudden coming
from my calls to std() applied to an arbitrary non-empty array of values
that happens to have only one value. So, if we can't change the default
for mean, then it only makes sense to keep np.std() consistent with
np.mean().
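[Editor's note: the one-element case made concrete; the warning suppression is just to keep the output clean.]

```python
import warnings

import numpy as np

one = np.array([5.0])

print(one.std())  # 0.0 with NumPy's default ddof=0

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the "Degrees of freedom <= 0" warning
    print(one.std(ddof=1))           # nan: the denominator n - ddof is 0
```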

My 2 cents...
Ben Root



On Tue, Apr 1, 2014 at 2:27 PM, Haslwanter Thomas <
thomas.haslwan...@fh-linz.at> wrote:

> While most other Python packages (scipy, pandas) use the default "ddof=1"
> for the calculation of the standard deviation (i.e. they calculate the
> sample standard deviation), the NumPy implementation uses the default
> "ddof=0".
>
> Personally I cannot think of many applications where it would be desired
> to calculate the standard deviation with ddof=0. In addition, I feel that
> there should be consistency between standard modules such as numpy, scipy,
> and pandas.
>
>
>
> I am wondering if there is a good reason to stick to "ddof=0" as the
> default for "std", or if others would agree with my suggestion to change
> the default to "ddof=1"?
>
>
>
> Thomas
>
>
>
> ---
> Prof. (FH) PD Dr. Thomas Haslwanter
> School of Applied Health and Social Sciences
>
> *University of Applied Sciences* *Upper Austria*
> *FH OÖ Studienbetriebs GmbH*
> Garnisonstraße 21
> 4020 Linz/Austria
> Tel.: +43 (0)5 0804 -52170
> Fax: +43 (0)5 0804 -52171
> E-Mail: thomas.haslwan...@fh-linz.at
> Web: me-research.fh-linz.at 
> or work.thaslwanter.at
>
>
>


[Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

2014-04-01 Thread Haslwanter Thomas
While most other Python packages (scipy, pandas) use the default "ddof=1" for
the calculation of the standard deviation (i.e. they calculate the sample
standard deviation), the NumPy implementation uses the default "ddof=0".
Personally I cannot think of many applications where it would be desired to 
calculate the standard deviation with ddof=0. In addition, I feel that there 
should be consistency between standard modules such as numpy, scipy, and pandas.

I am wondering if there is a good reason to stick to "ddof=0" as the default 
for "std", or if others would agree with my suggestion to change the default to 
"ddof=1"?

Thomas

---
Prof. (FH) PD Dr. Thomas Haslwanter
School of Applied Health and Social Sciences
University of Applied Sciences Upper Austria
FH OÖ Studienbetriebs GmbH
Garnisonstraße 21
4020 Linz/Austria
Tel.: +43 (0)5 0804 -52170
Fax: +43 (0)5 0804 -52171
E-Mail: thomas.haslwan...@fh-linz.at
Web: me-research.fh-linz.at
or work.thaslwanter.at
