Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-04 Thread Daπid
On 2 April 2014 16:06, Sturla Molden sturla.mol...@gmail.com wrote:

 josef.p...@gmail.com wrote:

  pandas came later and thought ddof=1 is worth more than consistency.

 Pandas is a data analysis package. NumPy is a numerical array package.

 I think ddof=1 is justified for Pandas, for consistency with statistical
 software (SPSS et al.)

 For NumPy, there are many computational tasks where the Bessel correction
 is not wanted, so providing an uncorrected result is the correct thing to
 do. NumPy should be a low-level array library that does very little magic.


All this discussion reminds me of the book Numerical Recipes:

"if the difference between N and N - 1 ever matters to you, then you
are probably up to no good anyway -- e.g., trying to substantiate a
questionable hypothesis with marginal data."

For any reasonably sized data set, it is a correction in the second
significant figure.
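
For a sense of scale, a quick sketch (plain numpy, made-up data):

import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)             # a "reasonably sized" sample
s0 = x.std(ddof=0)             # uncorrected
s1 = x.std(ddof=1)             # Bessel-corrected
print(s0, s1, (s1 - s0) / s0)  # relative difference: sqrt(100/99) - 1 ~ 0.5%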


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-04 Thread josef . pktd
On Fri, Apr 4, 2014 at 8:50 AM, Daπid davidmen...@gmail.com wrote:

 On 2 April 2014 16:06, Sturla Molden sturla.mol...@gmail.com wrote:

 josef.p...@gmail.com wrote:

  pandas came later and thought ddof=1 is worth more than consistency.

 Pandas is a data analysis package. NumPy is a numerical array package.

 I think ddof=1 is justified for Pandas, for consistency with statistical
 software (SPSS et al.)

 For NumPy, there are many computational tasks where the Bessel correction
 is not wanted, so providing an uncorrected result is the correct thing to
 do. NumPy should be a low-level array library that does very little magic.


 All this discussion reminds me of the book Numerical Recipes:

 "if the difference between N and N − 1 ever matters to you, then you
 are probably up to no good anyway — e.g., trying to substantiate a
 questionable hypothesis with marginal data."

 For any reasonably sized data set, it is a correction in the second
 significant figure.

I fully agree, but sometimes you don't have much choice.

`big data` == `statistics with negative degrees of freedom` ?

or maybe

`machine learning` == `statistics with negative degrees of freedom` ?

Josef




Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-03 Thread Sturla Molden
alex argri...@ncsu.edu wrote:

 I don't have any opinion about this debate, but I love the
 justification in that thread: "Any surprise that is created by the
 different default should be mitigated by the fact that it's an
 opportunity to learn something about what you are doing."

That is so true. 

Sturla



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-03 Thread Sturla Molden
josef.p...@gmail.com wrote:

 pandas came later and thought ddof=1 is worth more than consistency.

Pandas is a data analysis package. NumPy is a numerical array package.

I think ddof=1 is justified for Pandas, for consistency with statistical
software (SPSS et al.)

For NumPy, there are many computational tasks where the Bessel correction
is not wanted, so providing an uncorrected result is the correct thing to
do. NumPy should be a low-level array library that does very little magic.

Those who need the Bessel correction can multiply with sqrt(n/float(n-1))
or specify ddof. But that belongs in the docs.
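
E.g., a minimal sketch of the two equivalent routes (x being any 1-d sample):

import numpy as np

x = np.random.randn(50)
n = len(x)
s = x.std() * np.sqrt(n / float(n - 1))  # manual Bessel correction
assert np.allclose(s, x.std(ddof=1))     # same as specifying ddof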


Sturla

P.S. Personally I am not convinced unbiased is ever a valid argument, as
the biased estimator has smaller error. This is from experience in
marksmanship: I'd rather shoot a tight series with small systematic error
than scatter my bullets wildly but unbiased on the target. It is the
total error that counts. The series with smallest total error gets the best
score. It is better to shoot two series and calibrate the sight in between
than use a calibration-free sight that doesn't allow us to aim. That's why I
think classical statistics got this one wrong. Unbiased is never a virtue,
but the smallest error is. Thus, if we are to repeat an experiment, we
should calibrate our estimator just like a marksman calibrates his sight.
But the aim should always be calibrated to give the smallest error, not an
unbiased scatter. No one in their right mind would claim a shotgun is more
precise than a rifle because it has smaller bias. But that is what applying
the Bessel correction implies.



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-03 Thread josef . pktd
On Wed, Apr 2, 2014 at 10:06 AM, Sturla Molden sturla.mol...@gmail.com wrote:
 josef.p...@gmail.com wrote:

 pandas came later and thought ddof=1 is worth more than consistency.

 Pandas is a data analysis package. NumPy is a numerical array package.

 I think ddof=1 is justified for Pandas, for consistency with statistical
 software (SPSS et al.)

 For NumPy, there are many computational tasks where the Bessel correction
 is not wanted, so providing an uncorrected result is the correct thing to
 do. NumPy should be a low-level array library that does very little magic.

 Those who need the Bessel correction can multiply with sqrt(n/float(n-1))
 or specify ddof. But that belongs in the docs.


 Sturla

 P.S. Personally I am not convinced unbiased is ever a valid argument, as
 the biased estimator has smaller error. This is from experience in
 marksmanship: I'd rather shoot a tight series with small systematic error
 than scatter my bullets wildly but unbiased on the target. It is the
 total error that counts. The series with smallest total error gets the best
 score. It is better to shoot two series and calibrate the sight in between
 than use a calibration-free sight that doesn't allow us to aim.

calibration == bias correction ?

 That's why I
 think classical statistics got this one wrong. Unbiased is never a virtue,
 but the smallest error is. Thus, if we are to repeat an experiment, we
 should calibrate our estimator just like a marksman calibrates his sight.
 But the aim should always be calibrated to give the smallest error, not an
 unbiased scatter. No one in their right mind would claim a shotgun is more
 precise than a rifle because it has smaller bias. But that is what applying
 the Bessel correction implies.

https://www.youtube.com/watch?v=i4xcEZZDW_I


I spent several days trying to figure out what Stata is doing for
small sample corrections to reduce the bias of the rejection interval
with uncorrected variance estimates.

Josef




Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-03 Thread Bago
 Sturla

 P.S. Personally I am not convinced unbiased is ever a valid argument, as
 the biased estimator has smaller error. This is from experience in
 marksmanship: I'd rather shoot a tight series with small systematic error
 than scatter my bullets wildly but unbiased on the target. It is the
 total error that counts. The series with smallest total error gets the best
 score. It is better to shoot two series and calibrate the sight in between
 than use a calibration-free sight that doesn't allow us to aim. That's why I
 think classical statistics got this one wrong. Unbiased is never a virtue,
 but the smallest error is. Thus, if we are to repeat an experiment, we
 should calibrate our estimator just like a marksman calibrates his sight.
 But the aim should always be calibrated to give the smallest error, not an
 unbiased scatter. No one in their right mind would claim a shotgun is more
 precise than a rifle because it has smaller bias. But that is what applying
 the Bessel correction implies.


I agree with the point, and what makes it even worse is that ddof=1 does
not even produce an unbiased standard deviation estimate. It produces an
unbiased variance estimate, but the sqrt of this variance estimate is a
biased standard deviation estimate:
http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
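
A quick Monte Carlo sketch of this, using normal samples of size 5:

import numpy as np

rng = np.random.RandomState(42)
samples = rng.randn(100000, 5)       # many size-5 samples, true std = 1.0
var1 = samples.var(axis=1, ddof=1)
print(var1.mean())                   # ~1.00: the variance estimate is unbiased
print(np.sqrt(var1).mean())          # ~0.94: its square root underestimates the std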

Bago


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-03 Thread josef . pktd
On Thu, Apr 3, 2014 at 2:21 PM, Bago mrb...@gmail.com wrote:




 Sturla

 P.S. Personally I am not convinced unbiased is ever a valid argument, as
 the biased estimator has smaller error. This is from experience in
 marksmanship: I'd rather shoot a tight series with small systematic error
 than scatter my bullets wildly but unbiased on the target. It is the
 total error that counts. The series with smallest total error gets the
 best
 score. It is better to shoot two series and calibrate the sight in between
 than use a calibration-free sight that doesn't allow us to aim. That's why I
 think classical statistics got this one wrong. Unbiased is never a virtue,
 but the smallest error is. Thus, if we are to repeat an experiment, we
 should calibrate our estimator just like a marksman calibrates his sight.
 But the aim should always be calibrated to give the smallest error, not an
 unbiased scatter. No one in their right mind would claim a shotgun is more
 precise than a rifle because it has smaller bias. But that is what
 applying
 the Bessel correction implies.


 I agree with the point, and what makes it even worse is that ddof=1 does not
 even produce an unbiased standard deviation estimate. It produces an unbiased
 variance estimate, but the sqrt of this variance estimate is a biased
 standard deviation estimate:
 http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

But ddof=1 still produces a smaller bias than ddof=0.

I think the main point in stats is that without ddof the variance
will be too small, so t-tests and similar will be liberal in small
samples, or confidence intervals will be too short.
(For statisticians who prefer tests that maintain their nominal level
and err on the conservative side.)
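
A rough simulation sketch of that effect (nominal 5% level, n = 5; scipy is
assumed for the t critical value):

import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
n = 5
crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 5% critical value
samples = rng.randn(100000, n)       # true mean is 0
m = samples.mean(axis=1)
for ddof in (0, 1):
    se = samples.std(axis=1, ddof=ddof) / np.sqrt(n)
    print(ddof, (np.abs(m / se) > crit).mean())
# ddof=0 rejects in roughly 7% of samples; ddof=1 stays near the nominal 5%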

Josef



 Bago



[Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Haslwanter Thomas
While most other Python packages (scipy, pandas) use the default ddof=1 for
the calculation of the standard deviation (i.e. they calculate the sample
standard deviation), the NumPy implementation uses the default ddof=0.
Personally I cannot think of many applications where it would be desired to 
calculate the standard deviation with ddof=0. In addition, I feel that there 
should be consistency between standard modules such as numpy, scipy, and pandas.

I am wondering if there is a good reason to stick to ddof=0 as the default 
for std, or if others would agree with my suggestion to change the default to 
ddof=1?
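
For reference, a quick sketch of the current defaults (assuming numpy and
pandas are installed):

import numpy as np
import pandas as pd

x = [1.0, 2.0, 3.0, 4.0]
print(np.std(x))           # 1.118... (ddof=0)
print(np.std(x, ddof=1))   # 1.290... (sample standard deviation)
print(pd.Series(x).std())  # 1.290... (pandas defaults to ddof=1)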

Thomas

---
Prof. (FH) PD Dr. Thomas Haslwanter
School of Applied Health and Social Sciences
University of Applied Sciences Upper Austria
FH OÖ Studienbetriebs GmbH
Garnisonstraße 21
4020 Linz/Austria
Tel.: +43 (0)5 0804 -52170
Fax: +43 (0)5 0804 -52171
E-Mail: thomas.haslwan...@fh-linz.at
Web: me-research.fh-linz.at or work.thaslwanter.at



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Benjamin Root
Because np.mean() is ddof=0? (I mean effectively, not that it actually has
a parameter for that.) There is consistency within the library, and I
certainly wouldn't want to have NaN all of a sudden coming from my calls
to std() applied to an arbitrary non-empty array of values that
happened to have only one value. So, if we can't change the default for
mean, then it only makes sense to keep np.std() consistent with np.mean().
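
For example, under a ddof=1 default this small sketch would start returning nan:

import numpy as np

x = np.array([5.0])        # a valid one-element array
print(np.std(x))           # 0.0 under the current ddof=0 default
print(np.std(x, ddof=1))   # nan (0/0 division), with a RuntimeWarning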

My 2 cents...
Ben Root



On Tue, Apr 1, 2014 at 2:27 PM, Haslwanter Thomas 
thomas.haslwan...@fh-linz.at wrote:

 While most other Python packages (scipy, pandas) use the default ddof=1
 for the calculation of the standard deviation (i.e. they calculate the
 sample standard deviation), the NumPy implementation uses the
 default ddof=0.

 Personally I cannot think of many applications where it would be desired
 to calculate the standard deviation with ddof=0. In addition, I feel that
 there should be consistency between standard modules such as numpy, scipy,
 and pandas.



 I am wondering if there is a good reason to stick to ddof=0 as the
 default for std, or if others would agree with my suggestion to change
 the default to ddof=1?



 Thomas



 ---
 Prof. (FH) PD Dr. Thomas Haslwanter
 School of Applied Health and Social Sciences

 University of Applied Sciences Upper Austria
 FH OÖ Studienbetriebs GmbH
 Garnisonstraße 21
 4020 Linz/Austria
 Tel.: +43 (0)5 0804 -52170
 Fax: +43 (0)5 0804 -52171
 E-Mail: thomas.haslwan...@fh-linz.at
 Web: me-research.fh-linz.at or work.thaslwanter.at





Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Sturla Molden
Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:

 Personally I cannot think of many applications where it would be desired
 to calculate the standard deviation with ddof=0. In addition, I feel that
 there should be consistency between standard modules such as numpy, scipy, 
 and pandas.

ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
estimation.

If you are not estimating from a sample, but rather calculating for the
whole population, you always want ddof=0. 
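
(A small sketch of what ddof=0 computes, i.e. the plain mean of squared
deviations:)

import numpy as np

x = np.random.randn(1000)
v_ml = ((x - x.mean()) ** 2).mean()      # maximum-likelihood variance
assert np.allclose(v_ml, x.var(ddof=0))  # what the ddof=0 default computes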

What does Matlab do by default? (Yes, it is a rhetorical question.)


 I am wondering if there is a good reason to stick to ddof=0 as the
 default for std, or if others would agree with my suggestion to change
 the default to ddof=1?

It is a bad idea to suddenly break everyone's code. 


Sturla



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Eelco Hoogendoorn
I agree; breaking code over this would be ridiculous. Also, I prefer the
zero default, despite the mean/std combo probably being more common.


On Tue, Apr 1, 2014 at 10:02 PM, Sturla Molden sturla.mol...@gmail.com wrote:

 Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:

  Personally I cannot think of many applications where it would be desired
  to calculate the standard deviation with ddof=0. In addition, I feel that
  there should be consistency between standard modules such as numpy,
 scipy, and pandas.

 ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
 estimation.

 If you are not estimating from a sample, but rather calculating for the
 whole population, you always want ddof=0.

 What does Matlab do by default? (Yes, it is a rhetorical question.)


  I am wondering if there is a good reason to stick to ddof=0 as the
  default for std, or if others would agree with my suggestion to change
  the default to ddof=1?

 It is a bad idea to suddenly break everyone's code.


 Sturla



Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote:
 Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:

 Personally I cannot think of many applications where it would be desired
 to calculate the standard deviation with ddof=0. In addition, I feel that
 there should be consistency between standard modules such as numpy, scipy, 
 and pandas.

 ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
 estimation.

It's true, but the counter-arguments are also strong. And regardless
of whether ddof=1 or ddof=0 is better, surely the same one is better
for both numpy and scipy.

 If you are not estimating from a sample, but rather calculating for the
 whole population, you always want ddof=0.

 What does Matlab do by default? (Yes, it is a rhetorical question.)

R (which is probably a more relevant comparison) does do ddof=1 by default.

 I am wondering if there is a good reason to stick to ddof=0 as the
 default for std, or if others would agree with my suggestion to change
 the default to ddof=1?

 It is a bad idea to suddenly break everyone's code.

It would be a disruptive transition, but OTOH having inconsistencies
like this guarantees the ongoing creation of new broken code.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Ralf Gommers
On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
  Personally I cannot think of many applications where it would be desired
  to calculate the standard deviation with ddof=0. In addition, I feel that
  there should be consistency between standard modules such as numpy,
  scipy, and pandas.
 
  ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
  estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.


If we could still choose here without any costs, obviously that's true.
This particular ship sailed a long time ago though. By the way, there isn't
even a `scipy.stats.std`, so we're comparing with differently named
functions (nanstd for example).


  If you are not estimating from a sample, but rather calculating for the
  whole population, you always want ddof=0.

  What does Matlab do by default? (Yes, it is a rhetorical question.)

 R (which is probably a more relevant comparison) does do ddof=1 by default.

  I am wondering if there is a good reason to stick to ddof=0 as the
  default for std, or if others would agree with my suggestion to change
  the default to ddof=1?
 
  It is a bad idea to suddenly break everyone's code.

 It would be a disruptive transition, but OTOH having inconsistencies
 like this guarantees the ongoing creation of new broken code.


Not much of an argument to change return values for such a heavily used
function.

Ralf



 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Charles R Harris
On Tue, Apr 1, 2014 at 2:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
  Personally I cannot think of many applications where it would be desired
  to calculate the standard deviation with ddof=0. In addition, I feel that
  there should be consistency between standard modules such as numpy,
  scipy, and pandas.
 
  ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
  estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

  If you are not estimating from a sample, but rather calculating for the
  whole population, you always want ddof=0.
 
  What does Matlab do by default? (Yes, it is a rhetorical question.)

 R (which is probably a more relevant comparison) does do ddof=1 by default.

  I am wondering if there is a good reason to stick to ddof=0 as the
  default for std, or if others would agree with my suggestion to change
  the default to ddof=1?
 
  It is a bad idea to suddenly break everyone's code.

 It would be a disruptive transition, but OTOH having inconsistencies
 like this guarantees the ongoing creation of new broken code.


This topic comes up regularly. The original choice was made for numpy 1.0b1
by Travis, see this later thread:
http://thread.gmane.org/gmane.comp.python.numeric.general/25720/focus=25721
At this point it is probably best to leave it alone.

Chuck


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:



 On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
   Personally I cannot think of many applications where it would be desired
   to calculate the standard deviation with ddof=0. In addition, I feel that
   there should be consistency between standard modules such as numpy,
   scipy, and pandas.
 
   ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
   estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

 If we could still choose here without any costs, obviously that's true. This
 particular ship sailed a long time ago though. By the way, there isn't even
 a `scipy.stats.std`, so we're comparing with differently named functions
 (nanstd for example).

Presumably nanstd is a lot less heavily used than std, and presumably
people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
think of changing nanstd to ddof=0 to match numpy? (With appropriate
FutureWarning transition, etc.)

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread alex
On Tue, Apr 1, 2014 at 4:54 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Tue, Apr 1, 2014 at 2:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
   Personally I cannot think of many applications where it would be desired
   to calculate the standard deviation with ddof=0. In addition, I feel that
   there should be consistency between standard modules such as numpy,
   scipy, and pandas.
 
   ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
   estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

   If you are not estimating from a sample, but rather calculating for the
  whole population, you always want ddof=0.
 
   What does Matlab do by default? (Yes, it is a rhetorical question.)

 R (which is probably a more relevant comparison) does do ddof=1 by
 default.

   I am wondering if there is a good reason to stick to ddof=0 as the
   default for std, or if others would agree with my suggestion to change
   the default to ddof=1?
 
  It is a bad idea to suddenly break everyone's code.

 It would be a disruptive transition, but OTOH having inconsistencies
 like this guarantees the ongoing creation of new broken code.


 This topic comes up regularly. The original choice was made for numpy 1.0b1
 by Travis, see this later thread. At this point it is probably best to leave
 it alone.

I don't have any opinion about this debate, but I love the
justification in that thread: "Any surprise that is created by the
different default should be mitigated by the fact that it's an
opportunity to learn something about what you are doing." This
masterpiece of rhetoric will surely help me win many internet
arguments in the future!


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread josef . pktd
On Tue, Apr 1, 2014 at 5:11 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:



 On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
   Personally I cannot think of many applications where it would be desired
   to calculate the standard deviation with ddof=0. In addition, I feel that
   there should be consistency between standard modules such as numpy,
   scipy, and pandas.
 
   ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
   estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

 If we could still choose here without any costs, obviously that's true. This
 particular ship sailed a long time ago though. By the way, there isn't even
 a `scipy.stats.std`, so we're comparing with differently named functions
 (nanstd for example).

 Presumably nanstd is a lot less heavily used than std, and presumably
 people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
 think of changing nanstd to ddof=0 to match numpy? (With appropriate
 FutureWarning transition, etc.)

numpy is numpy, a numerical library
scipy.stats is stats and behaves differently.  (axis=0)

nanstd in scipy.stats will hopefully also go away soon, so I don't
think it's worth changing there either.

pandas came later and thought ddof=1 is worth more than consistency.

I don't think ddof defaults are worth jumping through deprecation hoops.

(bias in cov, corrcoef is non-standard ddof)
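
(For reference, np.cov spells that choice as a bias flag instead of a
standard ddof; a quick sketch:)

import numpy as np

x = np.random.randn(20)
print(np.cov(x))             # default bias=False: divides by N - 1
print(np.cov(x, bias=True))  # divides by N, matching np.var's default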

Josef



 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org