Using df (N - 1) as the denominator of the sample variance eliminates the bias
that results if N is used. It reduces, but does not eliminate, the bias in the
sample standard deviation as an estimator of the population standard deviation.
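Here is a small check of that claim, offered as a sketch rather than a proof. It enumerates every equally likely sample of size 2 drawn with replacement from a made-up population (the values 0, 2, 4 and the sample size are my assumptions, chosen only to keep the arithmetic small) and averages each estimator over all samples.

```python
import itertools
import math
import statistics

# Hypothetical population and sample size, chosen only for illustration.
population = [0, 2, 4]
n = 2

# Every ordered sample of size n drawn with replacement (all equally likely).
samples = list(itertools.product(population, repeat=n))

pop_var = statistics.pvariance(population)  # population variance, SS / N
pop_sd = math.sqrt(pop_var)

# Expected value of each estimator = its average over all possible samples.
mean_var_df = statistics.fmean(statistics.variance(s) for s in samples)   # SS / (N - 1)
mean_var_n = statistics.fmean(statistics.pvariance(s) for s in samples)   # SS / N
mean_sd_df = statistics.fmean(math.sqrt(statistics.variance(s)) for s in samples)
mean_sd_n = statistics.fmean(math.sqrt(statistics.pvariance(s)) for s in samples)

print(f"population variance      {pop_var:.4f}")
print(f"E[SS / (N - 1)]          {mean_var_df:.4f}")
print(f"E[SS / N]                {mean_var_n:.4f}")
print(f"population sd            {pop_sd:.4f}")
print(f"E[sqrt(SS / (N - 1))]    {mean_sd_df:.4f}")
print(f"E[sqrt(SS / N)]          {mean_sd_n:.4f}")
```

In this toy case the df version of the variance averages out to the population variance exactly, the N version falls short, and both standard deviations fall short of the population value, the df version by less.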
From my lecture notes:
Please note that the sample mean is an unbiased estimator of the population
mean and the sample variance, SS / (N - 1), is an unbiased estimator of the
population variance (but SS / N computed on a sample is a biased estimator
of population variance). The sample standard deviation is not, however, an
unbiased estimator of the population standard deviation (although it is less
biased than the square root of SS / N). Consider a hypothetical sampling
distribution for the sample variance where half of the samples have variance
= 2 and half = 4. Since the sample variance is unbiased, the
population variance must be the expected value of the sample variances,
.5(2) + .5(4) = 3. Now consider the standard deviations. The expected
value of s is .5(sqrt(2)) + .5(sqrt(4)) = 1.707, but the population standard
deviation is sqrt(3) = 1.732.
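The arithmetic in that demo can be checked in a few lines. This is only a sketch of the hypothetical two-point sampling distribution described above; the variable names are mine.

```python
import math

# Hypothetical sampling distribution of the sample variance:
# half of the samples have variance 2, half have variance 4.
variances = [2, 4]
probs = [0.5, 0.5]

# Expected sample variance; equals the population variance if s^2 is unbiased.
expected_var = sum(p * v for p, v in zip(probs, variances))

# Expected sample standard deviation: average of the square roots.
expected_sd = sum(p * math.sqrt(v) for p, v in zip(probs, variances))

# Population standard deviation: square root of the population variance.
pop_sd = math.sqrt(expected_var)

print(round(expected_var, 3))  # 3.0
print(round(expected_sd, 3))   # 1.707
print(round(pop_sd, 3))        # 1.732
```

The average of the square roots is smaller than the square root of the average, which is the whole point: sqrt is nonlinear (concave), so unbiasedness does not carry over from s^2 to s.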
This little demo, by the way, saved me hours of boredom when I was a
graduate assistant at Miami Univ. We assistants had to sit in the lecture
hall while the undergrad stat class was taught. One day, half-asleep, I
heard the prof argue that since the sample variance is an unbiased
estimator, then the sample sd must be too. Oops, I thought, that cannot be
so, since the sqrt function is nonlinear. I contrived the argument above,
presented it to the professor after class, and was immediately dismissed
from the requirement to attend classes the rest of the semester.
+++++++++++++++++++++++++++++++++++++++++
Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102 Fax: 252-328-6283
http://core.ecu.edu/psyc/wuenschk/klw.htm
==========================================
Larry noted:
The N version of the formula is a descriptive statistic - it simply
describes the current sample (or population). The N-1 version is an
inferential statistic and takes into account degrees of freedom and the fact
that I've estimated the mean of the population.
Larry
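To make the two formulas in Larry's note concrete, here is a minimal sketch that computes both from the same sum of squares. The data values are made up for illustration.

```python
import statistics

# Made-up sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - mean) ** 2 for x in data)

var_n = ss / n          # descriptive: describes these N scores themselves
var_df = ss / (n - 1)   # inferential: estimates the population variance

# The standard library offers the same pair of formulas.
assert abs(var_n - statistics.pvariance(data)) < 1e-12
assert abs(var_df - statistics.variance(data)) < 1e-12

print(var_n, var_df)
```

The N - 1 version is always the larger of the two, since one degree of freedom was spent estimating the mean from the same data.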