You should use this as an opportunity to introduce the students to the
concepts of unbiasedness and degrees of freedom.
 
With N scores you have N degrees of freedom.  If you know your population
mean you still have N degrees of freedom when you compute the variance,
so you divide the sum of squares by N -- your knowing (rather than
estimating) the mean is most likely if you have at hand the entire
population of scores.  If you do not know the population mean, then you
must estimate it from the sample data.  You then subtract that estimated
mean from each score, square the deviations, and divide by degrees of
freedom, but the degrees of freedom is now N-1 -- you lost one df when
you estimated the population mean from your sample data.
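
To make the two denominators concrete, here is a minimal Python sketch.
The scores are made up purely for illustration; the statistics module
calls are shown only as a cross-check on the hand computation.

```python
import statistics

# Hypothetical scores, chosen only so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

mean = sum(scores) / n                      # 5.0
ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations

# If these scores ARE the population, the mean is known: divide by N.
var_n = ss / n

# If these scores are a sample, the mean was estimated: divide by N-1.
var_n_minus_1 = ss / (n - 1)

print("SS / N     =", var_n)
print("SS / (N-1) =", var_n_minus_1)

# Cross-check against the standard library.
print("pvariance  =", statistics.pvariance(scores))
print("variance   =", statistics.variance(scores))
```

The N-1 version is always the larger of the two, and the gap shrinks
as N grows.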
 
The sample variance is an absolutely unbiased estimator of the
population variance (its expected value is exactly equal to the
population variance), but because the distribution of sample variances is
positively skewed, more than half the time the sample variance will be
an underestimate.  This contributes to the leptokurtosis of Student's t,
especially when df is small -- the skewness of the distribution of
sample variances decreases as N increases.
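
A small simulation can show both points to students.  This is only a
sketch under assumptions of my choosing: a standard normal population
(variance 1), samples of N = 5, and an arbitrary seed and replication
count.

```python
import random

random.seed(2007)   # arbitrary seed so the run is repeatable

N = 5               # small sample, where the skew is most visible
REPS = 20000        # number of simulated samples
SIGMA2 = 1.0        # true variance of the assumed normal population


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by N-1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


variances = [
    sample_variance([random.gauss(0.0, 1.0) for _ in range(N)])
    for _ in range(REPS)
]

mean_var = sum(variances) / REPS
prop_under = sum(v < SIGMA2 for v in variances) / REPS

print("mean of sample variances:", round(mean_var, 3))
print("proportion below the true variance:", round(prop_under, 3))
```

The mean of the sample variances should land close to 1, while well
over half of them fall below 1 (about 59% in theory for N = 5 from a
normal population).  Raising N moves that proportion toward one half.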
 
Don't be thinking that the standard deviation computed with N-1 in the
denominator is absolutely unbiased -- it is not, but it is less biased
than when dividing by N.  My instructor of statistics in graduate school
made the mistake of thinking that s must be unbiased if s**2 was, and
was rather embarrassed when I demonstrated to him that it is not.
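
One way to demonstrate this, sketched here under the assumption of a
normal population with standard deviation 1: for such a population the
expected value of s is c4(N) times sigma, where c4 is the usual
gamma-function correction factor.  The function names, seed, and
replication count below are my own choices for illustration.

```python
import math
import random

random.seed(2007)   # arbitrary seed so the run is repeatable

N = 5
REPS = 20000


def c4(n):
    """E[s] / sigma for samples of size n from a normal population."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def both_sds(xs):
    """Return (SD with N-1 in the denominator, SD with N in the denominator)."""
    n = len(xs)
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    return math.sqrt(ss / (n - 1)), math.sqrt(ss / n)


total_s = 0.0
total_s_n = 0.0
for _ in range(REPS):
    s, s_n = both_sds([random.gauss(0.0, 1.0) for _ in range(N)])
    total_s += s
    total_s_n += s_n

mean_s = total_s / REPS        # average SD using N-1
mean_s_n = total_s_n / REPS    # average SD using N

print("theoretical E[s] for N = 5:", round(c4(N), 4))
print("simulated mean of s (N-1):", round(mean_s, 4))
print("simulated mean of s (N):  ", round(mean_s_n, 4))
```

Both averages come out below the true value of 1, with the N-1
version closer to it, which is the point made above: less biased, but
not unbiased.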
 
Cheers,
 
Karl W.

________________________________

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 08, 2007 2:40 PM
To: Teaching in the Psychological Sciences (TIPS)
Subject: [tips] A statistical crisis or nightmare


Hello,
 
My whole academic life I have been doing the standard deviation
calculation with a denominator of N-1. Every book I've used has this as
the formula at least for the sample version (as opposed to the
population version), all my notes and powerpoints make reference to
this, and as far as I can see, SPSS uses this formula (and I am using
SPSS in my class).
 
My new stats class is underway and now that I look carefully at the book
(Aron et al) I see that their version of this formula is done with a
denominator of N, which in my opinion causes an underestimation of
dispersion and (for a fact) will make my life a living hell if I choose to
go with it.
 
So I am leaning toward instructing the students to ignore this formula
and use mine. It means that I will not be able to use many of the
practice problems in the book (but I have plenty of others to use) and
might cause them some small amount of confusion. I will probably have to
remind them periodically. Am I being selfish or unfair in trying to make
my life easier this way?
 
And can someone tell me why most all statistics books have some feature
or formula that is an idiosyncratic version? It almost seems like things
are done whimsically. I've encountered this with percentiles, stem and
leaf and other concepts. This is just the worst one so far. 
 
Nancy Melucci
Long Beach City College/CSULA
 
 
---
To make changes to your subscription go to:
http://acsun.frostburg.edu/cgi-bin/lyris.pl?enter=tips&text_mode=0&lang=english


