You should use this as an opportunity to introduce the students to the concepts of unbiasedness and degrees of freedom. With N scores you have N degrees of freedom. If you know your population mean, you still have N degrees of freedom when you compute the variance, so you divide the sum of squares by N. Knowing (rather than estimating) the mean is most likely when you have at hand the entire population of scores.

If you do not know the population mean, then you must estimate it from the sample data. You then subtract that estimated mean from each score, square the deviations, sum them, and divide by the degrees of freedom. But the degrees of freedom is now N-1: you lost one df when you estimated the population mean from your sample data.

The sample variance is an absolutely unbiased estimator of the population variance (its expected value is exactly equal to the population variance). However, because the distribution of sample variances is positively skewed, more than half the time the sample variance will be an underestimate. This contributes to the leptokurtosis of Student's t, especially when df is small; the skewness of the distribution of sample variances decreases as N increases.

Don't be thinking that the standard deviation computed with N-1 in the denominator is absolutely unbiased. It is not: taking the square root of an unbiased variance estimate gives a standard deviation that is, on average, too small. It is, however, less biased than the one computed by dividing by N. My graduate-school statistics instructor made the mistake of thinking that s must be unbiased if s**2 was, and was rather embarrassed when I demonstrated to him that it is not.

Cheers,
Karl W.
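For students who want to see these claims rather than take them on faith, here is a minimal simulation sketch. It assumes a normal population with arbitrary illustrative values (mu = 100, sigma = 15) and samples of N = 5; the function name `simulate` and its parameters are hypothetical, chosen only for this example. It averages three variance estimates over many samples (sum of squares about the known mu divided by N, about the sample mean divided by N, and about the sample mean divided by N-1), counts how often s**2 falls below the true variance, and averages s itself.

```python
import math
import random


def simulate(n=5, mu=100.0, sigma=15.0, reps=100_000, seed=1):
    """Monte Carlo check of variance/SD bias, assuming a normal population."""
    rng = random.Random(seed)
    true_var = sigma ** 2
    sum_known_mu = 0.0  # SS about the known population mean, divided by N
    sum_div_n = 0.0     # SS about the sample mean, divided by N
    sum_div_n1 = 0.0    # SS about the sample mean, divided by N - 1 (s**2)
    sum_s = 0.0         # square root of s**2, i.e. s
    below = 0           # number of samples where s**2 underestimates sigma**2
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        ss_mu = sum((x - mu) ** 2 for x in xs)
        ss_m = sum((x - m) ** 2 for x in xs)
        s2 = ss_m / (n - 1)
        sum_known_mu += ss_mu / n
        sum_div_n += ss_m / n
        sum_div_n1 += s2
        sum_s += math.sqrt(s2)
        if s2 < true_var:
            below += 1
    return {
        "true_var": true_var,
        "mean_var_known_mu_div_N": sum_known_mu / reps,
        "mean_var_div_N": sum_div_n / reps,
        "mean_var_div_N_minus_1": sum_div_n1 / reps,
        "prop_s2_below_true": below / reps,
        "true_sd": sigma,
        "mean_s": sum_s / reps,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>26}: {value:.4f}")
```

Under these assumptions, theory says the first and third averages should land near 225, the divide-by-N average near 225 x 4/5 = 180, the proportion of underestimates near P(chi-square with 4 df < 4), which is about 0.59, and the average s near 0.940 x 15, about 14.1. That last value is below sigma, which illustrates the point about s. Rerunning with a larger `n` shows both the underestimate proportion and the bias in s shrinking.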
________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, January 08, 2007 2:40 PM
To: Teaching in the Psychological Sciences (TIPS)
Subject: [tips] A statistical crisis or nightmare

Hello,

My whole academic life I have been doing the standard deviation calculation with a denominator of N-1. Every book I've used has this as the formula, at least for the sample version (as opposed to the population version). All my notes and PowerPoints make reference to this, and as far as I can see, SPSS uses this formula (and I am using SPSS in my class). My new stats class is underway, and now that I look carefully at the book (Aron et al.) I see that their version of this formula uses a denominator of N. In my opinion this causes an underestimation of dispersion, and (for a fact) it will make my life a living hell if I choose to go with it. So I am leaning toward instructing the students to ignore this formula and use mine. It means that I will not be able to use many of the practice problems in the book (but I have plenty of others to use), and it might cause them some small amount of confusion. I will probably have to remind them periodically.

Am I being selfish or unfair in trying to make my life easier this way? And can someone tell me why almost all statistics books have some feature or formula in an idiosyncratic version? It almost seems like things are done whimsically. I've encountered this with percentiles, stem-and-leaf displays, and other concepts. This is just the worst one so far.

Nancy Melucci
Long Beach City College/CSULA

---
To make changes to your subscription go to:
http://acsun.frostburg.edu/cgi-bin/lyris.pl?enter=tips&text_mode=0&lang=english
