On 26/05/2009 5:50 AM, a...@us.ibm.com wrote:
> Full_Name: Amos Waterland
> Version: 2.8.1
> OS: Ubuntu Linux
> Submission from: (NULL) (68.175.8.163)
> I calculated the covariance for a small data set as follows:
> X <- c(1,2,3,4)
> Y <- c(3,3,4,3)
> cov(X,Y)
> [1] 0.1666667
> But when doing the computation with pencil and paper I get:
> ((-1.5)*(-0.25) + (-0.5)*(-0.25) + (0.5)*(0.75) + (1.5)*(-0.25))/4
> [1] 0.125
> Microsoft Excel 2003 covar() also gives 0.125. I suspect that you guys are
> doing something like this:
> ((-1.5)*(-0.25) + (-0.5)*(-0.25) + (0.5)*(0.75) + (1.5)*(-0.25))/3
> [1] 0.1666667
> That is, you are dividing by N minus 1 rather than N. So who is correct?
Please don't claim something is a bug when you are not sure. cov() is
clearly documented to use n-1 in the denominator. Excel (for its own
reasons) uses n in covar(), which leads to surprises like var(x) != covar(x, x)
in Excel, because Excel uses n-1 in its variance calculation.
Duncan Murdoch
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel