I recently compared two approaches to calculating R2 (1 - SSE/SST versus the 
squared correlation between the observed and fitted values), and I cannot 
explain the different results: 

data(cars)
model <- lm(dist ~ speed, data = cars)
coef(model)                            # OLS estimates
fitted.right <- fitted(model)          # fitted values from the OLS fit
fitted.wrong <- -17 + 5 * cars$speed   # fitted values from my own parameters


When using the OLS fitted values, the lines below all return the same R2 value:

1-sum((cars$dist-fitted.right)^2)/sum((cars$dist-mean(cars$dist))^2)
cor(cars$dist,fitted.right)^2
(sum((cars$dist-mean(cars$dist))*(fitted.right-mean(fitted.right)))/((nrow(cars)-1)*sd(cars$dist)*sd(fitted.right)))^2


However, when I compute the fitted values from my own parameter estimates 
("fitted.wrong"), the first expression returns a much lower R2, which I would 
expect since the fit is worse, but the other two expressions return the same 
R2 that I get when using the OLS fitted values.

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)
cor(x=cars$dist,y=fitted.wrong)^2
(sum((cars$dist-mean(cars$dist))*(fitted.wrong-mean(fitted.wrong)))/((nrow(cars)-1)*sd(cars$dist)*sd(fitted.wrong)))^2
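
For anyone who wants to reproduce the comparison in one go, here is a 
self-contained sketch that wraps the two calculations in helper functions and 
prints them side by side. The names r2.sse and r2.cor are my own, made up for 
this example; they are not from any package.

```r
data(cars)
model <- lm(dist ~ speed, data = cars)

# Hypothetical helper names, introduced only for this illustration
r2.sse <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
r2.cor <- function(y, yhat) cor(y, yhat)^2

fitted.right <- fitted(model)          # OLS fitted values
fitted.wrong <- -17 + 5 * cars$speed   # fitted values from my own parameters

# One row per set of fitted values, one column per method
rbind(
  right = c(sse = r2.sse(cars$dist, fitted.right),
            cor = r2.cor(cars$dist, fitted.right)),
  wrong = c(sse = r2.sse(cars$dist, fitted.wrong),
            cor = r2.cor(cars$dist, fitted.wrong))
)
```

The two columns agree in the "right" row, while in the "wrong" row only the 
"sse" column drops.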


I'm sure I'm missing something simple, but can someone explain the difference 
between these two methods of finding R2? Thanks.

Jon