G'day Ralf,

On Fri, 19 Oct 2007 09:51:37 +0200
Ralf Goertz <[EMAIL PROTECTED]> wrote:
> Thanks to Thomas Lumley there is another convincing example. But still
> I've got a problem with it:
>
> > x<-c(2,3,4);y<-c(2,3,3)
>
> [...]
> That's okay, but neither [...] nor [...]
> give the result of summary(lm(y~x+0)), which is 0.9796.

Why should either of those formulae yield the output of
summary(lm(y~x+0))?  The R-squared output of that command is
documented in help(summary.lm):

  r.squared: R^2, the 'fraction of variance explained by the model',
             R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2),
             where y* is the mean of y[i] if there is an intercept
             and zero otherwise.

And, indeed,

R> 1 - sum(residuals(lm(y~x+0))^2) / sum((y-0)^2)
[1] 0.9796238

confirms this.

Note: if you do not have an intercept in your model, the residuals do
not have to add to zero, and typically they will not.  Hence,
var(residuals(lm(y~x+0))) does not give you the residual sum of
squares.

> In order to save the role of R^2 as a goodness-of-fit indicator

R^2 is no goodness-of-fit indicator, neither in models with an
intercept nor in models without one.  So I do not see how you can
save its role as a goodness-of-fit indicator. :)

Since you are posting from a .de domain, I assume you will understand
the following quote from Tutz (2000), "Die Analyse kategorialer
Daten", page 18, given here in English translation:

  R^2 does *not* measure the goodness of fit of the linear model; it
  says nothing about whether the linear specification is true or
  false, but only about whether individual observations can be
  predicted by the linear specification.  R^2 is determined to a
  large extent by the design, i.e. by the values that x takes
  (cf. Kockelkorn (1998)).

The latter reference is:

  Kockelkorn, U. (1998). Lineare Modelle. Skript, TU Berlin.

> in zero intercept models one could use the same formula as in models
> with a constant. I mean, if R^2 is the proportion of variance
> explained by the model, we should use the a priori variance of y[i].
>
> > 1-var(residuals(lm(y~x+0)))/var(y)
> [1] 0.3567182
>
> But I assume that this has probably been discussed at length
> somewhere more appropriate than r-help.

I am sure about that, but it was also discussed here on r-help (long
ago).  The problem is that this formula compares two models that are
not nested in each other, which is quite a controversial thing to do;
some might even go so far as to say that it makes no sense at all.

The other problem with this approach is illustrated by my example:

R> set.seed(20070807)
R> x <- runif(100)*2 + 10
R> y <- 4 + rnorm(x, sd=1)
R> 1 - var(residuals(lm(y~x+0))) / var(y)
[1] -0.04848273

How do you explain that a quantity called R-squared, implying that it
is the square of something and hence always non-negative, can become
negative?

Cheers,

        Berwin

=========================== Full address =============================
Berwin A Turlach                          Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied                  +65 6515 6650 (self)
  Probability                             FAX : +65 6872 3919
Faculty of Science                        e-mail: [EMAIL PROTECTED]
National University of Singapore          http://www.stat.nus.edu.sg/~statba
6 Science Drive 2, Blk S16, Level 7
Singapore 117546
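
PS: In case it helps, here is a small sketch, using the data from
your example, that makes the point about the residuals concrete.
The 0.9796238 and 0.3567182 values are the ones discussed above; the
remaining values simply follow from the same data (sum(r) works out
to 7/29, and the last two lines print the residual sum of squares):

R> x <- c(2,3,4); y <- c(2,3,3)
R> r <- residuals(lm(y ~ x + 0))
R> sum(r)                  # without an intercept, need not be zero
[1] 0.2413793
R> 1 - sum(r^2)/sum(y^2)   # the documented R^2 for a no-intercept fit
[1] 0.9796238
R> 1 - var(r)/var(y)       # your formula: a different quantity
[1] 0.3567182
R> sum(r^2)                # the residual sum of squares ...
[1] 0.4482759
R> (length(r)-1)*var(r) + length(r)*mean(r)^2   # ... equals this,
[1] 0.4482759
                           # so var(r) alone recovers the RSS only
                           # when mean(r) == 0.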