Joseph LeBouton <lebouton <at> msu.edu> writes:

> Can anyone help me understand why an lm model summary would return an
> r.squared of ~0.18 with an intercept term, and an r.squared of ~0.98
> without the intercept? The fit is NOT that much better, according to
> plot.lm: residuals are similar between the two models, and a plot of
> observed x predicted is almost identical.
There are reasons why the standard textbooks and Bill Venables
http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
tell you that removing intercepts can be dangerous for your health.
In short: with an intercept, R^2 measures how much better the fit is
than the mean of y; without an intercept, it measures how much better
the fit is than the constant 0, so the two numbers are not comparable.

Dieter

##
set.seed(10)
x = runif(20, 5, 10)
y = 2 * x + rnorm(20, 0, 0.3)

# a fit with good data
summary(lm(y ~ x))$r.squared      # 0.98

# add one outlier at (0, 20)
x = c(x, 0)
y = c(y, 20)
summary(lm(y ~ x))$r.squared      # 0.00008

# removing the intercept: a high r.squared again
summary(lm(y ~ x - 1))$r.squared  # 0.91

# ... because it is similar to adding MANY data points at (0, 0)
x = c(x, rep(0, 1000))
y = c(y, rep(0, 1000))
summary(lm(y ~ x))$r.squared      # 0.90

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
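[A follow-up sketch, not part of the original post: it recomputes both R^2
values by hand to show the two different total sums of squares that
summary.lm uses. Names like fit_with / fit_without are illustrative.]

```r
set.seed(10)
x <- runif(20, 5, 10)
y <- 2 * x + rnorm(20, 0, 0.3)

fit_with    <- lm(y ~ x)      # with intercept
fit_without <- lm(y ~ x - 1)  # without intercept

# With an intercept, R^2 compares the residuals to the
# variation of y around its mean:
r2_with <- 1 - sum(resid(fit_with)^2) / sum((y - mean(y))^2)

# Without an intercept, R^2 compares the residuals to the
# variation of y around ZERO, which is a much larger baseline
# whenever y is far from 0 -- hence the inflated value:
r2_without <- 1 - sum(resid(fit_without)^2) / sum(y^2)

# Both hand computations match what summary() reports:
all.equal(r2_with,    summary(fit_with)$r.squared)
all.equal(r2_without, summary(fit_without)$r.squared)
```

So the no-intercept R^2 only answers "how much better is this line than
predicting 0 for everything?", which is an easy bar to clear here.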
