Joseph LeBouton <lebouton <at> msu.edu> writes:

> 
> Can anyone help me understand why an lm model summary would return an 
> r.squared of ~0.18 with an intercept term, and an r.squared of ~0.98 
> without the intercept?   The fit is NOT that much better, according to 
> plot.lm: residuals are similar between the two models, and a plot of 
> observed vs. predicted is almost identical.

There are reasons why the standard textbooks and Bill Venables

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

tell you that removing the intercept can be dangerous for your health.
In particular, the two r.squared values are not comparable: for a model
without an intercept, summary.lm computes R^2 against the baseline
model y = 0 rather than against mean(y), so it can be close to 1 even
when the fit is poor. A demonstration:

Dieter

##
set.seed(10)
x = runif(20,5,10)
y = 2 * x + rnorm(20,0,0.3)

# a fit with good data
summary(lm(y~x))$r.squared
# 0.98

# add one outlier at (0, 20)
x = c(x,0)
y = c(y,20)
summary(lm(y~x))$r.squared
# 0.00008
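# (the point at (0,20) lies far from the line y = 2x and, at x = 0,
#  well outside the range of the other x values, so this single
#  high-leverage outlier collapses the centered R^2)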

# removing the intercept: an apparently excellent fit again
summary(lm(y~x-1))$r.squared
# 0.91
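# (without an intercept, summary.lm computes R^2 as
#  1 - RSS/sum(y^2) instead of 1 - RSS/sum((y - mean(y))^2),
#  so the two r.squared values measure different things)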

# ... because omitting the intercept is similar to adding MANY
# data points at (0,0)
x = c(x,rep(0,1000))
y = c(y,rep(0,1000))
summary(lm(y~x))$r.squared
# 0.90
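
# To see directly that it is the definition of r.squared that changes,
# recompute both values by hand (a minimal sketch; fit1/fit0 are just
# illustrative names, reusing the x and y built up above):
fit1 = lm(y ~ x)      # with intercept
fit0 = lm(y ~ x - 1)  # without intercept
# with an intercept the total sum of squares is centered ...
1 - sum(residuals(fit1)^2) / sum((y - mean(y))^2)  # = summary(fit1)$r.squared
# ... without one the baseline is y = 0
1 - sum(residuals(fit0)^2) / sum(y^2)              # = summary(fit0)$r.squared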
