Re: [R] two methods for regression, two different results

Jari Oksanen Tue, 05 Apr 2005 23:58:18 -0700

On Tue, 2005-04-05 at 22:54 -0400, John Sorkin wrote:
> Please forgive a straight stats question, and the informal notation.
>  
> let us say we wish to perform a linear regression:
> y=b0 + b1*x + b2*z
>  
> There are two ways this can be done, the usual way, as a single
> regression, 
> fit1<-lm(y~x+z)
> or by doing two regressions. In the first regression we could have y as
> the dependent variable and x as the independent variable 
> fit2<-lm(y~x). 
> The second regression would be a regression in which the residuals from
> the first regression would be the dependent variable, and the
> independent variable would be z.
> fit2<-lm(fit2$residuals~z)
>  
> I would think the two methods would give the same p value and the same
> beta coefficient for z. They don't. Can someone help me understand why
> the two methods do not give the same results? Additionally, could
> someone tell me when one method might be better than the other, i.e.
> what question does the first method answer, and what question does the
> second method answer? I have searched a number of textbooks and have not
> found this question addressed.
>  
John,


Bill Venables already told you that the two methods do not give the same
result, because x and z are not orthogonal. Here is a simpler way of
getting the result he suggested for the coefficient of z (but only for z):

> x <- runif(100)
> z <- x + rnorm(100, sd=0.4)
> y <- 3 + x + z + rnorm(100, sd=0.3)
> mod <- lm(y ~ x + z)
> mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)
> summary(mod)

Call:
lm(formula = y ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.96436    0.06070  48.836  < 2e-16 ***
x            0.96272    0.11576   8.317 5.67e-13 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

> summary(mod2)

Call:
lm(formula = residuals(lm(y ~ x)) ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15731    0.06070  -2.592   0.0110 *
x           -0.84459    0.11576  -7.296 8.13e-11 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

You can omit x from the outer lm only if x and z are orthogonal, even
though you have already removed the effect of x. In the orthogonal case
the coefficient for x in the outer model would be 0.
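
The orthogonal case can be checked with a small sketch. This is only an
illustration under my own assumptions: the seed is arbitrary, and the
names z.orth, full, two.step and two.step.x are made up for this example.
Here z is made orthogonal to x (and to the intercept) by taking its
residuals from a regression on x:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)
y <- 3 + x + z + rnorm(100, sd = 0.3)

# Hypothetical helper variable: the part of z that is orthogonal to x
z.orth <- residuals(lm(z ~ x))

full       <- lm(y ~ x + z.orth)                      # single regression
two.step   <- lm(residuals(lm(y ~ x)) ~ z.orth)       # x omitted from outer lm
two.step.x <- lm(residuals(lm(y ~ x)) ~ x + z.orth)   # x kept in outer lm

# The coefficient of z.orth agrees between the two approaches
all.equal(unname(coef(full)["z.orth"]), unname(coef(two.step)["z.orth"]))

# With an orthogonal z.orth, the outer coefficient of x is numerically zero
coef(two.step.x)["x"]
```

Even here the p values of full and two.step should not agree exactly: the
residuals are the same, but the outer lm in two.step does not know that x
was already fitted, so it has one more residual degree of freedom.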

Residuals are equal in these two models:

> range(residuals(mod) - residuals(mod2))
[1] -2.797242e-17  5.551115e-17

But, of course, the fitted values are not equal, since mod2 is fitted to
the residuals that remain after removing the effect of x.
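
How the fitted values differ can be made explicit with a short sketch. The
seed and the name first (for the inner regression of y on x) are my own
assumptions and not part of the session above. The idea is that the fitted
values of the full model should split into the fitted values of the first
step plus those of the second step, and the coefficients likewise:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)
y <- 3 + x + z + rnorm(100, sd = 0.3)

first <- lm(y ~ x)                      # hypothetical name for the first step
mod   <- lm(y ~ x + z)
mod2  <- lm(residuals(first) ~ x + z)

# Fitted values of mod = fitted values of first + fitted values of mod2
range(fitted(mod) - (fitted(first) + fitted(mod2)))

# Same decomposition for the coefficients (first has no z term, so pad with 0)
range(coef(mod) - (c(coef(first), z = 0) + coef(mod2)))
```

Both ranges should be zero up to rounding error, which is also why the
coefficient of x in mod2 is the difference between the coefficient of x in
mod and the one in the first regression.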

cheers, jari oksanen
-- 
Jari Oksanen <[EMAIL PROTECTED]>

