running anova() on intact12 and intact21 gives two different results!!

> anova(intact12)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x1         1 663.18  663.18 203.065 < 2.2e-16 ***
x2         1  35.21   35.21  10.781  0.001940 **
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(intact21)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value Pr(>F)
x2         1 698.26  698.26 213.8077 <2e-16 ***
x1         1   0.12    0.12   0.0379 0.8466
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
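As far as I can tell this is exactly what anova() on an lm fit is meant to show: it reports sequential (Type I) sums of squares, so each line tests a term given only the terms listed above it. With x1 and x2 as strongly correlated as they are here, whichever variable enters second has almost nothing left to explain. A rough sketch of order-independent tests (assuming the intact12 and intact21 fits David suggests below) is drop1(), which tests each term after all the others and agrees with the t tests from summary():

    drop1(intact12, test = "F")   # marginal F test for each term, given the other one
    drop1(intact21, test = "F")   # identical table: drop1() ignores the order of entry
    anova(lm(y ~ x1), intact12)   # test for adding x2 last; matches the last line of anova(intact12)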
On Sun, Dec 14, 2008 at 8:56 PM, Tanmoy Talukdar <tanmoy.taluk...@gmail.com> wrote:
> Why do you think that running lm() twice on those two models is going
> to help me? They are identical models and hence we get identical
> results. The second question is now alright. I had some
> misunderstanding about it.
>
> Please tell me if you can find any "downside" in summary(). I can't
> find any.
>
> I've edited the code for that replication issue.
>
> set.seed(127)
> n <- 50
> x1 <- runif(n,1,10)
> x2 <- x1 + rnorm(n,0,0.5)
> plot(x1,x2)  # x1 and x2 strongly correlated
> cor(x1,x2)
> y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
> intact.lm <- lm(y ~ x1 + x2)
> summary(intact.lm)
> anova(intact.lm)
>
>> summary(intact.lm)
>
> Call:
> lm(formula = y ~ x1 + x2)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -3.4578 -1.1326  0.4551  1.2807  4.8241
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  3.63603    0.61944   5.870 4.23e-07 ***
> x1          -0.09555    0.49114  -0.195  0.84658
> x2           1.59384    0.48542   3.283  0.00194 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.807 on 47 degrees of freedom
> Multiple R-squared: 0.8198,     Adjusted R-squared: 0.8121
> F-statistic: 106.9 on 2 and 47 DF,  p-value: < 2.2e-16
>
>> anova(intact.lm)
> Analysis of Variance Table
>
> Response: y
>           Df Sum Sq Mean Sq F value    Pr(>F)
> x1         1 663.18  663.18 203.065 < 2.2e-16 ***
> x2         1  35.21   35.21  10.781  0.001940 **
> Residuals 47 153.49    3.27
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> On Sun, Dec 14, 2008 at 8:26 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>>
>> On Dec 14, 2008, at 9:40 AM, Tanmoy Talukdar wrote:
>>
>>> [sorry for the repost. I forgot to switch off formatting last time]
>>>
>>> I have two assignment problems...
>>>
>>> I have written this small code for regression with two regressors.
>>>
>> For replication purposes, it might be good to set a seed for the random
>> number generation.
>>
>> set.seed(127)
>>>
>>> n <- 50
>>> x1 <- runif(n,1,10)
>>> x2 <- x1 + rnorm(n,0,0.5)
>>> plot(x1,x2)  # x1 and x2 strongly correlated
>>> cor(x1,x2)
>>> y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
>>> intact.lm <- lm(y ~ x1 + x2)
>>> summary(intact.lm)
>>> anova(intact.lm)
>>>
>> You should also run anova on these models:
>>
>> intact21 <- lm(y~x2+x1)
>> intact12 <- lm(y~x1+x2)
>>
>>> the questions are
>>>
>>> 1. The function summary() is convenient since the result does not
>>> depend on the order the variables are listed in the linear model
>>> definition. It has a serious downside though which is obvious in
>>> this case. Are there any significant variables left?
>>>
>>> 2. An anova(intact.lm) table shows how much the second variable
>>> contributes to the result in addition to the first. Is there a
>>> variable significant now? Is the second variable significant?
>>
>> Both anova and summary were in agreement that the P-value for the
>> addition of x2 into a model that already included x1 is 0.0296. One
>> of them uses the t statistic and the other uses the F statistic.
>> I am not sure where your confusion lies.
>>
>> --
>> David Winsemius
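If it helps, for the last variable entered these really are the same test: the t value for x2 in summary() squared equals (up to rounding) the F value in the x2 row of anova(), and the p-values match. A small check, using the seeded intact.lm from the code above (the 0.0296 figure comes from the earlier, unseeded run):

    summary(intact.lm)$coefficients["x2", "t value"]^2   # about 10.78
    anova(intact.lm)["x2", "F value"]                    # 10.781, with the same p-value 0.00194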
>>
>>> the results i got:
>>>
>>>> summary(intact.lm)
>>>
>>> Call:
>>> lm(formula = y ~ x1 + x2)
>>>
>>> Residuals:
>>>     Min      1Q  Median      3Q     Max
>>> -5.5824 -1.5314 -0.1568  1.4425  5.3374
>>>
>>> Coefficients:
>>>             Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)   3.4857     0.9354   3.726 0.000521 ***
>>> x1            0.2537     0.6117   0.415 0.680191
>>> x2            1.3517     0.6025   2.244 0.029608 *
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> Residual standard error: 2.34 on 47 degrees of freedom
>>> Multiple R-squared: 0.7483,     Adjusted R-squared: 0.7376
>>> F-statistic: 69.87 on 2 and 47 DF,  p-value: 8.315e-15
>>>
>>>> anova(intact.lm)
>>>
>>> Analysis of Variance Table
>>>
>>> Response: y
>>>           Df Sum Sq Mean Sq  F value   Pr(>F)
>>> x1         1 737.86  737.86 134.7129 2.11e-15 ***
>>> x2         1  27.57   27.57   5.0338  0.02961 *
>>> Residuals 47 257.43    5.48
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> My problem is that I can't see any "serious downside" in using
>>> summary(). And on the second question I am totally clueless. I need
>>> your help.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.