I did mention dropterm (and drop1 can also be used). This is more efficient, simpler and less-error prone than setting up your own anova()'s (especially if you want to do LRTs for dropping several terms).
On Tue, 14 Nov 2006, Inman, Brant A. M.D. wrote: > > Thank you to Prof Ripley and Henric Nilsson for their observation that I > was using "anova" inappropriately for the question that I was trying to > answer. Note that the Wald statistics and confidence interval were > calculable in the previous email but that I prefered using the more > accurate deviance statistics. I will demonstrate my error for the > benefit of those new users of R (like me) that may not have appreciated > how the "anova" function is working SEQUENTIALLY, and what SEQUENTIALLY > actually means in this context. > > Since the "anova" function is a sequential test of the current model, > only the statistic for the last term in the model formula is a true > deviance chi-square statistic for (full model) .vs. (full model - term). > For instance, using the data upon which this question was based, > consider the following 2 models: > > ---------------------- > >> fit0 <- glm(y ~ 1, data=d, family=binomial(link=logit), > na.action=na.omit) >> fit1 <- update(fit0, . ~ . + x1 + x2 + x3) > > ---------------------- > > Here, fit0 is the null (intercept-only) model and fit1 is the full model > (without interactions because interactions are not biologically > plausible in this medical dataset). Now notice the result of the "anova" > function for the full model: > > ---------------------- > >> anova(fit1, test='Chisq') > > ... > > Df Deviance Resid. Df Resid. Dev P(>|Chi|) > NULL 99 137.989 > x1 1 8.267 98 129.721 0.004 > x2 1 5.639 97 124.083 0.018 > x3 1 3.433 96 120.650 0.064 > > ----------------------- > > It is incorrect to interpret the deviance chi-square test presented > above for x1 (P=0.004) as the deviance chi-square statistic comparing > (y~x1+x2+x3) .vs. (y~x2+x3) as the statistic calculated is for (y~1) > .vs. (y~x1). Similarly, the deviance chi-square statistic calculated for > x2 (P=0.018) is NOT for (y~x1+x2+x3) .vs. (y~x1+x3) but for (y~x1) .vs. > (y~x1+x2). Lastly, the deviance chi-square statistic for x3(P=0.064) is > probably the most intuitive because it is for the comparison of > (y~x1+x2+x3) .vs. (y~x1+x2), the result we would typically want to > present for x3 in the full model. So how do you get the correct > deviance chi-square statistics for x1 and x2 in the full model? There > are a couple of ways of arriving at the same answer as I will > demonstrate for the case of x1. > > Option#1: Reorder the full model so that x1 is the last term in the > model formula > > ----------------------- > >> fit2 <- glm(y ~ x2 + x3 + x1, data=d, family=binomial(link=logit), > na.action=na.omit) >> anova(fit2, test='Chisq') > ... > > Df Deviance Resid. Df Resid. Dev P(>|Chi|) > NULL 99 137.989 > x2 1 7.305 98 130.683 0.007 > x3 1 3.821 97 126.862 0.051 > x1 1 6.212 96 120.650 0.013 > > ----------------------- > > Notice that the deviance chi-square statistics for all of the variables > have changed, despite fit2 being identical in content to fit1. Just the > order of the terms in the model formula have changed from fit1 to fit2. > The deviance statistic for x1 is now the correct one for the full model, > that is for the comparison (y~x1+x2+x3) .vs. (y~x2+x3). > > Option#2: Compare the full model to a reduced model that does not > include x1. > > ----------------------- > >> fit3 <- update(fit1, . ~ . - x1) >> anova(fit1, fit3, test='Chisq') > ... > > Model 1: y ~ x1 + x2 + x3 > Model 2: y ~ x2 + x3 > Resid. Df Resid. Dev Df Deviance P(>|Chi|) > 1 96 120.650 > 2 97 126.862 -1 -6.212 0.013 > > ----------------------- > > Fit3 is the same model as fit1 except that it is missing the x1 term. > Therefore, any change in the deviance chi-square statistic is due to the > deletion of x1 from full model. Notice that the difference in residual > deviances between fit3 and fit1 (126.862 - 120.650 = 6.212) is the same > the difference b/t x1 and x3 in option#1. > > > Brant > > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
