You may want to look at a book published more recently than 17 years ago (computing has changed a lot since then). Doing stepwise regression using p-values is one approach, and when p-values were the easiest (or only) thing to compute, it was reasonable to use them. But think about how many p-values you compute and compare to 0.05 in a stepwise regression: every candidate term is tested at every step, and which terms get tested later depends on what was chosen earlier. Now think about how many you would have computed if your data had come from a different sample. What is your type I error rate? Is the usual p-value theory even meaningful in this situation?
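To put a rough number on the type I error question, here is a small simulation sketch in Python (not R, and not taken from any package). It mimics only the first step of p-value-based forward selection on pure noise: y is unrelated to every candidate, yet we count how often the best candidate still has p < 0.05. The p-value uses a normal approximation to the t-test, and the helper names and settings (100 observations, 10 candidates, 500 replicates) are hypothetical choices for illustration.

```python
import math
import random

def approx_p_value(r, n):
    """Two-sided p-value for a simple-regression slope from the sample
    correlation r and n observations. Assumption: the t statistic is
    approximated by a standard normal (reasonable for large n)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_step_false_entry_rate(n_obs=100, n_candidates=10, n_reps=500,
                                alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the smallest single-predictor
    p-value is below alpha, although y and all candidates are independent
    standard normal noise (so any entry is a false positive)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        best_p = min(
            approx_p_value(correlation([rng.gauss(0, 1) for _ in range(n_obs)], y), n_obs)
            for _ in range(n_candidates)
        )
        if best_p < alpha:
            hits += 1
    return hits / n_reps

if __name__ == "__main__":
    k = 10
    print("independent-tests value 1 - 0.95**k:", 1 - 0.95 ** k)
    print("simulated first-step false-entry rate:", first_step_false_entry_rate(n_candidates=k))
```

If the 10 tests were independent, the chance of at least one false entry at the first step would be 1 - 0.95^10, about 0.40, and the simulated rate should land near that; every later step adds more tests on top.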
There are several criteria that can be used in stepwise regression to decide which term to add or drop; the p-value (or F-statistic) is only one. Others include AIC, BIC, adjusted R-squared, PRESS, gut feeling, prior knowledge, cost, ... Some of these have better properties than p-values, but most still suffer from the fact that a small change in the data can result in a very different model. Look at the lars, lasso2, and BMA packages for some more modern alternatives to stepwise regression.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Thursday, December 14, 2006 9:28 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Stepwise regression

Dear all,

I am wondering why the step() procedure in R has the description 'Select a formula-based model by AIC'. I have been using Stata and SPSS, and neither package made any reference to AIC in its stepwise procedure, and I read in an earlier R-Help post that step() is really the 'usual' way of doing stepwise regression (R-Help post from Prof Ripley, Fri, 2 Apr 1999 05:06:03 +0100 (BST)). My understanding of the 'usual' way of doing, say, forward regression is that variables whose p-value drops below a criterion (commonly 0.05) become candidates for inclusion in the model, the one with the lowest p-value among these is chosen, and the step is repeated until all p-values for variables not in the model are above 0.05, cf. Hosmer and Lemeshow (1989) Applied Logistic Regression. The procedure does not require examination of the AIC. I am not well enough acquainted with R to understand the code used in step(), so can somebody tell me how step() works?
Thanks very much,
Tim

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
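The question of how step() works can be illustrated with a simplified Python sketch (not R's actual code). It does greedy forward selection: at each step it fits every model with one more term, keeps the one with the lowest AIC, and stops when no addition lowers AIC. R's step() also does backward and both-direction searches and handles factors and glm fits; none of that is reproduced here. AIC is computed as n*log(RSS/n) + 2*(number of coefficients), which, as far as I know, is the form extractAIC() uses for ordinary linear models. All function names below are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(design[i][j] * design[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    return sum((y[i] - sum(design[i][j] * beta[j] for j in range(p))) ** 2
               for i in range(n))

def aic(columns, y):
    """Gaussian linear-model AIC up to a constant: n*log(RSS/n) + 2*(coefficients)."""
    n = len(y)
    return n * math.log(rss(columns, y) / n) + 2 * (len(columns) + 1)

def forward_aic(candidates, y):
    """Greedy forward selection by AIC. candidates maps names to columns.
    Returns the chosen names (in order of entry) and the final AIC."""
    chosen = []
    current = aic([], y)  # intercept-only starting model
    while True:
        best_name, best_aic = None, current
        for name, col in candidates.items():
            if name in chosen:
                continue
            trial = aic([candidates[c] for c in chosen] + [col], y)
            if trial < best_aic:
                best_name, best_aic = name, trial
        if best_name is None:  # no single addition lowers AIC: stop
            return chosen, current
        chosen.append(best_name)
        current = best_aic

if __name__ == "__main__":
    rng = random.Random(42)
    n = 80
    cols = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ["x1", "x2", "x3", "x4", "x5"]}
    y = [3 * cols["x1"][i] - 2 * cols["x2"][i] + rng.gauss(0, 1) for i in range(n)]
    print(forward_aic(cols, y))
```

This also shows how the AIC rule relates to the p-value rule: for a Gaussian linear model, adding one coefficient lowers AIC exactly when n*log(RSS_old/RSS_new) exceeds 2, i.e. when the likelihood-ratio statistic exceeds 2. Asymptotically that is a chi-squared(1) test at about the 0.157 level rather than 0.05, so the AIC-based step behaves like a p-value-based forward step with a more lenient entry threshold.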