In the trivial case where all candidate predictors have one degree of freedom (which is unlikely, as some effects will be nonlinear or involve more than 2 categories), adding a variable if it improves AIC (here taken on the chi-square scale, chi-square minus 2 x d.f., where larger is better) is the same as adding it if its chi-square exceeds 2. This corresponds to an alpha level of 0.157 for a chi-square with 1 d.f. At least AIC leads people to use a more realistic alpha (a small alpha in stepwise regression leads to more bias in the retained regression coefficients). But you still have serious multiplicity problems and non-replicable models.
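
A quick check of that arithmetic in R, as a minimal sketch: the helper name implied_alpha is made up for illustration, and it assumes the likelihood-ratio statistic is treated as chi-square distributed.

```r
# Alpha level implied by the "chi-square exceeds 2" rule for a 1 d.f. term
alpha_1df <- pchisq(2, df = 1, lower.tail = FALSE)
round(alpha_1df, 3)  # 0.157

# Hypothetical helper: alpha implied by the AIC rule for a term with `df`
# degrees of freedom (the term is added when its chi-square exceeds 2 * df)
implied_alpha <- function(df) pchisq(2 * df, df = df, lower.tail = FALSE)
sapply(c(1, 2, 5, 15), implied_alpha)
```

The implied alpha shrinks as the d.f. of the term grows, so the 0.157 figure applies only to single-d.f. terms.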

Things are different if you have a pre-defined group of variables you are thinking of adding. Suppose that this group of 10 variables requires 15 d.f. Adding the group if AIC (based on 15 d.f.) increases, i.e. if the group's chi-square exceeds 2 x 15 = 30, wouldn't be a bad strategy. This avoids the multiplicities of single-variable "looks".
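
One way to sketch the group rule in R, under stated assumptions: the data are simulated, the function add_group_by_aic is a hypothetical helper (not part of any package), and the group here has only 4 d.f. rather than 15 to keep the example small.

```r
# Hypothetical helper: decide whether to add a pre-specified group of
# predictors as a whole, using AIC on the chi-square scale
# (likelihood-ratio chi-square minus 2 * d.f.).
add_group_by_aic <- function(base_fit, full_fit) {
  lr_chisq <- as.numeric(deviance(base_fit) - deviance(full_fit))
  df <- df.residual(base_fit) - df.residual(full_fit)
  list(lr_chisq = lr_chisq,
       df = df,
       aic_gain = lr_chisq - 2 * df,
       add = lr_chisq > 2 * df,
       # Same decision expressed with R's smaller-is-better AIC()
       agrees_with_AIC = AIC(full_fit) < AIC(base_fit))
}

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n),
                g  = factor(sample(letters[1:4], n, replace = TRUE)),
                z  = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * d$x1))

base <- glm(y ~ x1, family = binomial, data = d)
full <- glm(y ~ x1 + g + z, family = binomial, data = d)  # group: g (3 d.f.) + z (1 d.f.)
res <- add_group_by_aic(base, full)
str(res)
```

The point of the design is that the group is judged once, on its total d.f., instead of each variable getting its own look.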

Frank

Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University

On Mon, 9 Aug 2010, Kingsford Jones wrote:

On Mon, Aug 9, 2010 at 10:27 AM, Frank Harrell <f.harr...@vanderbilt.edu> wrote:

Note that stepwise variable selection based on AIC has all the problems of
stepwise variable selection based on P-values.  AIC is just a restatement of
the P-value.

I find the above statement very interesting, particularly because
there are common misconceptions in the ecological community that AIC
is a panacea for model selection problems and the theory behind
P-values is deeply flawed.  Can you direct me toward a reference for
better understanding the relation?

best,

Kingsford Jones



Frank

Frank E Harrell Jr   Professor and Chairman        School of Medicine
                    Department of Biostatistics   Vanderbilt University

On Mon, 9 Aug 2010, Gabor Grothendieck wrote:

On Mon, Aug 9, 2010 at 6:43 AM, Harsh <singhal...@gmail.com> wrote:

Hello useRs,

I have a problem at hand which I'd think is fairly common amongst
groups where R is being adopted for Analytics in place of SAS.
Users would like to obtain results for logistic regression in R that
they have become accustomed to in SAS.

Towards this end, I was able to propose the Design package in R which
contains many functions to extract the various metrics that SAS
reports.

If you have suggestions pertaining to other packages, or sample code
that replicates some of the SAS outputs for logistic regression, I
would be glad to hear of them.

Some of the requirements are:
- Stepwise variable selection for logistic regression
- Choose base level for factor variables
- The Hosmer-Lemeshow statistic
- concordant and discordant
- Tau C statistic


For stepwise logistic regression using AIC see:

library(MASS)
?stepAIC
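
A minimal sketch of a stepAIC call on a logistic model, assuming simulated data (the variable names x1, x2, x3 and y are made up). The caveats raised elsewhere in this thread about multiplicity and non-replicable models apply to whatever it selects.

```r
library(MASS)

# Simulated data: only x1 truly affects the outcome
set.seed(42)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.3 + 1.0 * d$x1))

# Start from the full model and let stepAIC drop terms one at a time
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
sel <- stepAIC(full, direction = "backward", trace = FALSE)

formula(sel)             # the selected model
sel$anova                # record of the steps taken
AIC(sel) <= AIC(full)    # backward steps never raise AIC
```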

For specifying reference level:

?relevel
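
For example, a small sketch with a made-up three-level factor trt, showing how relevel changes which level the logistic regression coefficients are contrasted against:

```r
# Toy data (invented for illustration)
d <- data.frame(
  y   = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1),
  trt = factor(rep(c("A", "B", "C"), each = 4))
)

levels(d$trt)                        # "A" comes first, so it is the default reference
d$trt <- relevel(d$trt, ref = "C")   # make "C" the reference level instead

fit <- glm(y ~ trt, family = binomial, data = d)
names(coef(fit))                     # coefficients are now A vs C and B vs C
```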

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

