I need to analyze the results of ad hoc experiments run in manufacturing with severe confounding and possible supersaturation (i.e., more potentially explanatory variables than runs), where each run is very expensive in both time and money. There have to be ways to summarize concisely and intelligently what the data can tell us and what remains uncertain, including the level of partial confounding between alternative explanations.

I think I've gotten reasonable results with my own modification of Venables & Ripley's stepAIC, computing an approximate posterior over the tested models using the AICc criterion described, e.g., by Burnham and Anderson (2002), Model Selection and Multi-Model Inference (Springer). Preliminary simulations showed that with the naive prior (all models equally likely, including the null model), the null model was usually rejected even when it was true. What a surprise! I think I can fix that with a more intelligent prior. I also think I can evaluate the partial confounding between alternative models by studying the correlation matrix of their predictions.
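In outline, the weighting and confounding checks look something like this (the data frame dat, the candidate models, and the prior below are made up purely to illustrate):

aicc <- function(fit) {                       # small-sample corrected AIC
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

## Hypothetical candidate models from an expensive, possibly supersaturated experiment
cand <- list(
  null = lm(y ~ 1,       data = dat),
  m1   = lm(y ~ x1,      data = dat),
  m2   = lm(y ~ x2,      data = dat),
  m12  = lm(y ~ x1 + x2, data = dat)
)

## Akaike weights: an approximate posterior over the candidates under a uniform prior
ic     <- sapply(cand, aicc)
delta  <- ic - min(ic)
w_unif <- exp(-delta / 2) / sum(exp(-delta / 2))

## A "more intelligent" prior, e.g. extra mass on the null model, just reweights:
prior  <- c(null = 0.5, m1 = 1/6, m2 = 1/6, m12 = 1/6)
w_post <- prior * exp(-delta / 2)
w_post <- w_post / sum(w_post)

## Partial confounding between alternative explanations: correlation of predictions
## (the intercept-only model is dropped because its predictions are constant)
round(cor(sapply(cand[-1], fitted)), 2)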
Comments?
Thanks,
Spencer Graves

Frank E Harrell Jr wrote:
Smit, Robin wrote:
I am hoping to get some advice on the following:
I am looking for an automatic variable selection procedure to reduce the
number of potential predictor variables (~ 50) in a multiple regression
model.
I would be interested in using forward stepwise regression with the partial
F test. I have looked into possible R functions but could not find this
particular approach. There is a function (stepAIC) that uses the Akaike criterion or Mallows'
Cp criterion. In addition, the drop1 and add1 functions came closest to what I want,
but I cannot perform the required procedure with them. Do you have any ideas?

Kind regards,
Robin Smit
--------------------------------------------
Business Unit TNO Automotive
Environmental Studies & Testing
PO Box 6033, 2600 JA Delft
THE NETHERLANDS
Robin,
If you are looking for a method that does not offer the best predictive accuracy and that violates every aspect of statistical inference, you are on the right track. See http://www.stata.com/support/faqs/stat/stepwise.html for details.
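That said, if Robin still wants the mechanics: base R's add1() does report partial F tests when called with test = "F", so a forward search along those lines can be sketched roughly as follows (the data frame dat, response y, and the 0.05 entry threshold are illustrative assumptions):

fit   <- lm(y ~ 1, data = dat)    # start from the intercept-only model
scope <- as.formula(paste("~", paste(setdiff(names(dat), "y"), collapse = " + ")))

repeat {
  tab <- add1(fit, scope = scope, test = "F")        # partial F test for each candidate term
  tab <- tab[rownames(tab) != "<none>", , drop = FALSE]
  pv  <- tab[[ncol(tab)]]                            # the Pr(>F) column
  if (nrow(tab) == 0 || min(pv) > 0.05) break        # stop when nothing clears the entry threshold
  best <- rownames(tab)[which.min(pv)]
  fit  <- update(fit, as.formula(paste(". ~ . +", best)))
}
summary(fit)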
