So... I wouldn't suggest the try-all-possible-logistic-models approach either,
and I'm not sure exactly what your modeling goals are.

However, I've been experimenting with the variable importance functions that
come with the randomForest (importance) and party (varimp) packages. The idea
is to identify which independent variables are likely to be useful and then to
give those high-importance variables more attention than you could spend on the
whole set.

A general advantage of the recursive partitioning approach is that it deals 
fairly nicely with interactions and collinearity.
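
A toy illustration of the interaction point (and of the A-and-B question
further down the thread), assuming two binary predictors and an outcome that is
1 exactly when A and B differ. Neither variable split on its own beats a coin
flip, but splitting on A and then on B within each branch classifies every case
correctly. The data and helper names are hypothetical; this is not rpart,
randomForest, or party code.

```python
from collections import Counter
from itertools import product

# Hypothetical data: each (A, B) combination repeated 25 times; y = A XOR B.
rows = [(a, b) for a, b in product([0, 1], repeat=2) for _ in range(25)]
labels = [a ^ b for a, b in rows]


def correct_in_node(node_labels):
    """Rows a node gets right by predicting its majority label."""
    return Counter(node_labels).most_common(1)[0][1] if node_labels else 0


def split(rows, labels, feature):
    """Partition labels by the 0/1 value of one feature."""
    left = [y for r, y in zip(rows, labels) if r[feature] == 0]
    right = [y for r, y in zip(rows, labels) if r[feature] == 1]
    return left, right


def one_split_accuracy(rows, labels, feature):
    """Accuracy of a single split on `feature` (a depth-1 tree)."""
    left, right = split(rows, labels, feature)
    return (correct_in_node(left) + correct_in_node(right)) / len(labels)


def two_split_accuracy(rows, labels, first, second):
    """Split on `first`, then split each child on `second` (a depth-2 tree)."""
    correct = 0
    for value in (0, 1):
        sub_rows = [r for r in rows if r[first] == value]
        sub_labels = [y for r, y in zip(rows, labels) if r[first] == value]
        left, right = split(sub_rows, sub_labels, second)
        correct += correct_in_node(left) + correct_in_node(right)
    return correct / len(labels)


print(one_split_accuracy(rows, labels, 0))     # A alone
print(one_split_accuracy(rows, labels, 1))     # B alone
print(two_split_accuracy(rows, labels, 0, 1))  # A, then B within each branch
```

One caveat: in this perfectly balanced case the first split shows no
improvement on its own, so a greedy splitter that stops whenever a single split
does not help (for example, through a complexity threshold) may never reach the
second level.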

In principle, recursive partitioning approaches should be able to handle
missing values (often a problem with large datasets), but I have not managed to
get this to work with the variable importance functions.

Let me know if you need more details. You can check out
http://www.biomedcentral.com/1471-2105/9/307 for a couple of examples of
variable importance.


Jason Jones, PhD
Medical Informatics
[EMAIL PROTECTED]
801.707.6898


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Tuesday, September 30, 2008 2:54 PM
To: Milicic B. Marko
Cc: r-help@r-project.org
Subject: Re: [R] Logistic regression problem

Milicic B. Marko wrote:
> The only solution I can see is fitting all possible two-factor models with
> interactions and then assessing whether the interaction term is significant...
>
>
> Any more ideas?

Please don't suggest such a thing unless you do simulations to back up
its predictive performance, type I error properties, and the impact of
collinearities.  You'll find this approach works as well as the U.S.
economy.
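
To put rough numbers on the type I error side of this (predictive performance
and collinearity need actual simulation), assume none of the interactions is
real and the tests are independent. Screening every pair among p candidate
variables then means C(p, 2) tests, about alpha * C(p, 2) spurious "significant"
interactions, and a family-wise error rate of 1 - (1 - alpha)^C(p, 2). The
function name and the values of p are illustrative; with correlated predictors
the tests are not independent, so treat the figures as orders of magnitude.

```python
from math import comb


def interaction_screen_errors(n_vars, alpha=0.05):
    """Type I error bookkeeping for testing every pairwise interaction.

    Assumes (hypothetically) that no interaction is real and that the
    tests are independent, which correlated predictors will violate.
    """
    n_tests = comb(n_vars, 2)            # number of two-factor models fitted
    expected_false = alpha * n_tests     # expected spurious "significant" hits
    fwer = 1 - (1 - alpha) ** n_tests    # P(at least one spurious hit)
    return n_tests, expected_false, fwer


for p in (10, 100, 1000):
    n_tests, expected_false, fwer = interaction_screen_errors(p)
    print(f"{p:>5} variables: {n_tests:>7} tests, "
          f"~{expected_false:.0f} false positives expected, FWER {fwer:.3f}")
```

With thousands of variables, as in the original question, the expected number
of false hits runs into the tens of thousands before any real signal is found.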

Frank Harrell


>
> Milicic B. Marko wrote:
>> I have a huge data set with thousands of variables and one binary outcome
>> variable. I know that most of the variables are correlated and are not
>> good predictors... but...
>>
>> It is very hard to start modeling with such a huge dataset. What would
>> you suggest? How do I make a first cut... how do I eliminate most of the
>> variables without ignoring potential interactions? For example, maybe
>> variable A is not a good predictor and variable B is not a good predictor
>> either, but maybe A and B together are a good predictor...
>>
>> Any suggestions are welcome.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>


--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

