Hi all,

I'd like to use random forest regression to say something about the importance of a set of genes (binary predictors) for schizophrenia-related behavior (a continuous measure). I am still reading up on this technique, but would already really appreciate any feedback on whether my approach is valid.

So, using the randomForest package, is it a good approach to enter a few dozen binary predictors to assess their importance (as a set, and individually) for a continuous measure, with a sample size of ~1000 people?

More specific questions:

- I have an additional interest in interactions (though perhaps that is not the best word in this context). Does it make any sense to say something about the influence one predictor has over the others by looking at the change in the estimated importance of the others when that predictor is removed from the model?

- I have a few siblings in the data, i.e. non-independent observations. Is this a problem, and if so, is there anything I can do about it?

- The few papers I have seen so far that use this technique in a similar situation do not include any 'standard' covariates such as age and gender. Should I?

Any and all feedback is greatly appreciated!!

Kind regards,
Johannes

P.S. I hope I've come to the right place despite this being a more general question; if not, please let me know of a forum where it would be better suited.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.