That's the advantage of having an R interface to RF: you can do such
automation rather easily. E.g.,
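twoStageRF <- function(x, y, nVar = max(1, round(0.1 * ncol(x))), ...) {
    ## Stage 1: fit on all variables and get permutation importance
    ## (type = 1: mean decrease in accuracy, or %IncMSE for regression).
    fit <- randomForest(x, y, importance = TRUE, ...)
    imp <- importance(fit, type = 1)[, 1]
    ## Stage 2: refit on the nVar most important variables (ties kept).
    cutoff <- sort(imp, decreasing = TRUE)[nVar]
    randomForest(x[, imp >= cutoff, drop = FALSE], y, ...)
}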
[Disclaimer: I just wrote the function on the spot, so it is completely
untested. This is just to demonstrate how simple it would be. You can
embellish it as much as you'd like.]
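To see what a first-stage ranking looks like before refitting, here is a
small sketch. The iris data is only a stand-in (it is not from this thread),
and importance(fit, type = 1) is assumed as the way to pull the permutation
importance out of the fitted forest:

```r
library(randomForest)

set.seed(1)                            # forests are random; fix the seed
x <- iris[, 1:4]                       # stand-in predictors
y <- iris$Species                      # stand-in response

fit <- randomForest(x, y, importance = TRUE)
imp <- importance(fit, type = 1)[, 1]  # one importance value per variable
print(sort(imp, decreasing = TRUE))    # stage-1 ranking, best first

## With nVar = 2, the second stage would keep these columns (ties included):
kept <- names(imp)[imp >= sort(imp, decreasing = TRUE)[2]]
print(kept)
```

Looking at the ranking first is a cheap check that the cutoff is not
splitting a group of near-equal variables.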
I have written a function that uses CV to choose the 'optimal' number of
variables to keep (rather than blindly selecting one up front). I might toss
it in the next version of the package...
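As a rough sketch of the CV idea: recent versions of the randomForest
package provide rfcv(), which cross-validates the error over a decreasing
number of variables. It is not necessarily the function mentioned above,
and the iris data, the fold count and the step size below are illustrative
assumptions only:

```r
library(randomForest)

set.seed(42)                     # CV folds and forests are random
x <- iris[, 1:4]                 # stand-in data; substitute your own x, y
y <- iris$Species

## rfcv() ranks variables within each fold, drops the least important
## ones step by step, and records the cross-validated error at each
## number of variables kept.
cv <- rfcv(x, y, cv.fold = 5, step = 0.5)
print(cv$error.cv)               # CV error, named by number of variables

## Take the variable count with the lowest CV error, then refit on
## that many top-ranked variables from a forest grown on all the data.
nBest <- cv$n.var[which.min(cv$error.cv)]
fit   <- randomForest(x, y, importance = TRUE)
imp   <- importance(fit, type = 1)[, 1]
keep  <- names(sort(imp, decreasing = TRUE))[seq_len(nBest)]
rf2   <- randomForest(x[, keep, drop = FALSE], y)
print(rf2)
```

The CV error reported for the chosen size is still somewhat optimistic,
since the size itself was picked by looking at those errors.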
HTH,
Andy
> From: Hui Han
>
> Hi,
>
> I am using the Random Forest Package, and want to do an automatic rerun
> using only those variables that were most important in the original run.
> Is there anybody who has experience with this and can give me helpful
> suggestions?
>
> Best regards,
>
> Hui Han
> Department of Computer Science and Engineering,
> The Pennsylvania State University
> University Park, PA,16802
> email: [EMAIL PROTECTED]
> homepage: http://www.cse.psu.edu/~hhan
>
> ______________________________________________
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html