Exactly as Max said.  See the rfcv() function in the latest version of 
randomForest, as well as the reference in the help page for that function.
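
rfcv() reports how cross-validated error changes as the number of predictors shrinks. It does this with nested cross-validation: predictors are ranked using only the training part of each fold (see ?rfcv for the exact procedure and defaults). Below is a minimal, language-neutral sketch of that idea using only Python's standard library. It is not randomForest's code. A nearest-centroid rule stands in for the forest, and the absolute difference of class means stands in for variable importance. Every function name is hypothetical. Halving the predictor count at each step loosely mirrors rfcv's default `step = 0.5`.

```python
import random


def make_data(n_samples, n_informative, n_noise, shift, seed):
    """Two balanced classes; only the first n_informative features differ in mean."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(shift * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


def class_means(X, y, label):
    """Per-feature mean of the rows belonging to one class."""
    rows = [r for r, lab in zip(X, y) if lab == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def rank_features(X, y):
    """Stand-in importance: all feature indices, largest |mean difference| first."""
    m0, m1 = class_means(X, y, 0), class_means(X, y, 1)
    return sorted(range(len(m0)), key=lambda j: abs(m1[j] - m0[j]), reverse=True)


def centroid_error(X_tr, y_tr, X_te, y_te, feats):
    """Held-out error of a nearest-centroid rule restricted to `feats`."""
    m = {lab: class_means(X_tr, y_tr, lab) for lab in (0, 1)}
    wrong = 0
    for row, lab in zip(X_te, y_te):
        dist = {c: sum((row[j] - m[c][j]) ** 2 for j in feats) for c in m}
        wrong += min(dist, key=dist.get) != lab
    return wrong / len(y_te)


def halving_sizes(p):
    """p, p // 2, p // 4, ..., 1 (loosely like rfcv's default step = 0.5)."""
    sizes = []
    while p >= 1:
        sizes.append(p)
        p //= 2
    return sizes


def cv_error_by_nvar(X, y, n_folds=5):
    """Mean CV error for each predictor-set size; ranking is redone inside every fold."""
    n, p = len(y), len(X[0])
    sizes = halving_sizes(p)
    totals = {s: 0.0 for s in sizes}
    for f in range(n_folds):
        tr = [i for i in range(n) if i % n_folds != f]
        te = [i for i in range(n) if i % n_folds == f]
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        X_te, y_te = [X[i] for i in te], [y[i] for i in te]
        order = rank_features(X_tr, y_tr)  # uses the training fold only
        for s in sizes:
            totals[s] += centroid_error(X_tr, y_tr, X_te, y_te, order[:s])
    return {s: totals[s] / n_folds for s in sizes}


if __name__ == "__main__":
    X, y = make_data(60, n_informative=4, n_noise=60, shift=1.5, seed=0)
    for nvar, err in cv_error_by_nvar(X, y).items():
        print(f"{nvar:3d} predictors: CV error {err:.3f}")
```

The output is a curve of error versus number of predictors, the same kind of summary rfcv() returns in its `n.var` and `error.cv` components. The top-ranked list can differ from fold to fold. That is the variability Max describes.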

The OOB estimate is as accurate as a CV estimate _if_ you run straight RF.  Most
other methods do not have this "feature".  However, if you start adding steps
such as feature selection on the same training data, all bets are off: the OOB
cases have already influenced which variables were kept.
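
The sketch below shows why a selection step breaks the estimate. It uses hypothetical names and only Python's standard library. It compares two k-fold CV estimates on pure-noise data, where the true error rate of any classifier is 50%. The first estimate picks the top k features once on all samples and then cross-validates. The second repeats the selection inside every training fold, as Max recommends below. A nearest-centroid rule and a class-mean-difference ranking stand in for the forest and its importance measure, so this illustrates the principle and is not randomForest code. The same optimism affects the OOB estimate when the variables are chosen using all of the training samples.

```python
import random


def make_noise_data(n_samples, n_features, seed):
    """Balanced labels that are independent of every feature (pure noise)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, label):
    """Per-feature mean of the rows belonging to one class."""
    rows = [r for r, lab in zip(X, y) if lab == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def top_features(X, y, k):
    """Stand-in for importance-based selection: k largest |mean difference|."""
    m0, m1 = class_means(X, y, 0), class_means(X, y, 1)
    order = sorted(range(len(m0)), key=lambda j: abs(m1[j] - m0[j]), reverse=True)
    return order[:k]


def centroid_error(X_tr, y_tr, X_te, y_te, feats):
    """Held-out error of a nearest-centroid rule restricted to `feats`."""
    m = {lab: class_means(X_tr, y_tr, lab) for lab in (0, 1)}
    wrong = 0
    for row, lab in zip(X_te, y_te):
        dist = {c: sum((row[j] - m[c][j]) ** 2 for j in feats) for c in m}
        wrong += min(dist, key=dist.get) != lab
    return wrong / len(y_te)


def cv_error(X, y, k, n_folds, select_inside):
    """k-fold CV error; selection is redone per training fold only if select_inside."""
    n = len(y)
    # Biased variant: one selection on ALL samples, held-out folds included.
    shared = None if select_inside else top_features(X, y, k)
    total = 0.0
    for f in range(n_folds):
        tr = [i for i in range(n) if i % n_folds != f]
        te = [i for i in range(n) if i % n_folds == f]
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        X_te, y_te = [X[i] for i in te], [y[i] for i in te]
        feats = top_features(X_tr, y_tr, k) if select_inside else shared
        total += centroid_error(X_tr, y_tr, X_te, y_te, feats)
    return total / n_folds


if __name__ == "__main__":
    X, y = make_noise_data(40, 1000, seed=1)
    print("selection outside CV:", cv_error(X, y, 10, 5, select_inside=False))
    print("selection inside CV: ", cv_error(X, y, 10, 5, select_inside=True))
```

On data like this, the first estimate typically comes out far below 0.5, while the second stays near chance. The gap between them is the selection bias that honest resampling is meant to expose.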

Andy 

> -----Original Message-----
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of mxkuhn
> Sent: Tuesday, February 22, 2011 7:17 PM
> To: ronzhao
> Cc: r-help@r-project.org
> Subject: Re: [R] Random Forest & Cross Validation
> 
> If you want to get honest estimates of accuracy, you should 
> repeat the feature selection within the resampling (not the 
> test set). You will get different lists each time, but that's 
> the point. Right now you are not capturing that uncertainty 
> which is why the oob and test set results differ so much.
> 
> The list you get in the original training set is still the 
> real list. The resampling results help you understand how 
> much you might be overfitting the *variables*.
> 
> Max
> 
> On Feb 22, 2011, at 4:39 PM, ronzhao <yzhaoh...@gmail.com> wrote:
> 
> > 
> > Thanks, Max.
> > 
> > Yes, I did some feature selection in the training set. Basically, I
> > selected the top 1000 SNPs based on OOB error, grew the forest using the
> > training set, and then used the test set to validate the forest.
> > 
> > But if I do the same thing in the test set, the top SNPs would be
> > different from those in the training set. That may be difficult to
> > interpret.
> > 
> > -- 
> > View this message in context:
> > http://r.789695.n4.nabble.com/Random-Forest-Cross-Validation-tp3314777p3320094.html
> > Sent from the R help mailing list archive at Nabble.com.
> > 
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
