Re: [R] Random Forest Cross Validation

2011-02-27 Thread ronzhao
Thanks to you all! Now I get it!

Re: [R] Random Forest Cross Validation

2011-02-24 Thread Liaw, Andy
... steps such as feature selection, all bets are off.
Andy
(Replying to mxkuhn's message of Tuesday, February 22, 2011, 7:17 PM, Subject: Re: [R] Random Forest Cross ...)

Re: [R] Random Forest Cross Validation

2011-02-22 Thread ronzhao
Thanks, Max. Yes, I did some feature selection on the training set. Basically, I selected the top 1000 SNPs based on OOB error, grew the forest using the training set, and then used the test set to validate that forest. But if I do the same thing on the test set, the top SNPs would be different...

Re: [R] Random Forest Cross Validation

2011-02-22 Thread mxkuhn
If you want honest estimates of accuracy, you should repeat the feature selection within each resampling iteration (not on the test set). You will get a different list each time, but that's the point. Right now you are not capturing that uncertainty, which is why the OOB and test-set results differ so...
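The effect Max describes can be reproduced with a small simulation. This is an illustrative sketch only, not code from the thread: it uses Python's standard library instead of R, pure-noise synthetic data instead of SNPs, a nearest-centroid classifier in place of randomForest, and a simple class-mean-difference filter in place of the OOB-based ranking. The helper names (`select_top`, `fit_predict`, `stratified_folds`, `cv_accuracy`) are hypothetical. The "leaky" variant ranks features once on all rows, as in the original post; the "honest" variant repeats the ranking inside each fold.

```python
import random

def select_top(X, y, rows, k):
    """Hypothetical filter: rank features by |mean(class 1) - mean(class 0)| over `rows`."""
    n1 = sum(1 for i in rows if y[i] == 1)
    n0 = len(rows) - n1
    scores = []
    for j in range(len(X[0])):
        s1 = sum(X[i][j] for i in rows if y[i] == 1)
        s0 = sum(X[i][j] for i in rows if y[i] == 0)
        scores.append((abs(s1 / n1 - s0 / n0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def centroid(X, rows, feats):
    """Mean vector of the given rows, restricted to `feats`."""
    return [sum(X[i][j] for i in rows) / len(rows) for j in feats]

def fit_predict(X, y, train, test, feats):
    """Nearest-centroid classifier (stand-in for randomForest) on the chosen features."""
    c1 = centroid(X, [i for i in train if y[i] == 1], feats)
    c0 = centroid(X, [i for i in train if y[i] == 0], feats)
    preds = []
    for i in test:
        d1 = sum((X[i][j] - m) ** 2 for j, m in zip(feats, c1))
        d0 = sum((X[i][j] - m) ** 2 for j, m in zip(feats, c0))
        preds.append(1 if d1 < d0 else 0)
    return preds

def stratified_folds(y, n_folds, rng):
    """Split row indices into folds with both classes spread evenly."""
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        rows = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(rows)
        for pos, i in enumerate(rows):
            folds[pos % n_folds].append(i)
    return folds

def cv_accuracy(X, y, k, n_folds, rng, select_inside):
    folds = stratified_folds(y, n_folds, rng)
    # Leaky: features chosen once from every row, including future held-out rows.
    leaked = None if select_inside else select_top(X, y, list(range(len(y))), k)
    correct = 0
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        # Honest: the selection is redone on this fold's training rows only.
        feats = select_top(X, y, train, k) if select_inside else leaked
        preds = fit_predict(X, y, train, test, feats)
        correct += sum(pred == y[i] for pred, i in zip(preds, test))
    return correct / len(y)

data_rng = random.Random(2011)
n, p, k = 40, 2000, 20
# Pure noise: labels carry no signal, so the true accuracy is 50%.
y = [i % 2 for i in range(n)]
X = [[data_rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

leaky = cv_accuracy(X, y, k, 5, random.Random(1), select_inside=False)
honest = cv_accuracy(X, y, k, 5, random.Random(1), select_inside=True)
print(f"selection outside CV: {leaky:.2f}")
print(f"selection inside CV:  {honest:.2f}")
```

Because the labels are random, any accuracy well above 0.5 is an artifact. The leaky estimate usually lands far above chance while the honest one hovers around 0.5, which is the same kind of optimism that made the OOB error look better than the test-set error. In R, the caret package's `rfe()` and `sbf()` functions are designed to run the selection step inside each resampling iteration in this way.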

Re: [R] Random Forest Cross Validation

2011-02-20 Thread Max Kuhn

[R] Random Forest Cross Validation

2011-02-19 Thread ronzhao
Hi, I am using the randomForest package for a prediction task on GWAS data. I first split the data into training and testing sets (70% vs. 30%), then used the training set to grow the trees (ntree=10). The OOB error on the training set looks good (10%). However, it is not very good for...