Thank you, Bert. I'll definitely ask there. In the meantime, I just wanted to make sure that my R code (my bootstrap function and the bootstrap run) is correct, and that my abnormal bootstrap results are not caused by a coding error. Thank you!
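One way to check the plumbing separately from the methodology is sketched below. It assumes only the boot package; `check_boot_statistic`, `toy_data`, and `toy_stat` are hypothetical names introduced for illustration and are not part of the code quoted further down. The idea is that boot() calls the statistic as statistic(data, indices) and stores the value computed on the original row order in `t0`, so a direct call with all row indices should reproduce `t0`, and `t` should have one row per replicate.

```r
library(boot)  # recommended package, normally shipped with R

# Hypothetical helper (an assumption, not from the original post): checks
# that a statistic function follows the contract boot() expects, i.e.
# statistic(data, indices) returning a fixed-length numeric vector.
check_boot_statistic <- function(stat_fn, data, R = 5) {
  full_idx <- seq_len(nrow(data))
  direct <- stat_fn(data, full_idx)   # statistic on the original rows
  b <- boot(data, stat_fn, R = R)     # small R: this is only a plumbing check
  c(
    numeric_output = is.numeric(direct),
    t0_matches     = isTRUE(all.equal(unname(b$t0), unname(direct))),
    t_dims_ok      = nrow(b$t) == R && ncol(b$t) == length(direct)
  )
}

# Cheap stand-in statistic so the check runs instantly: column means.
toy_data <- data.frame(u = c(1, 2, 3, 4, 5, 6), v = c(2, 4, 6, 8, 10, 12))
toy_stat <- function(usedata, idx) colMeans(usedata[idx, , drop = FALSE])

set.seed(42)
result <- check_boot_statistic(toy_stat, toy_data)
print(result)
```

The same helper could be called as check_boot_statistic(myrf3, mydata); that is slower because every call fits a forest, and `t0_matches` should hold there only because myrf3 fixes its seed internally. Passing this check says nothing about whether the bias itself is a methodological effect.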
On Mon, Jan 27, 2014 at 7:09 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
> I **think** this kind of methodological issue might be better at SO
> (stats.stackexchange.com). It's not really about R programming, which
> is the main focus of this list. And yes, I know they do intersect.
> Nevertheless...
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> H. Gilbert Welch
>
> On Mon, Jan 27, 2014 at 3:47 PM, Dimitri Liakhovitski
> <dimitri.liakhovit...@gmail.com> wrote:
> > Hello!
> > Below, I:
> > 1. Create a data set with a bunch of factors. All of them are predictors
> > and 'y' is the dependent variable.
> > 2. Run a classification Random Forest with predictor importance. I
> > look at 2 measures of importance: MeanDecreaseAccuracy and MeanDecreaseGini.
> > 3. Run 2 bootstrap runs, one for each of the 2 Random Forest importance
> > measures mentioned above.
> >
> > Question: Could anyone please explain why I am getting such a huge positive
> > bias across the board (for all predictors) for MeanDecreaseAccuracy?
> >
> > Thanks a lot!
> > Dimitri
> >
> >
> > #-------------------------------------------------------------
> > # Creating a data set:
> > #-------------------------------------------------------------
> >
> > N<-1000
> > myset1<-c(1,2,3,4,5)
> > probs1a<-c(.05,.10,.15,.40,.30)
> > probs1b<-c(.05,.15,.10,.30,.40)
> > probs1c<-c(.05,.05,.10,.15,.65)
> > myset2<-c(1,2,3,4,5,6,7)
> > probs2a<-c(.02,.03,.10,.15,.20,.30,.20)
> > probs2b<-c(.02,.03,.10,.15,.20,.20,.30)
> > probs2c<-c(.02,.03,.10,.10,.10,.25,.40)
> > myset.y<-c(1,2)
> > probs.y<-c(.65,.30)  # sums to .95; sample() treats prob as weights
> >
> > set.seed(1)
> > y<-as.factor(sample(myset.y,N,replace=TRUE,probs.y))
> > set.seed(2)
> > a<-as.factor(sample(myset1, N, replace = TRUE,probs1a))
> > set.seed(3)
> > b<-as.factor(sample(myset1, N, replace = TRUE,probs1b))
> > set.seed(4)
> > c<-as.factor(sample(myset1, N, replace = TRUE,probs1c))
> > set.seed(5)
> > d<-as.factor(sample(myset2, N, replace = TRUE,probs2a))
> > set.seed(6)
> > e<-as.factor(sample(myset2, N, replace = TRUE,probs2b))
> > set.seed(7)
> > f<-as.factor(sample(myset2, N, replace = TRUE,probs2c))
> >
> > mydata<-data.frame(a,b,c,d,e,f,y)
> >
> >
> > #-------------------------------------------------------------
> > # Single Random Forests run with predictor importance.
> > #-------------------------------------------------------------
> >
> > library(randomForest)
> > set.seed(123)
> > rf1<-randomForest(y~.,data=mydata,importance=TRUE)
> > importance(rf1)[,c(3:4)]
> >
> > #-------------------------------------------------------------
> > # Bootstrapping run
> > #-------------------------------------------------------------
> >
> > library(boot)
> >
> > ### Defining two functions to be used for bootstrapping:
> >
> > # myrf3 returns MeanDecreaseAccuracy:
> > myrf3<-function(usedata,idx){
> >   set.seed(123)
> >   out<-randomForest(y~.,data=usedata[idx,],importance=TRUE)
> >   return(importance(out)[,3])
> > }
> >
> > # myrf4 returns MeanDecreaseGini:
> > myrf4<-function(usedata,idx){
> >   set.seed(123)
> >   out<-randomForest(y~.,data=usedata[idx,],importance=TRUE)
> >   return(importance(out)[,4])
> > }
> >
> > ### 2 bootstrap runs:
> > rfboot3<-boot(mydata,myrf3,R=10)
> > rfboot4<-boot(mydata,myrf4,R=10)
> >
> > ### Results
> > rfboot3 # for MeanDecreaseAccuracy
> > colMeans(rfboot3$t)-importance(rf1)[,3]
> >
> > rfboot4 # for MeanDecreaseGini
> > colMeans(rfboot4$t)-importance(rf1)[,4] # for MeanDecreaseGini
> >
> > --
> > Dimitri Liakhovitski
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

--
Dimitri Liakhovitski