Thank you, Bert. I'll definitely ask there.
In the meantime, I just wanted to make sure that my R code (my bootstrap
function and the bootstrap run) is correct and that my abnormal bootstrap
results are not caused by errors in my code.
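
A minimal sketch of the kind of check I have in mind, on a small made-up
data set. The names dat, rf_stat, rf_full and bt, and the choice of
ntree = 100, are only for illustration and are not from my original post.
With the seed fixed inside the statistic, the t0 that boot() computes on the
original rows should match a plain fit on the full data:

```r
library(randomForest)
library(boot)

# Small stand-in data set (hypothetical, not the data from my post)
set.seed(1)
n <- 300
dat <- data.frame(
  a = factor(sample(1:5, n, replace = TRUE)),
  b = factor(sample(1:5, n, replace = TRUE)),
  c = factor(sample(1:7, n, replace = TRUE)),
  y = factor(sample(1:2, n, replace = TRUE, prob = c(0.7, 0.3)))
)

# Statistic in the form boot() expects: function(data, indices)
rf_stat <- function(usedata, idx) {
  set.seed(123)  # same forest seed for every call, as in my post
  fit <- randomForest(y ~ ., data = usedata[idx, ],
                      importance = TRUE, ntree = 100)
  importance(fit)[, "MeanDecreaseAccuracy"]
}

# Reference fit on the full data with the same seed and settings
set.seed(123)
rf_full <- randomForest(y ~ ., data = dat, importance = TRUE, ntree = 100)
ref <- importance(rf_full)[, "MeanDecreaseAccuracy"]

bt <- boot(dat, rf_stat, R = 5)

# Check 1: boot's t0 is the statistic evaluated on rows 1..n,
# so it should reproduce the reference fit
check_t0 <- isTRUE(all.equal(unname(bt$t0), unname(ref)))

# Check 2: one row per replicate, one column per predictor
check_dim <- identical(dim(bt$t), c(5L, 3L))

# The bias that print(bt) reports, computed by hand
bias_by_hand <- colMeans(bt$t) - bt$t0
```

If check_t0 and check_dim are both TRUE, the function is at least wired to
boot() correctly, and a large value in bias_by_hand would have to come from
how the resamples differ from the original data rather than from a slip in
the function itself.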
Thank you!



On Mon, Jan 27, 2014 at 7:09 PM, Bert Gunter <gunter.ber...@gene.com> wrote:

> I **think** this kind of methodological issue might be better at Cross
> Validated (stats.stackexchange.com). It's not really about R programming,
> which is the main focus of this list. And yes, I know they do intersect.
> Nevertheless...
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> H. Gilbert Welch
>
>
>
>
> On Mon, Jan 27, 2014 at 3:47 PM, Dimitri Liakhovitski
> <dimitri.liakhovit...@gmail.com> wrote:
> > Hello!
> > Below, I:
> > 1. Create a data set with a bunch of factors. All of them are predictors
> > and 'y' is the dependent variable.
> > 2. I fit a classification Random Forest with predictor importance. I
> > look at 2 measures of importance - MeanDecreaseAccuracy and
> > MeanDecreaseGini.
> > 3. I run 2 bootstrap runs, one for each of the 2 Random Forests measures
> > of importance mentioned above.
> >
> > Question: Could anyone please explain why I am getting such a huge positive
> > bias across the board (for all predictors) for MeanDecreaseAccuracy?
> >
> > Thanks a lot!
> > Dimitri
> >
> >
> > #----------------------------------------------------------------
> > # Creating a data set:
> > #-------------------------------------------------------------
> >
> > N<-1000
> > myset1<-c(1,2,3,4,5)
> > probs1a<-c(.05,.10,.15,.40,.30)
> > probs1b<-c(.05,.15,.10,.30,.40)
> > probs1c<-c(.05,.05,.10,.15,.65)
> > myset2<-c(1,2,3,4,5,6,7)
> > probs2a<-c(.02,.03,.10,.15,.20,.30,.20)
> > probs2b<-c(.02,.03,.10,.15,.20,.20,.30)
> > probs2c<-c(.02,.03,.10,.10,.10,.25,.40)
> > myset.y<-c(1,2)
> > probs.y<-c(.65,.30)   # sums to 0.95; sample() rescales prob to sum to 1
> >
> > set.seed(1)
> > y<-as.factor(sample(myset.y,N,replace=TRUE,probs.y))
> > set.seed(2)
> > a<-as.factor(sample(myset1, N, replace = TRUE,probs1a))
> > set.seed(3)
> > b<-as.factor(sample(myset1, N, replace = TRUE,probs1b))
> > set.seed(4)
> > c<-as.factor(sample(myset1, N, replace = TRUE,probs1c))
> > set.seed(5)
> > d<-as.factor(sample(myset2, N, replace = TRUE,probs2a))
> > set.seed(6)
> > e<-as.factor(sample(myset2, N, replace = TRUE,probs2b))
> > set.seed(7)
> > f<-as.factor(sample(myset2, N, replace = TRUE,probs2c))
> >
> > mydata<-data.frame(a,b,c,d,e,f,y)
> >
> >
> > #-------------------------------------------------------------
> > # Single Random Forests run with predictor importance.
> > #-------------------------------------------------------------
> >
> > library(randomForest)
> > set.seed(123)
> > rf1<-randomForest(y~.,data=mydata,importance=TRUE)
> > importance(rf1)[,3:4]   # MeanDecreaseAccuracy, MeanDecreaseGini
> >
> > #-------------------------------------------------------------
> > # Bootstrapping run
> > #-------------------------------------------------------------
> >
> > library(boot)
> >
> > ### Defining two functions to be used for bootstrapping:
> >
> > # myrf3 returns MeanDecreaseAccuracy:
> > myrf3<-function(usedata,idx){
> >   set.seed(123)
> >   out<-randomForest(y~.,data=usedata[idx,],importance=TRUE)
> >   return(importance(out)[,3])
> > }
> >
> > # myrf4 returns MeanDecreaseGini:
> > myrf4<-function(usedata,idx){
> >   set.seed(123)
> >   out<-randomForest(y~.,data=usedata[idx,],importance=TRUE)
> >   return(importance(out)[,4])
> > }
> >
> > ### 2 bootstrap runs:
> > rfboot3<-boot(mydata,myrf3,R=10)
> > rfboot4<-boot(mydata,myrf4,R=10)
> >
> > ### Results
> > rfboot3   # for MeanDecreaseAccuracy
> > colMeans(rfboot3$t)-importance(rf1)[,3]
> >
> > rfboot4   # for MeanDecreaseGini
> > colMeans(rfboot4$t)-importance(rf1)[,4]   # for MeanDecreaseGini
> >
> > --
> > Dimitri Liakhovitski
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>



-- 
Dimitri Liakhovitski
