I'm using R (Windows) version 2.1.1 and randomForest version 4.15.
I call randomForest like this:

library(randomForest)  # load the package first

my.rf <- randomForest(x=train.df[,-response_index], y=train.df[,response_index],
                      xtest=test.df[,-response_index], ytest=test.df[,response_index],
                      importance=TRUE, proximity=FALSE, keep.forest=TRUE)

 (where train.df and test.df are my training and test data frames and
 response_index is the column number specifying the class)

I then save each tree to a file so I can combine them all afterwards. There are
no memory issues when keep.forest=FALSE, but I think keeping the forest is the
bit I need for future predictions (right?).
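
To make the prediction question concrete, this is roughly what I mean (a
sketch only; new.df stands in for a hypothetical data frame with the same
predictor columns as train.df):

# predict() needs the stored trees, so keep.forest=TRUE is required;
# with keep.forest=FALSE the forest is discarded after training
pred <- predict(my.rf, newdata=new.df[,-response_index])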

I did check previous messages on memory issues, and thought that combining the
trees afterwards would solve the problem. Since my cross-validation subsets
give me a fairly stable error rate, I suppose I could just train a
randomForest on a subset of my data, but wouldn't I be "wasting" data that
way? (See the sketch below for the combining scheme I have in mind.)

A bit off the subject, but should the order in which rows (i.e. sets of
explanatory-variable values) are passed to the randomForest function affect
the result? I have noticed that if I pick a random, unordered sample from my
control data for training, the error rate is much lower than if I take an
ordered sample. This remains true across all my cross-validation results.
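
For concreteness, the two sampling schemes I'm comparing look roughly like
this (a sketch; control.df stands in for my control data, and the 70/30 split
is just an example):

set.seed(1)  # make the random split reproducible
n <- floor(0.7 * nrow(control.df))
random_rows  <- sample(nrow(control.df), n)  # random, unordered sample
ordered_rows <- seq_len(n)                   # first n rows in file order
train.random  <- control.df[random_rows, ]
train.ordered <- control.df[ordered_rows, ]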

I'm sorry for my many questions.
Many Thanks
Eleni Rapsomaniki
