I'm using R (Windows) version 2.1.1, randomForest version 4.15. I call randomForest like this:
    my.rf <- randomForest(x = train.df[, -response_index],
                          y = train.df[, response_index],
                          xtest = test.df[, -response_index],
                          ytest = test.df[, response_index],
                          importance = TRUE, proximity = FALSE,
                          keep.forest = TRUE)

(train.df and test.df are my training and test data frames, and
response_index is the column number specifying the class.)

I then save each tree to a file so I can combine them all afterwards
(a sketch of what I mean is in the P.S. below). There are no memory
issues when keep.forest=FALSE, but I think the forest is the bit I
need for future predictions (right?). I did check previous messages
on memory issues, and thought that combining the trees afterwards
would solve the problem.

Since my cross-validation subsets give me a fairly stable error rate,
I suppose I could just use a randomForest trained on a subset of my
data. But would I not be "wasting" data this way?

A bit off the subject: should the order in which rows (i.e. sets of
explanatory variables) are passed to the randomForest function affect
the result? I have noticed that if I pick a random, unordered sample
from my control data for training, the error rate is much lower than
if I take an ordered sample. This remains true for all my
cross-validation results (see the second sketch, after my signature).

I'm sorry for my many questions.

Many thanks,
Eleni Rapsomaniki
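
P.S. To make the first question concrete, here is a minimal sketch of
the save-and-combine workflow I have in mind, assuming the combine()
function from the randomForest package (the number of chunks, the
ntree value, and the file names are made up for illustration):

    library(randomForest)

    ## Grow the forest in chunks, saving each chunk to disk so that
    ## only one piece has to fit in memory at a time
    n.chunks <- 5
    rf.files <- paste("rf_part_", 1:n.chunks, ".rda", sep = "")
    for (i in 1:n.chunks) {
        rf.part <- randomForest(x = train.df[, -response_index],
                                y = train.df[, response_index],
                                ntree = 100, keep.forest = TRUE)
        save(rf.part, file = rf.files[i])
        rm(rf.part); gc()    # release memory before the next chunk
    }

    ## Later: load the pieces and merge them into one 500-tree forest
    parts <- vector("list", n.chunks)
    for (i in 1:n.chunks) {
        load(rf.files[i])    # restores the object 'rf.part'
        parts[[i]] <- rf.part
    }
    my.rf <- do.call(combine, parts)
    pred  <- predict(my.rf, test.df[, -response_index])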

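P.P.S. And here is a small sketch of the ordered-versus-shuffled
comparison behind my last question (n.train and the seed are
arbitrary; train.df and response_index are as above). My
understanding is that the bootstrap sampling inside randomForest
should make row order itself irrelevant, so I suspect the difference
comes from an ordered subset not being representative (e.g. the file
being sorted by class or by date):

    set.seed(42)      # only to make the comparison repeatable
    n.train <- 1000   # hypothetical training-set size

    ordered.idx  <- 1:n.train                       # first rows, as stored
    shuffled.idx <- sample(nrow(train.df), n.train) # random rows

    rf.ordered  <- randomForest(x = train.df[ordered.idx,  -response_index],
                                y = train.df[ordered.idx,  response_index])
    rf.shuffled <- randomForest(x = train.df[shuffled.idx, -response_index],
                                y = train.df[shuffled.idx, response_index])

    ## Printing each fit shows its OOB error rate, so the gap
    ## between the two samples is easy to see
    rf.ordered
    rf.shuffled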