If you have a large enough machine, you can run randomForest on data of that size (we do so regularly). One thing many people don't seem to realize is that the "formula interface" has significant overhead. For large data sets, try calling randomForest without the formula, passing the predictors and the response directly.

Other tips:

- If you don't need to predict future data, set keep.forest to FALSE. Storing the forest takes a lot of memory.
- If you already have the test set data, give it to randomForest along with the training data instead of calling predict() afterward.
- If you have a classification problem, try the sampsize option to reduce the number of cases used to grow each tree.
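
The tips above can be sketched roughly as follows. This is only an illustration on simulated data: the data, the object names (x, y, train, test), and the particular ntree and sampsize values are assumptions made for the example, not recommendations.

```r
library(randomForest)

set.seed(42)

# Simulated stand-in data (hypothetical): 500 cases, 20 numeric predictors.
n <- 500
p <- 20
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("v", seq_len(p))))
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(n) > 0, "a", "b"))

train <- 1:400
test  <- 401:500

# Non-formula call: predictors and response are passed directly as x and y.
rf <- randomForest(
  x = x[train, ], y = y[train],
  xtest = x[test, ], ytest = y[test],  # test set supplied up front
  keep.forest = FALSE,                 # do not store the trees
  sampsize = 100,                      # cases drawn to grow each tree
  ntree = 200
)

# Test-set predictions come back with the fit; no predict() call is needed.
pred <- rf$test$predicted
acc  <- mean(pred == y[test])
```

With keep.forest = FALSE the returned object carries no forest, so predict() cannot be used on it later. That is the trade-off for the memory saved.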
As for categorical predictors with more than 32 categories: Prof. Breiman's new version can handle categorical predictors with an (IMHO) obscene number of categories. However, I have given that a very low priority for the R package. The reason is that, IMHO, such variables need some massaging (collapsing, merging, or whatever) before they can be somewhat meaningful in a model anyway. (And personally, I have no need for such a feature.)

HTH,
Andy

> From: PaTa PaTaS
>
> Thank you all for your help. The problem is not only with
> reading the data (5000 cases times 2000 integer variables,
> imported from either an SPSS or a TXT file) into my R 1.8.0 but
> also with the procedure I would like to use, "randomForest"
> from the library "randomForest". It is not possible to run it
> with such a data set (because of the insufficient memory
> exception). Moreover, my data has factors with more than 32
> classes, which causes another error.
>
> Could you suggest any solution for my problem? Thank you a lot.
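
On the collapsing/merging of categories mentioned in the reply, here is one possible sketch in base R. It assumes that lumping the rarest levels into a single catch-all level is acceptable for the data at hand, which is a modelling decision and not something the reply prescribes. The function name collapse_levels and the label "Other" are hypothetical.

```r
# Keep the (max_levels - 1) most frequent levels of a factor and merge
# everything else into one catch-all level. Missing values stay missing.
# Assumes the catch-all label is not already one of the levels.
collapse_levels <- function(x, max_levels = 32, other = "Other") {
  x <- as.factor(x)
  if (nlevels(x) <= max_levels) {
    return(x)
  }
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(max_levels - 1)]
  y <- as.character(x)
  y[!is.na(y) & !(y %in% keep)] <- other
  factor(y, levels = c(keep, other))
}

# Example: a simulated factor with 50 levels reduced to 32.
set.seed(1)
f <- factor(sample(sprintf("c%02d", 1:50), 1000, replace = TRUE))
g <- collapse_levels(f, max_levels = 32)
nlevels(g)
```

Merging by frequency is only the crudest choice. Grouping levels by subject-matter meaning will usually give a more interpretable predictor.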
