No, that's not the problem. You still need to take Bill's and Torsten's advice. If the categorical variables (class labels included) are read into R as factors, then the conversion to integers is automagic: factors in R _are_ integers 1 through the number of levels, with the levels attribute. randomForest() just takes advantage of that fact.
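The point that factors are integer codes plus a levels attribute can be checked directly in R. This is a small sketch using a made-up day-of-week vector (`days` is illustrative, not taken from the thread). One caveat for current R: since R 4.0.0, `read.table()` defaults to `stringsAsFactors = FALSE`, so text columns stay character unless converted with `factor()` or read with `stringsAsFactors = TRUE`. R 1.8.1, used in this thread, converted them to factors by default.

```r
# Hypothetical day-of-week values, as they might appear in a text column
days <- c("Monday", "Tuesday", "Monday", "Wednesday")

# factor() stores integer codes plus a "levels" attribute
f <- factor(days)

levels(f)      # "Monday" "Tuesday" "Wednesday" (sorted by default)
as.integer(f)  # 1 2 1 3: the underlying integer codes
typeof(f)      # "integer"
attributes(f)  # $levels and $class ("factor")
```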
Andy

> From: David L. Van Brunt, Ph.D.
>
> D'OH!
>
> I clearly just needed to Re-RTFM!!! I had a column still coded as TEXT
> (yup, "Monday", etc.), and the randomForest manual by Breiman says they
> need to be numerically coded. Easy recode. I'll try running it RIGHT
> this time, and let you all know how this goes. Grumble mumble mumble....
>
> On 4/5/04 1:40, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>
> > Alternatively, if you can arrive at a sensible ordering of the levels
> > you can declare them ordered factors and make the computation feasible
> > once again.
> >
> > Bill Venables.
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Torsten Hothorn
> > Sent: Monday, 5 April 2004 4:27 PM
> > To: David L. Van Brunt, Ph.D.
> > Cc: R-Help
> > Subject: Re: [R] Can't seem to finish a randomForest.... Just goes and goes!
> >
> > On Sun, 4 Apr 2004, David L. Van Brunt, Ph.D. wrote:
> >
> > > Playing with randomForest, samples run fine. But on real data, no go.
> > >
> > > Here's the setup: OS X, same behavior whether I'm using R-Aqua 1.8.1
> > > or the Fink compile-of-my-own with X-11, R version 1.8.1.
> > >
> > > This is on OS X 10.3 (aka "Panther"), G4 800 MHz with 512 MB of
> > > physical RAM.
> > >
> > > I have not altered the Startup options of R.
> > >
> > > Data set is read in from a text file with "read.table", and has 46
> > > variables and 1,855 cases. Trying the following:
> > >
> > > The DV is categorical, 0 or 1. Most of the IVs are either continuous,
> > > or correctly read in as factors.
> > > The largest factor has 30 levels....
> >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > This means: there are 2^(30-1) = 536,870,912 possible splits to be
> > evaluated every time this variable is picked up (minus something due
> > to empty levels). At least the last time I looked at the code,
> > randomForest used an exhaustive search over all possible splits. Try
> > reducing the number of levels to something reasonable (or for a first
> > shot: remove this variable from the learning sample).
> >
> > Best,
> >
> > Torsten
> >
> > > Only the DV seems to need identifying as a factor to force
> > > classification trees over regression:
> > >
> > >     Mydata$V46 <- as.factor(Mydata$V46)
> > >     Myforest.rf <- randomForest(V46 ~ ., data = Mydata, ntree = 100,
> > >                                 mtry = 7, proximity = FALSE,
> > >                                 importance = FALSE)
> > >
> > > 5 hours later, R.bin was still taking up 75% of my processor. When
> > > I've tried this with larger data, I get errors referring to the
> > > buffer (sorry, not in front of me right now).
> > >
> > > Any ideas on this? The data don't seem horrifically large. Seems like
> > > there are a few options for setting memory size, but I'm not sure
> > > which of them to try tweaking, or if that's even the issue.
> > >
> > > ______________________________________________
> > > [EMAIL PROTECTED] mailing list
> > > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
>
> --
> David L. Van Brunt, Ph.D.
> Outlier Consulting & Development
> mailto: <[EMAIL PROTECTED]>
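Torsten's arithmetic and Bill's alternative can be compared with a short R sketch. It counts candidate binary splits for a k-level factor: an unordered factor can divide its levels into two non-empty groups in 2^(k-1) - 1 ways (Torsten's 2^(30-1) figure less the one trivial split), while an ordered factor only has k - 1 cut points between adjacent levels. This illustrates the size of the search space only; it is not randomForest's internal code. The day-of-week ordering and the `collapse_levels()` helper are hypothetical illustrations, and lumping infrequent levels into "Other" is just one simple way to follow the advice of reducing the number of levels.

```r
# Candidate binary splits for a categorical predictor with k levels
# (empty levels would reduce these counts).
# Unordered factor: any division of the levels into two non-empty groups.
n_splits_unordered <- function(k) 2^(k - 1) - 1
# Ordered factor: only cut points between adjacent levels.
n_splits_ordered <- function(k) k - 1

n_splits_unordered(30)  # 536870911
n_splits_ordered(30)    # 29

# Option 1: declare a sensible ordering so the levels are treated as
# ordered. The day-of-week ordering here is only an example.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
days <- c("Monday", "Wednesday", "Tuesday", "Monday")
d_ord <- factor(days, levels = day_levels, ordered = TRUE)
is.ordered(d_ord)  # TRUE

# Option 2: reduce the number of levels. This hypothetical helper keeps
# the `keep` most frequent levels and lumps the rest into "Other".
collapse_levels <- function(x, keep) {
  x <- factor(x)
  counts <- sort(table(x), decreasing = TRUE)
  top <- names(counts)[seq_len(min(keep, length(counts)))]
  out <- ifelse(as.character(x) %in% top, as.character(x), "Other")
  out[is.na(x)] <- NA  # keep missing values missing
  factor(out)
}

# "c" and "d" are the least frequent levels, so they become "Other"
collapse_levels(c("a", "a", "a", "b", "b", "c", "d"), keep = 2)
```

Lumping by frequency ignores the response entirely; if domain knowledge suggests a natural grouping (for example weekday versus weekend), that grouping is usually the better choice.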
