Hi there,

I have a question on random forest. Recently I helped a friend with her random forest and came across this problem. Her dataset has 6 classes, the sample size is pretty small (264), and the class distribution is as follows (Diag is the response variable):

sample.size <- lapply(1:6, function(i) sum(Diag==i))
> sample.size
[[1]]
[1] 36

[[2]]
[1] 12

[[3]]
[1] 120

[[4]]
[1] 36

[[5]]
[1] 30

[[6]]
[1] 30

I assigned this sample.size to sampsz for stratified sampling and got the following error:

Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument

If I use sampsz=c(36, 12, 120, 36, 30, 30) instead, it works fine. Could you tell me why?

By the way, for a classification problem like this one with uneven class sizes, do you have any suggestions for improving its accuracy? I tried the c() form so that sampsz works, but the result is similar.

Thanks,
weiwei

On 6/30/05, Liaw, Andy <[EMAIL PROTECTED]> wrote:
> The limitation comes from the way categorical splits are represented in the
> code: For a categorical variable with k categories, the split is
> represented by k binary digits: 0=right, 1=left. So it takes k bits to
> store each split on k categories. To save storage, this is `packed' into a
> 4-byte integer (32-bit), thus the limit of 32 categories.
>
> The current Fortran code (version 5.x) by Breiman and Cutler gets around
> this limitation by storing the split in an integer array. While this lifts
> the 32-category limit, it takes much more memory to store the splits. I'm
> still trying to figure out a more memory efficient way of storing the splits
> without imposing the 32-category limit. If anyone has suggestions, I'm all
> ears.
>
> Best,
> Andy
>
>
> > From: [EMAIL PROTECTED]
> >
> > Hello,
> >
> > I'm using the random forest package. One of my factors in the
> > data set contains 41 levels (I can't code this as a numeric
> > value - in terms of linear models this would be a random
> > factor). The randomForest call comes back with an error
> > telling me that the limit is 32 categories.
> >
> > Is there any reason for this particular limit? Maybe it's
> > possible to recompile the module with a different cutoff?
> >
> > thanks a lot for your help,
> > kind regards,
> >
> >
> > Arne
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
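A minimal R sketch of why the lapply() version trips the error while c() works. lapply() always returns a list, not a numeric vector. As far as I know, randomForest's sampsize argument expects a number or a numeric vector with one entry per class/stratum. The Diag below is a hypothetical stand-in rebuilt from the class counts in the post, and x in the commented-out call is hypothetical predictor data.

```r
# Hypothetical stand-in for the response, rebuilt from the class counts
# quoted in the post; the real Diag comes from the friend's data.
Diag <- rep(1:6, times = c(36, 12, 120, 36, 30, 30))

# lapply() returns a list with one element per class, not a vector.
size.list <- lapply(1:6, function(i) sum(Diag == i))
print(is.list(size.list))

# sum() refuses a list argument, which is presumably where the reported
# error comes from (the post shows it as "invalid 'mode' of argument").
res.list <- tryCatch(sum(size.list), error = function(e) e)
print(inherits(res.list, "error"))

# sapply() simplifies the same computation to a plain integer vector,
# the same counts that c(36, 12, 120, 36, 30, 30) gives.
size.vec <- sapply(1:6, function(i) sum(Diag == i))
print(size.vec)

# unlist() on the list, or table() on the response, also gives the counts.
print(unlist(size.list))
print(as.vector(table(Diag)))

# Hypothetical call, with x standing in for the predictors:
# randomForest(x, factor(Diag), sampsize = size.vec)
```

Either sapply() or unlist(sample.size) should let the computed counts be passed directly instead of typing them out with c().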

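To make Andy's description of the 32-category limit concrete, here is a rough R sketch of packing a categorical split into one word, with one bit per level (1 = left, 0 = right). The functions pack_split() and unpack_split() are hypothetical illustrations, not part of randomForest. R has no unsigned 32-bit integer type, so the packed value is held in a double, which represents every value up to 2^32 - 1 exactly.

```r
# Hypothetical helper: pack a split on a k-level factor into one number.
# Bit j - 1 is 1 when level j is sent to the left node, 0 when it goes right.
pack_split <- function(goes_left) {
  k <- length(goes_left)
  # A single 32-bit word has room for at most 32 one-bit flags.
  if (k > 32) stop("a 32-bit word can hold splits on at most 32 levels")
  # Level j contributes 2^(j - 1) if it goes left.
  sum(2^(which(goes_left) - 1))
}

# Hypothetical helper: recover the left/right flag for each of k levels.
unpack_split <- function(code, k) {
  bits <- (code %/% 2^(seq_len(k) - 1)) %% 2
  bits == 1
}

# Example: a factor with 5 levels where levels 1, 3 and 4 go left.
left <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
code <- pack_split(left)
print(code)                    # 13 = 1 + 4 + 8
print(unpack_split(code, 5))   # TRUE FALSE TRUE TRUE FALSE

# 32 levels still fit (all bits set gives 2^32 - 1) ...
print(pack_split(rep(TRUE, 32)) == 2^32 - 1)

# ... but a 41-level factor like Arne's does not.
res <- tryCatch(pack_split(rep(TRUE, 41)),
                error = function(e) conditionMessage(e))
print(res)
```

The integer-array approach described for the 5.x Fortran code would instead keep something like as.integer(left), one integer per level. That removes the cap but costs k integers per split instead of a single word, which is the memory trade-off Andy mentions.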