It works, thanks. But (just curious) why did I get this when I tried previously?

> is.vector(sample.size)
[1] TRUE

I also tried as.vector(sample.size) and assigned it to sampsz; it still does not work.

On 7/7/05, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> On 7/7/2005 3:38 PM, Weiwei Shi wrote:
> > Hi there:
> > I have a question on random forest:
> >
> > Recently I helped a friend with her random forest and I came across this
> > problem:
> > her dataset has 6 classes and the sample size is pretty small (264).
> > The class distribution is like this (Diag is the response variable):
> > sample.size <- lapply(1:6, function(i) sum(Diag==i))
> > > sample.size
> > [[1]]
> > [1] 36
> >
> > [[2]]
> > [1] 12
> >
> > [[3]]
> > [1] 120
> >
> > [[4]]
> > [1] 36
> >
> > [[5]]
> > [1] 30
> >
> > [[6]]
> > [1] 30
> >
> > I assigned this sample.size to sampsz for stratified sampling
> > and I got the following error:
> > Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument
> >
> > If I use sampsz=c(36, 12, 120, 36, 30, 30), then it is fine. Could you
> > tell me why?
>
> The sum() function knows what to do on a vector, but not on a list. You
> can turn your sample.size variable into a vector using
>
> unlist(sample.size)
>
> Duncan Murdoch
>
> > btw, as to the classification problem with this uneven class size
> > situation, do you have some suggestions to improve its accuracy? I
> > tried the c() way to make sampsz work but the result is
> > similar.
> >
> > Thanks,
> >
> > weiwei
> >
> > On 6/30/05, Liaw, Andy <[EMAIL PROTECTED]> wrote:
> >> The limitation comes from the way categorical splits are represented in the
> >> code: For a categorical variable with k categories, the split is
> >> represented by k binary digits: 0=right, 1=left. So it takes k bits to
> >> store each split on k categories. To save storage, this is `packed' into a
> >> 4-byte integer (32-bit), thus the limit of 32 categories.
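
(To make the packing described just above concrete, here is a rough sketch in R. It is not taken from the randomForest source; pack_split and unpack_split are hypothetical helper names, and the code is stored in an ordinary double rather than a real 4-byte integer, purely for illustration.)

```r
# Hypothetical helpers, NOT the randomForest implementation:
# encode a split on k categories as k binary digits (1 = left, 0 = right).
pack_split <- function(goes_left) {
  # goes_left: logical vector of length k, TRUE means the category goes left
  stopifnot(length(goes_left) <= 32)   # one bit per category, 32 bits at most
  sum(2^(which(goes_left) - 1))        # category i contributes bit i - 1
}

unpack_split <- function(code, k) {
  # bit i of the code is floor(code / 2^(i - 1)) mod 2
  (code %/% 2^(seq_len(k) - 1)) %% 2 == 1
}

left <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)  # a split on 6 categories
code <- pack_split(left)
print(code)                    # 13, i.e. 1 + 4 + 8
print(unpack_split(code, 6))   # TRUE FALSE TRUE TRUE FALSE FALSE
```

With one bit per category, a factor with 41 levels would need 41 bits, which is why it does not fit into a single 32-bit code.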
> >>
> >> The current Fortran code (version 5.x) by Breiman and Cutler gets around
> >> this limitation by storing the split in an integer array. While this lifts
> >> the 32-category limit, it takes much more memory to store the splits. I'm
> >> still trying to figure out a more memory efficient way of storing the
> >> splits without imposing the 32-category limit. If anyone has suggestions,
> >> I'm all ears.
> >>
> >> Best,
> >> Andy
> >>
> >> > From: [EMAIL PROTECTED]
> >> >
> >> > Hello,
> >> >
> >> > I'm using the random forest package. One of my factors in the
> >> > data set contains 41 levels (I can't code this as a numeric
> >> > value - in terms of linear models this would be a random
> >> > factor). The randomForest call comes back with an error
> >> > telling me that the limit is 32 categories.
> >> >
> >> > Is there any reason for this particular limit? Maybe it's
> >> > possible to recompile the module with a different cutoff?
> >> >
> >> > thanks a lot for your help,
> >> > kind regards,
> >> >
> >> > Arne
> >> >
> >> > ______________________________________________
> >> > [email protected] mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide!
> >> > http://www.R-project.org/posting-guide.html

--
Weiwei Shi, Ph.D

"Did you always know?" "No, I did not. But I believed..."
---Matrix III
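
(On the is.vector() question at the top of this message: in R a list is itself a vector, of mode "list", so is.vector() returns TRUE for it and as.vector() hands it back unchanged. The sketch below illustrates this. The Diag values are made up to match the class counts quoted in the thread, since the real data are not shown, and the variable names are only for illustration.)

```r
# Made-up response with the class counts quoted in the thread (assumption).
Diag <- rep(1:6, times = c(36, 12, 120, 36, 30, 30))

as_list <- lapply(1:6, function(i) sum(Diag == i))  # lapply() returns a list

is.vector(as_list)            # TRUE: a list counts as a vector
is.list(as.vector(as_list))   # TRUE: as.vector() does not flatten a list
is.numeric(as_list)           # FALSE: this is what sum() trips over

# sum() on the list fails; capture the error instead of stopping the script.
res <- tryCatch(sum(as_list), error = function(e) e)
inherits(res, "error")        # TRUE

# Three ways to get a plain integer vector of class counts instead:
n1 <- unlist(as_list)
n2 <- sapply(1:6, function(i) sum(Diag == i))
n3 <- as.vector(table(Diag))
print(n1)                     # 36 12 120 36 30 30
```

Checking with is.numeric() or is.atomic() rather than is.vector() would have shown the difference between the list and the c(36, 12, 120, 36, 30, 30) version.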
