On Tue, 25 Jan 2005, Liaw, Andy wrote:
From: WeiWei Shi
Hi, I am trying to build a multi-class classification tree using rpart. I tested it on the fgl data from the MASS package and it works well.
However, when I use my own small data sample, described below, the program seems to take forever. I am not sure whether it is simply slow or whether something is wrong with my code or my data manipulation.
Please advise!
The data are described by the str() output below. The call to rpart is:
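    library(rpart)
    # V142 ~ . rather than x$V142 ~ . so that "." does not also pull V142 in as a predictor
    test_tree <- rpart(V142 ~ ., data = x, method = "class",
                       parms = list(split = "gini"), cp = 0.01)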
The response variable is V142, a factor with 3 levels.
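The poster's data frame x is not available, so a runnable stand-in is the same kind of call on the fgl data from MASS, which the poster reports works well. This is a minimal sketch assuming the MASS and rpart packages are installed. All nine fgl predictors are numeric, and a numeric predictor with m distinct values offers at most m - 1 threshold splits, which is one plausible reason this case runs quickly.

```r
library(MASS)   # provides the fgl (forensic glass) data
library(rpart)

# fgl: 214 glass fragments, 9 numeric predictors, 6-level factor response `type`
fit <- rpart(type ~ ., data = fgl, method = "class",
             parms = list(split = "gini"), cp = 0.01)
print(fit)
```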
Thanks for your suggestions!
Ed.
[snip]
$ V141: Factor w/ 88 levels "1001","1002",..: 59 59 59 59 59 59 55 78 7 73 ...
I'd bet this is the problem. There are 2^(88-1) - 1 (about 1.5e26) possible ways to split a factor with 88 levels into two groups. It will work on those splits till the cows come home...
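To see where the count comes from, here is a small R sketch (an editorial addition, not part of the thread). Each split sends every level left or right (2^k assignments), swapping the two sides gives the same split (halve it), and the split with an empty side is excluded, which gives 2^(k-1) - 1. The brute-force loop checks the formula for small k. split_report() is a hypothetical helper, not part of rpart, that lists factor predictors by level count; on the poster's data it would put V141 at the top.

```r
# Number of ways to divide the k levels of a factor into two non-empty groups
n_splits <- function(k) 2^(k - 1) - 1

# Brute-force count for small k: enumerate every left/right assignment as a
# bit mask, count each {group, complement} pair once (level 1 on the left),
# and skip the assignment that leaves the right side empty.
brute_force_splits <- function(k) {
  count <- 0
  for (mask in 0:(2^k - 1)) {
    left <- bitwAnd(mask, 2^(0:(k - 1))) > 0   # which levels go left
    if (left[1] && !all(left)) count <- count + 1
  }
  count
}
for (k in 2:10) stopifnot(brute_force_splits(k) == n_splits(k))

n_splits(88)   # about 1.5e26 candidate splits for an 88-level factor

# Hypothetical helper: factor predictors sorted by number of levels, with the
# number of candidate binary splits each one implies
split_report <- function(data, response) {
  is_fac <- vapply(data, is.factor, logical(1))
  is_fac[response] <- FALSE
  k <- vapply(data[is_fac], nlevels, integer(1))
  out <- data.frame(levels = k, splits = n_splits(k))
  out[order(-out$levels), , drop = FALSE]
}

toy <- data.frame(
  y    = factor(c("a", "b", "c", "a")),
  site = factor(c("1001", "1002", "1003", "1004")),
  flag = factor(c("yes", "no", "yes", "no"))
)
split_report(toy, "y")   # site (4 levels, 7 splits) listed before flag (2 levels, 1 split)
```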
I'd suggest getting rid of that variable or collapsing its levels into something more reasonable. The CART book describes a heuristic shortcut that tests only n-1 splits for a factor with n levels, but if I'm not mistaken it only works for 2-class problems.
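Collapsing the levels could look like the following sketch. collapse_rare_levels() is a hypothetical helper, and keeping the ten most frequent codes is an arbitrary assumption; whether frequency is the right criterion, rather than grouping codes by what they mean, depends on the data. The synthetic factor below only mimics the shape of V141 and is not the poster's data.

```r
# Hypothetical helper: keep the `keep` most frequent levels of a factor and
# lump every other non-missing value into a single `other` level.
collapse_rare_levels <- function(f, keep = 10, other = "Other") {
  f <- droplevels(as.factor(f))
  counts <- table(f)                                   # NA values are not counted
  by_freq <- names(counts)[order(counts, decreasing = TRUE)]
  top <- by_freq[seq_len(min(keep, length(by_freq)))]
  out <- as.character(f)
  out[!is.na(out) & !(out %in% top)] <- other          # NAs stay NA
  factor(out)
}

# Synthetic stand-in: 88 codes "1001".."1088" with counts 88, 87, ..., 1
codes <- factor(rep(as.character(1001:1088), times = 88:1))
collapsed <- collapse_rare_levels(codes, keep = 10)
nlevels(collapsed)               # 11: the 10 most frequent codes plus "Other"
2^(nlevels(collapsed) - 1) - 1   # 1023 candidate splits

# On the poster's data frame this would be, for example:
#   x$V141 <- collapse_rare_levels(x$V141, keep = 10)
# or, to drop the variable altogether:
#   x$V141 <- NULL
```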
You don't need heuristics: there is a fast algorithm (proved in my PRNN book) for two classes only. I believe rpart implements it.
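The two-class result can be sketched as follows: order the k levels by the proportion of one response class, and only the k - 1 splits that cut this ordering need to be evaluated, here using the Gini index as the impurity. This is an illustrative R sketch, not rpart's internal code; the function names are hypothetical, and the exhaustive brute_force_split() comparison on simulated data is only a sanity check. As both replies note, it does not cover the poster's 3-class response.

```r
# Gini impurity of a vector of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Size-weighted Gini impurity of the two child nodes produced by a split
split_impurity <- function(y, in_left) {
  (sum(in_left) * gini(y[in_left]) + sum(!in_left) * gini(y[!in_left])) / length(y)
}

# Two-class shortcut: sort the levels of x by the proportion of the first
# response class, then try only the k - 1 cuts along that ordering.
best_two_class_split <- function(x, y) {
  x <- droplevels(as.factor(x))
  y <- droplevels(as.factor(y))
  stopifnot(nlevels(y) == 2)
  p1 <- tapply(y == levels(y)[1], x, mean)   # class-1 proportion per level
  ord <- levels(x)[order(p1)]
  best <- list(impurity = Inf, left = character(0))
  for (i in seq_len(length(ord) - 1)) {
    imp <- split_impurity(y, x %in% ord[1:i])
    if (imp < best$impurity) best <- list(impurity = imp, left = ord[1:i])
  }
  best
}

# Exhaustive search over all 2^(k-1) - 1 splits, for comparison
brute_force_split <- function(x, y) {
  x <- droplevels(as.factor(x))
  y <- droplevels(as.factor(y))
  lv <- levels(x)
  k <- length(lv)
  if (k < 2) return(Inf)
  best <- Inf
  for (mask in seq(1, 2^k - 2, by = 2)) {   # odd masks: level 1 on the left
    left_levels <- lv[bitwAnd(mask, 2^(0:(k - 1))) > 0]
    best <- min(best, split_impurity(y, x %in% left_levels))
  }
  best
}

# Simulated two-class data with an 8-level factor predictor
set.seed(1)
x <- factor(sample(letters[1:8], 300, replace = TRUE))
p <- runif(8)
y <- factor(ifelse(runif(300) < p[as.integer(x)], "yes", "no"))
fast <- best_two_class_split(x, y)
all.equal(fast$impurity, brute_force_split(x, y))   # same minimum: 7 cuts vs 127 splits
```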
-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
