On Tue, 25 Jan 2005, Liaw, Andy wrote:

From: WeiWei Shi

Hi,
I am trying to build a multi-class classification tree using rpart.
I tested it on the fgl data set from the MASS package and it works well.

However, when I run it on a small sample of my own data (described below),
the program seems to take forever. I am not sure whether it is just slow or
whether something is wrong with my code or data manipulation.

Please advise!

The data are described by the output of str() below. The call to
rpart is:

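library(rpart)
# Use the bare column name: with x$V142 on the left, `.` also pulls V142 in as a predictor.
test_tree <- rpart(V142 ~ ., data = x, method = "class",
                   parms = list(split = "gini"), cp = 0.01)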

The response variable is V142, a factor with 3 levels.

Thanks for your suggestions!

Ed.

[snip]

 $ V141: Factor w/ 88 levels "1001","1002",..: 59 59 59 59 59 59 55 78 7 73 ...

I'd bet this is the problem. There are 2^(88-1) - 1 possible ways to split a factor with 88 levels into two groups, and rpart will work on those splits till the cows come home...
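For scale, the count can be derived directly. Writing $S(L)$ for the number of binary splits of a factor with $L$ levels: each level goes either left or right ($2^L$ assignments), swapping the two sides gives the same split (halving the count), and putting every level on one side is not a split at all (subtracting 1). The rate of $10^9$ splits per second in the second line is a hypothetical figure, used only to convey the magnitude.

```latex
\[
  S(L) = \frac{2^{L}}{2} - 1 = 2^{L-1} - 1,
  \qquad
  S(88) = 2^{87} - 1 \approx 1.55 \times 10^{26}.
\]
\[
  \frac{1.55 \times 10^{26}\ \text{splits}}{10^{9}\ \text{splits/s}}
  \approx 1.55 \times 10^{17}\ \text{s}
  \approx 4.9 \times 10^{9}\ \text{years}.
\]
```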

I'd suggest getting rid of that variable, or collapsing its levels into
something more reasonable.  The CART book describes some heuristic shortcuts
for testing only n-1 splits for factors with n levels, but I believe that
only works for 2-class problems.
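A minimal base-R sketch of the second suggestion, assuming that frequency is an acceptable grouping criterion for V141: levels seen at least `min_count` times are kept and the rest are lumped into a single "other" level. The helper `collapse_rare`, its threshold, and the synthetic data are hypothetical illustrations; none of them is part of rpart.

```r
# Hypothetical helper: lump infrequent factor levels into one "other" level.
collapse_rare <- function(f, min_count = 20, other = "other") {
  counts <- table(f)                        # table() drops NA by default
  keep <- names(counts)[counts >= min_count]
  out <- as.character(f)
  # Non-missing values whose level is too rare are replaced by `other`;
  # NA values stay NA.
  out[!is.na(out) & !(out %in% keep)] <- other
  factor(out)
}

# Synthetic example: three common codes plus 40 codes seen only once.
f <- factor(c(rep(c("1001", "1002", "1003"), each = 30),
              as.character(2001:2040)))
nlevels(f)   # 43 levels before collapsing
g <- collapse_rare(f, min_count = 20)
nlevels(g)   # 4 levels: "1001", "1002", "1003", "other"
```

On the poster's data frame this would be something like `x$V141 <- collapse_rare(x$V141, min_count = 20)` before calling rpart, and `sapply(Filter(is.factor, x), nlevels)` is a quick way to see how many levels every factor column has. Where the meaning of the codes is known, grouping them by meaning is likely to give more interpretable splits than a count threshold.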

You don't need heuristics: there is a fast algorithm (proved in my PRNN book) for two classes only. I believe rpart implements it.
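The two-class result can be illustrated with a short base-R sketch for a 0/1 response and the Gini index: sort the levels by their proportion of class 1, score only the L - 1 splits that cut that ordering into a prefix and a suffix, and compare with a brute-force search over all 2^(L-1) - 1 splits. The function names and the toy counts are hypothetical, and this only demonstrates the idea; it is not rpart's internal code.

```r
# Gini index of a two-class node with n1 class-1 cases out of n.
gini <- function(n1, n) {
  p <- n1 / n
  2 * p * (1 - p)
}

# Size-weighted Gini impurity of the split that sends `left_levels` left.
split_impurity <- function(f, y, left_levels) {
  left <- f %in% left_levels
  nl <- sum(left)
  nr <- sum(!left)
  (nl * gini(sum(y[left]), nl) + nr * gini(sum(y[!left]), nr)) / length(y)
}

# Brute force: keep the last level on the right so that mirror-image splits
# are not counted twice; every non-empty subset of the other levels goes left.
best_exhaustive <- function(f, y) {
  lev <- levels(f)
  L <- length(lev)
  first <- lev[-L]
  best <- Inf
  for (m in seq_len(2^(L - 1) - 1)) {
    left <- first[bitwAnd(m, 2^(0:(L - 2))) > 0]
    best <- min(best, split_impurity(f, y, left))
  }
  best
}

# Shortcut: order levels by their class-1 proportion, then try only the
# L - 1 splits {first k levels} versus {the rest}.
best_ordered <- function(f, y) {
  prop <- tapply(y, f, mean)
  ord <- levels(f)[order(prop)]
  L <- length(ord)
  min(sapply(seq_len(L - 1), function(k) split_impurity(f, y, ord[1:k])))
}

# Toy data: 6 levels with 10 cases each; n1 gives the class-1 count per level.
f <- factor(rep(letters[1:6], each = 10))
n1 <- c(7, 1, 9, 4, 2, 6)
y <- unlist(lapply(n1, function(k) c(rep(1, k), rep(0, 10 - k))))

best_exhaustive(f, y)  # scores 31 splits
best_ordered(f, y)     # scores 5 splits, same minimum
```

As the reply suggests, rpart probably applies this ordering itself for two-class responses, so nothing like this needs to be written by hand. It does not help in this thread, though: V142 has three classes, which is why dropping or collapsing V141 is the practical fix.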


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
