> From: WeiWei Shi
>
> Hi,
> I am trying to make a multi-class classification tree by using rpart.
> I used the MASS package's data: fgl to test and it works well.
>
> However, when I used my small-sampled data as below, the program seems
> to take forever. I am not sure if it is due to slowness or there is
> something wrong with my code or data manipulation.
>
> Please be advised !
>
> The data is described as the output from str() function. The call to
> rpart is like:
>
> library(rpart)
> test_tree<-rpart(x$V142 ~ ., data=x,
> parms=list(split='gini'), cp =0.01)
>
> the response variable is $V142, with 3 levels.
>
> Thanks for your suggestions!
>
> Ed.

[snip]
> $ V141: Factor w/ 88 levels "1001","1002",..: 59 59 59 59 59
> 59 55 78 7 73 ...

I'd bet this is the problem. There are 2^(88-1) - 1 (roughly 1.5e26)
possible ways to split a factor with 88 levels into two groups. rpart
will work on those splits until the cows come home...

I'd suggest getting rid of that variable, or collapsing the levels to
something more reasonable. The CART book describes some heuristic
shortcuts for testing only n-1 splits for factors with n levels, but I
believe that only works for 2-class problems.

Andy

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
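
For anyone hitting the same wall, here is a rough sketch of the two points above: how fast the split count grows, and one way to collapse a many-level factor before calling rpart. The helper names `n_binary_splits` and `collapse_levels` are made up for illustration (they are not part of rpart), the data frame `toy` is a random stand-in because the poster's real `x` is not shown, and keeping the 10 most frequent levels is an arbitrary choice, not a recommendation.

```r
# Number of distinct binary splits of an unordered factor with k levels:
# each level goes left or right (2^k), halve for left/right symmetry,
# then drop the split that leaves one side empty.
n_binary_splits <- function(k) 2^(k - 1) - 1

n_binary_splits(3)   # 3
n_binary_splits(88)  # on the order of 1.5e26

# Hypothetical helper (not an rpart function): keep the `keep` most
# frequent levels and pool all remaining levels into one "other" level.
collapse_levels <- function(f, keep = 10, other = "other") {
  f <- as.factor(f)
  counts <- sort(table(f), decreasing = TRUE)
  top <- names(counts)[seq_len(min(keep, length(counts)))]
  out <- as.character(f)
  out[!is.na(out) & !(out %in% top)] <- other
  factor(out)
}

# Toy stand-in for the poster's data (assumption: the real `x` is unknown).
set.seed(1)
toy <- data.frame(
  V141 = factor(sample(1001:1088, 500, replace = TRUE)),
  V142 = factor(sample(c("a", "b", "c"), 500, replace = TRUE))
)
toy$V141 <- collapse_levels(toy$V141, keep = 10)

# Fit only if rpart is installed. The response is named by column here
# (V142 ~ .) rather than as x$V142, so that `.` does not also pick up
# V142 as a predictor.
if (requireNamespace("rpart", quietly = TRUE)) {
  fit <- rpart::rpart(V142 ~ ., data = toy,
                      parms = list(split = "gini"), cp = 0.01)
}
```

With at most 11 levels left, the number of candidate splits on V141 drops to at most 2^10 - 1 = 1023, which rpart can search quickly. Whether pooling by frequency is sensible depends on what the levels mean; grouping them by domain knowledge may be a better choice.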
