Look at rpart.control. rpart has two "advisory" parameters that control
the tree size at the smallest nodes:
  minsplit (default 20): a node with fewer than this many observations
    is not considered worth splitting
  minbucket (default round(minsplit/3), i.e. 7): don't create any terminal
    nodes with fewer than this many observations
As I said, these are advisory, and reflect the view that such final splits are
usually not worthwhile. They give a slightly faster run time, but mostly a less
complex plotted model.
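A minimal sketch of the effect, assuming the kyphosis data that ships with rpart; the relaxed values below (and cp = 0, which switches off the complexity-parameter stopping rule as well) are illustrative choices, not recommendations. It fits the same formula twice and counts terminal nodes, which rpart marks as "<leaf>" in the var column of fit$frame.

```r
library(rpart)  # kyphosis data ships with rpart

# Defaults: minsplit = 20, minbucket = round(20/3) = 7
ctrl_default <- rpart.control()

# Illustrative relaxed settings: allow splits of nodes with 2 observations,
# leaves with 1 observation, and no complexity-parameter cutoff (cp = 0)
ctrl_relaxed <- rpart.control(minsplit = 2, minbucket = 1, cp = 0)

fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     control = ctrl_default)
fit_relaxed <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     control = ctrl_relaxed)

# Terminal nodes are the rows of fit$frame whose split variable is "<leaf>"
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

n_leaves(fit_default)
n_leaves(fit_relaxed)
```

The relaxed tree is much bushier; plotting both (plot(fit); text(fit)) shows the "less complex plotted model" the defaults buy.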
I am not nearly as pessimistic as Frank Harrell ("need 20,000 observations").
rpart often gives a good model, one that predicts the outcome well, and I find
the intermediate steps it takes informative. However, there are often many
trees with similar predictive ability but a very different "look" in terms
of split points and variables. Saying that any given rpart model is THE best
is perilous.
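One way to see this instability for yourself is a small bootstrap experiment: refit the tree on resampled data, record which variable the root node splits on, and score each tree on its out-of-bag observations. This is a minimal sketch assuming the kyphosis data bundled with rpart; the seed and the 20 replicates are arbitrary choices for illustration.

```r
library(rpart)

set.seed(1)     # arbitrary seed, for reproducibility only
B <- 20         # number of bootstrap refits (illustrative)
n <- nrow(kyphosis)
root_var <- character(B)
oob_acc  <- numeric(B)

for (b in seq_len(B)) {
  idx <- sample(n, replace = TRUE)              # bootstrap resample
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis[idx, ])

  # Row 1 of fit$frame is the root; "<leaf>" means the tree did not split
  root_var[b] <- as.character(fit$frame$var[1])

  # Out-of-bag rows: observations never drawn in this resample
  oob  <- setdiff(seq_len(n), idx)
  pred <- predict(fit, kyphosis[oob, ], type = "class")
  oob_acc[b] <- mean(pred == kyphosis$Kyphosis[oob])
}

table(root_var)   # which variable the root split uses, across refits
summary(oob_acc)  # spread of out-of-bag accuracy
```

If several root variables appear in the table while the out-of-bag accuracies stay in a similar range, that is the "many trees, similar predictive ability" situation described above.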
Terry T.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.