[R] rpart minimum sample size

Terry Therneau Wed, 28 Feb 2007 07:01:04 -0800

  Look at rpart.control.  Rpart has two "advisory" parameters that control
the tree size at the smallest nodes:
        minsplit (default 20): a node with fewer than this many subjects is
        not considered worth splitting
        
        minbucket (default 7): don't create any final nodes with fewer than
        this many observations
        
As I said, these are advisory, and reflect the fact that these final splits
are usually not worthwhile.  They lead to a slightly faster run time, but
mostly to a less complex plotted model.
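
A minimal sketch of setting these two parameters, assuming the kyphosis
data frame that ships with rpart; the values minsplit = 10 and minbucket = 3
and the helper leaf_n are illustrative choices, not anything from the
original question:

```r
library(rpart)  # recommended package, shipped with standard R installs

# Fit with the defaults (minsplit = 20, minbucket = 7).
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class")

# Looser limits (illustrative values): try splits in nodes with 10 or more
# observations and allow final nodes with as few as 3.
fit_small <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(minsplit = 10, minbucket = 3))

# Hypothetical helper: number of observations in each final node.
# Rows of fit$frame with var == "<leaf>" are the terminal nodes.
leaf_n <- function(fit) fit$frame$n[fit$frame$var == "<leaf>"]

leaf_n(fit_default)
leaf_n(fit_small)
```

Comparing the two vectors shows how small the final nodes are allowed to get
under each setting.  Whether the extra splits are worth keeping is a separate
question, which the cp parameter and pruning address.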


  I am not nearly as pessimistic as Frank Harrell ("need 20,000 observations").
Rpart often gives a good model, one that predicts the outcome, and I find
the intermediate steps that it takes informative.  However, there are often many
trees with similar predictive ability but a very different "look" in terms
of split points and variables.  Saying that any given rpart model is THE best
is perilous.
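
One rough way to see how much the "look" of a tree can move around is to
refit on bootstrap resamples and record which variable is chosen for the
first split.  This is only a sketch: the kyphosis data, the 20 resamples,
and the seed are all arbitrary assumptions made for illustration.

```r
library(rpart)

set.seed(1)  # arbitrary seed, only so the resamples are repeatable

root_var <- replicate(20, {
  idx <- sample(nrow(kyphosis), replace = TRUE)  # bootstrap resample of rows
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis[idx, ],
               method = "class")
  # Row 1 of fit$frame is the root node; var is the splitting variable,
  # or "<leaf>" if the tree did not split at all.
  as.character(fit$frame$var[1])
})

table(root_var)  # how often each variable was chosen at the root
```

If more than one variable turns up at the root across resamples, that is a
sign that several quite different trees fit these data about equally well.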
        Terry T.

