Amy Uhrin wrote:
> Is there an optimal / minimum sample size for attempting to construct a
> classification tree using /rpart/?
>
> I have 27 seagrass disturbance sites (boat groundings) that have been
> monitored for a number of years. The monitoring protocol for each site
> is identical. From the monitoring data, I am able to determine the
> level of recovery that each site has experienced. Recovery is our
> categorical dependent variable with values of none, low, medium, high
> which are based upon percent seagrass regrowth into the injury over
> time. I wish to be able to predict the level of recovery of future
> vessel grounding sites based upon a number of categorical / continuous
> predictor variables used here including (but not limited to) such
> parameters as: sediment grain size, wave exposure, original size
> (volume) of the injury, injury age, injury location.
>
> When I run /rpart/, the data is split into only two terminal nodes based
> solely upon values of the original volume of each injury. No other
> predictor variables are considered, even though I have included about
> six of them in the model. When I remove volume from the model the same
> thing happens but with injury area - two terminal nodes are formed based
> upon area values and no other variables appear. I was hoping that this
> was a programming issue, me being a newbie and all, but I really think
> I've got the code right. Now I am beginning to wonder if my N is too
> small for this method?
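
One possible contributor to the two-node result, which nothing in the question confirms, is rpart's default stopping rules: rpart.control() uses minsplit = 20, minbucket = round(minsplit / 3) and cp = 0.01, so with 27 rows very few nodes are even eligible for a split. The following is only a rough sketch on simulated data. The data frame `sites`, its columns, and the helper `count_leaves` are invented for illustration and are not the poster's data or code.

```r
library(rpart)

# Hypothetical stand-in data: 27 sites, four recovery levels, invented predictors.
set.seed(1)
n <- 27
sites <- data.frame(
  recovery = factor(rep(c("none", "low", "medium", "high"), length.out = n)),
  volume   = runif(n, 1, 100),  # injury volume (invented units)
  age      = runif(n, 1, 15),   # injury age in years (invented)
  exposure = runif(n)           # wave exposure index (invented)
)

# Default stopping rules: minsplit = 20, minbucket = 7, cp = 0.01.
# xval = 0 only switches off cross-validation to keep the sketch deterministic.
fit_default <- rpart(recovery ~ ., data = sites, method = "class",
                     control = rpart.control(xval = 0))

# Relaxed stopping rules, for exploration only; such a tree will overfit.
fit_relaxed <- rpart(recovery ~ ., data = sites, method = "class",
                     control = rpart.control(minsplit = 2, minbucket = 1,
                                             cp = 0, xval = 0))

# Terminal nodes are the rows of fit$frame whose var is "<leaf>".
count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
leaves_default <- count_leaves(fit_default)
leaves_relaxed <- count_leaves(fit_relaxed)
print(c(default = leaves_default, relaxed = leaves_relaxed))
```

With 27 rows, the defaults allow at most three terminal nodes: the root can only split into children of 20 and 7 if either child is to be split again, and neither child of that second split reaches 20 rows. Relaxing the controls shows which other variables rpart would use, but it does not make a tree grown from so few sites any more reliable.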

In my experience, N needs to be around 20,000 to get both good accuracy and replicability of patterns if the number of potential predictors is not tiny. In general, the R^2 from rpart is not competitive with that from an intelligently fitted regression model. It's just a difficult problem when relying on a single tree (hence the popularity of random forests, bagging, and boosting).

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.