Thanks Terry!
I managed to figure that out shortly after posting (as is the way!). Adding an
additional covariate that splits below one of the x branches but not the other,
and so pushes the class proportion over 0.5, means the x split is retained.
However, I now have another conundrum, this
You are mixing up two of the steps in rpart. 1: how to find the best candidate split and
2: evaluation of that split.
With the "class" method we use the information or Gini criteria for step 1. The code
finds a worthwhile candidate split at 0.5 using exactly the calculations you outline.
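To make the two steps concrete, here is a minimal sketch in base R using the x and y vectors from the original post. It does not call rpart. The helper names (gini, n_wrong) are made up for illustration, and it assumes that step 2 compares misclassification counts before and after the split, which is my reading of the "class" method rather than something spelled out in this thread.

```r
# Data from the post: x = 0 for the first 70 cases, x = 1 for the last 30.
y <- c(rep(0, 65), rep(1, 15), rep(0, 20))
x <- c(rep(0, 70), rep(1, 30))

# Hypothetical helper: Gini impurity of a 0/1 vector, 2 * p * (1 - p).
gini <- function(v) {
  p <- mean(v)
  2 * p * (1 - p)
}

# Hypothetical helper: number of cases misclassified when every case
# in the node is assigned the node's majority class.
n_wrong <- function(v) min(sum(v == 0), sum(v == 1))

left  <- y[x < 0.5]
right <- y[x >= 0.5]

# Step 1: decrease in weighted Gini impurity for the split at x = 0.5.
gini_gain <- gini(y) -
  (length(left) * gini(left) + length(right) * gini(right)) / length(y)

# Step 2 (assumed form): misclassification counts with and without the split.
wrong_root  <- n_wrong(y)
wrong_split <- n_wrong(left) + n_wrong(right)

cat("Gini gain:", round(gini_gain, 4), "\n")
cat("Errors at root:", wrong_root, " after split:", wrong_split, "\n")
```

The Gini gain is positive (about 0.029), so the split is a worthwhile candidate in step 1. Both children still predict class 0, though, so the error count is 15 with or without the split, and under the assumption above the split buys nothing in step 2.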
The following code produces a tree with only a root. However, clearly the tree
with a split at x=0.5 is better. rpart doesn't seem to want to produce it.
Running the following produces a tree with only a root.
y <- c(rep(0,65),rep(1,15),rep(0,20))
x <- c(rep(0,70),rep(1,30))
f <- rpart(y ~ x,