I can't grasp how the mean prediction at each terminal node can perfectly
match the mean of the observed values at that terminal node.
I'm afraid I'm missing something completely obvious here:

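# make a regression tree (party package; airq is assumed to be the
# airquality data with missing Ozone removed, as in the ?ctree examples):
library(party)
airq <- subset(airquality, !is.na(Ozone))
rt <- ctree(Ozone ~ ., data = airq)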

# Validate:
Prediction <- unlist(treeresponse(rt))
(Val <- data.frame(Node = rt@where,
                   Prediction, True = airq$Ozone))

# compare mean prediction per node
# with observed mean values per node:
options(scipen = 999)
cbind(aggregate(True ~ Node, FUN = mean, data = Val),
      Pred = aggregate(Prediction ~ Node, FUN = mean, data = Val)[, 2])

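# also, plot predictions (x) vs. true values (y):
plot(Val$Prediction, Val$True)
# 1:1 reference line (intercept 0, slope 1)
abline(0, 1)
# fitted line; regress True on Prediction so it matches the plot's axes
abline(lm(True ~ Prediction, data = Val), lty = 2)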
myseq <- seq(0, 75, 25)
abline(v = myseq, h = myseq)


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
