I can't grasp how the mean predictions at the terminal nodes can perfectly fit the true mean values of the observed variable at those nodes. I'm afraid I'm missing something completely obvious here:
library(party)

# assumption: airq is built as in the ctree() help example
airq <- subset(airquality, !is.na(Ozone))

# make a regression tree:
rt <- ctree(Ozone ~ ., data = airq)

# validate:
Prediction <- unlist(treeresponse(rt))
(Val <- data.frame(Node = rt@where, Prediction, True = airq$Ozone))

# compare mean prediction per node
# with observed mean values per node:
options(scipen = 999)
cbind(aggregate(True ~ Node, FUN = mean, data = Val),
      Pred = aggregate(Prediction ~ Node, FUN = mean, data = Val)[, 2])

# also, plot predictions vs. true values:
plot(Val$Prediction, Val$True)
abline(0, 1)                           # 1:1 line
fit <- lm(True ~ Prediction, data = Val)
abline(fit)                            # regression of true on predicted values
myseq <- seq(0, 75, 25)
abline(v = myseq, h = myseq)           # reference grid