Hello all,

When building a CART model (specifically, a classification tree) with rpart,
it is often useful to know how important each of the variables in the
model is.

Thus, my question is: *What common measures exist for ranking or measuring
the importance of the variables in a CART model? And how can they be
computed in R (for example, when using the rpart package)?*
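For context, one common answer is the CART-style "impurity decrease" score. It sums, over every node that splits on a variable, the weighted drop in node impurity (Gini impurity for classification trees). The notation below is mine, not rpart's: $t$ is a node with children $t_L$ and $t_R$, $p_L$ and $p_R$ are the fractions of $t$'s cases sent left and right, $p(k \mid t)$ is the share of class $k$ in $t$, $p(t)$ is the fraction of all cases reaching $t$, and $v(t)$ is the variable used to split $t$. This is a sketch of the idea. As far as I know, rpart's own stored score also gives partial credit to surrogate splits.

$$
i(t) = 1 - \sum_{k} p(k \mid t)^2, \qquad
\Delta i(t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R)
$$

$$
\mathrm{Imp}(x_j) = \sum_{t \,:\, v(t) = j} p(t)\, \Delta i(t)
$$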

For example, here is some dummy code you can use to demonstrate your
solutions. The example is constructed so that variables x1 and x2 are clearly
"important", while x1 is (in some sense) more important than x2: the x1 rule
applies to more cases, and therefore has more influence on the structure of
the data, than the x2 rule.
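As a rough check of the claim that x1 "applies to more cases", here is a back-of-the-envelope calculation with the standard normal CDF `pnorm()`. It assumes the data-generating rules in the code: x1 < 0 forces "a" (about half the cases), while x2 < -1 only decides the label when the x1 rule does not fire. The variable names are mine, for illustration only.

```r
# Share of cases whose label is forced to "a" by the x1 rule: P(x1 < 0)
p_x1 <- pnorm(0)
# Share of cases with x2 < -1: P(x2 < -1), roughly 0.159
p_x2 <- pnorm(-1)
# The x2 rule is overwritten wherever x1 < 0, so it only sets the label
# for cases with x1 >= 0 and x2 < -1 (x1 and x2 are independent)
p_x2_effective <- (1 - p_x1) * p_x2
round(c(x1 = p_x1, x2 = p_x2_effective), 3)
```

So roughly 50% of the labels are driven by x1 and roughly 8% by x2.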

set.seed(31431)
n <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- rnorm(n)
x5 <- rnorm(n)
X <- data.frame(x1, x2, x3, x4, x5)

# Start with random labels, then impose structure:
y <- sample(letters[1:4], n, TRUE)
y <- ifelse(X[, 2] < -1, "b", y)  # x2 < -1 forces "b" ...
y <- ifelse(X[, 1] < 0, "a", y)   # ... unless x1 < 0, which forces "a"

library(rpart)
fit <- rpart(y ~ ., X)
plot(fit); text(fit)

info.gain.rpart(fit) # your function: tells us how important each variable is
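Here is a rough sketch of what the hypothetical `info.gain.rpart()` from the question might look like. It sums the `improve` value of every primary split, grouped by split variable. This relies on my understanding of the rpart object layout: `fit$frame` has columns `var`, `ncompete` and `nsurrogate`, and each internal node owns one block of rows in `fit$splits` (primary, then competitors, then surrogates). A second measure, permutation importance, is sketched as `perm.importance()`, which is also my own hypothetical name. It reports the drop in accuracy after shuffling one predictor. Here it is evaluated on the training data for simplicity, and held-out data would be better. Recent rpart versions also store a built-in score in `fit$variable.importance`, which additionally credits surrogate splits.

```r
library(rpart)

# Re-create the simulated data from the post
set.seed(31431)
n <- 400
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n); x5 <- rnorm(n)
X <- data.frame(x1, x2, x3, x4, x5)
y <- sample(letters[1:4], n, TRUE)
y <- ifelse(X$x2 < -1, "b", y)
y <- ifelse(X$x1 < 0, "a", y)
dat <- data.frame(X, y = factor(y))
fit <- rpart(y ~ ., data = dat)

# Hypothetical helper: total improvement of the primary splits on each variable.
# Written for classification trees (for method = "anova", rpart reports
# 'improve' as a proportion, so it would need rescaling).
info.gain.rpart <- function(fit) {
  ff <- fit$frame
  internal <- which(ff$var != "<leaf>")            # non-terminal nodes
  # Rows per internal node in fit$splits: 1 primary + competitors + surrogates
  block_size <- 1L + ff$ncompete[internal] + ff$nsurrogate[internal]
  primary_row <- cumsum(c(1L, block_size))[seq_along(internal)]
  improve <- fit$splits[primary_row, "improve"]
  split_var <- as.character(ff$var[internal])
  sort(sapply(split(improve, split_var), sum), decreasing = TRUE)
}

# Hypothetical helper: mean drop in accuracy when one predictor is shuffled
perm.importance <- function(fit, data, response, n_rep = 20) {
  truth <- data[[response]]
  base_acc <- mean(predict(fit, data, type = "class") == truth)
  predictors <- setdiff(names(data), response)
  sapply(predictors, function(v) {
    drops <- replicate(n_rep, {
      shuffled <- data
      shuffled[[v]] <- sample(shuffled[[v]])       # break the link to y
      base_acc - mean(predict(fit, shuffled, type = "class") == truth)
    })
    mean(drops)
  })
}

imp <- info.gain.rpart(fit)
print(imp)
print(fit$variable.importance)   # built-in score (includes surrogates)
perm <- perm.importance(fit, dat, "y")
print(round(sort(perm, decreasing = TRUE), 3))
```

Variables that never appear as a primary split are simply absent from `imp`. The permutation score, by contrast, reports every predictor, and an unused one gets exactly 0.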

(References are always welcome.)


Thanks!

Tal

---------------- Contact Details: -------------------------------------------------------
Contact me: [email protected] |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
