Here's an example, using the iris data:

> ## Grow one tree, using all data, and try all variables at all splits,
> ## using large nodesize to get smaller tree.
> iris.rf <- randomForest(iris[-5], iris[[5]], ntree=1, nodesize=20, mtry=4,
+                         sampsize=150, replace=FALSE)
> getTree(iris.rf, 1)
   left daughter right daughter split var split point status prediction
1              2              3         3        2.45      1          0
2              0              0         0        0.00     -1          1
3              4              5         4        1.75      1          0
4              6              7         3        4.95      1          0
5              8              9         3        4.85      1          0
6             10             11         4        1.65      1          0
7              0              0         0        0.00     -1          3
8              0              0         0        0.00     -1          3
9              0              0         0        0.00     -1          3
10             0              0         0        0.00     -1          2
11             0              0         0        0.00     -1          3
> idx <- with(iris, Petal.Length > 2.45 & Petal.Length < 3.5)
> predict(iris.rf, iris[idx, -5])
[1] versicolor versicolor versicolor
Levels: setosa versicolor virginica
> iris.rf$forest$xbestsplit[1,1] <- 3.5
> predict(iris.rf, iris[idx, -5])
[1] setosa setosa setosa
Levels: setosa versicolor virginica
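
If you want to see which leaf each case ends up in, rather than only the predicted class, something along the following lines may help. This is just a sketch: it assumes a version of randomForest whose predict() accepts nodes=TRUE (which attaches the terminal node of each case as the "nodes" attribute), and the seed and the size of the shift in the root split point are arbitrary choices for illustration, not taken from the run above.

```r
library(randomForest)

set.seed(1)  # arbitrary; ties between equally good splits are broken at random
rf <- randomForest(iris[-5], iris[[5]], ntree = 1, nodesize = 20, mtry = 4,
                   sampsize = 150, replace = FALSE)

## Terminal node reached by each row before the change.
before <- predict(rf, iris[-5], nodes = TRUE)
node.before <- as.matrix(attr(before, "nodes"))[, 1]

## Move the root split point (node 1 of tree 1) up by one unit.
## The shift of 1 is an arbitrary illustrative value.
root <- getTree(rf, 1)[1, ]
rf$forest$xbestsplit[1, 1] <- root[["split point"]] + 1

## Terminal node reached by each row after the change.
after <- predict(rf, iris[-5], nodes = TRUE)
node.after <- as.matrix(attr(after, "nodes"))[, 1]

## Rows that now land in a different leaf, with old and new predictions.
moved <- which(node.before != node.after)
data.frame(row = moved,
           old.node = node.before[moved], new.node = node.after[moved],
           old.pred = before[moved], new.pred = after[moved])
```

The rest of the tree is left untouched, so only the cases whose value on the root split variable falls between the old and new split points are rerouted.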

Note how the predictions have changed.

HTH,
Andy

> -----Original Message-----
> From: Martin Lam [mailto:[EMAIL PROTECTED]
> Sent: Friday, September 09, 2005 9:04 AM
> To: Liaw, Andy; [email protected]
> Subject: RE: [R] Re-evaluating the tree in the random forest
>
> Hi,
>
> Let me give a simple example. Assume a dataset containing 6 instances
> with 1 variable and the class label:
>
> [x1, y]:
> [0.5, A]
> [3.2, B]
> [4.5, B]
> [1.4, C]
> [1.6, C]
> [1.9, C]
>
> Assume that the randomForest algorithm creates this (2 levels deep)
> tree:
>
> Root node: question: x1 < 2.2?
>
> Left terminal node:
> [0.5, A]
> [1.4, C]
> [1.6, C]
> [1.9, C]
> Leaf classification: C
>
> Right terminal node:
> [3.2, B]
> [4.5, B]
> Leaf classification: B
>
> If I change the question at the root node to "x1 < 1?", the instances
> in the left leaf node are no longer passed down the tree correctly.
> My original question was whether there is a way to re-evaluate the
> instances so that the tree becomes:
>
> Root node: question: x1 < 1?
>
> Left terminal node:
> [0.5, A]
> Leaf classification: A
>
> Right terminal node:
> [3.2, B]
> [4.5, B]
> [1.4, C]
> [1.6, C]
> [1.9, C]
> Leaf classification: C
>
> Cheers,
>
> Martin
>
> --- "Liaw, Andy" <[EMAIL PROTECTED]> wrote:
>
> > > From: Martin Lam
> > >
> > > Dear mailinglist members,
> > >
> > > I was wondering if there was a way to re-evaluate the instances of
> > > a tree (in the forest) again after I have manually changed a
> > > splitpoint (or split variable) of a decision node.
> > > Here's an illustration:
> > >
> > > library("randomForest")
> > >
> > > forest.rf <- randomForest(formula = Species ~ ., data = iris,
> > >                           do.trace = TRUE, ntree = 3, mtry = 2,
> > >                           norm.votes = FALSE)
> > >
> > > # I am going to change the splitpoint of the root node
> > > # of the first tree to 1
> > > forest.rf$forest$xbestsplit[1,]
> > > forest.rf$forest$xbestsplit[1,1] <- 1
> > > forest.rf$forest$xbestsplit[1,]
> > >
> > > Because I've changed the splitpoint, some instances in the leaves
> > > are no longer where they should be. Is there a way to reassign
> > > them to the correct leaf?
> >
> > I'm not sure what you want to do exactly, but I suspect you can use
> > predict().
> >
> > > I was also wondering how I should interpret the output of
> > > do.trace:
> > >
> > > ntree      OOB      1      2      3
> > >     1:   3.70%  0.00%  6.25%  5.88%
> > >     2:   3.49%  0.00%  3.85%  7.14%
> > >     3:   3.57%  0.00%  5.56%  5.26%
> > >
> > > What's OOB and what do the percentages mean?
> >
> > OOB stands for `Out-of-bag'. Read up on random forests (e.g., the
> > article in R News) to learn about it. Those numbers are estimated
> > error rates. The `OOB' column is across all data, while the others
> > are for the classes.
> >
> > Andy
> >
> > > Thanks in advance,
> > >
> > > Martin

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
