Re: [R] confusion matrix in randomForest

2008-07-21 Thread Liaw, Andy
randomForest predictions are based on the votes of the individual trees, and
thus have little to do with the error rates of individual trees. The numbers
in the confusion matrix are therefore not averages over the trees.

Andy 
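
For readers who still want the per-tree spread, here is a minimal sketch
(not part of the original reply). It assumes a hypothetical 100/50
train/test split of iris and an arbitrary seed; the names train_idx,
train, test, tree_err and forest_err are made up for illustration. It
uses predict() with predict.all = TRUE, which returns each tree's
prediction alongside the aggregated vote.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical split: 100 rows for training, the remaining 50 for testing
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

rf <- randomForest(Species ~ ., data = train, ntree = 500)

# predict.all = TRUE returns a list with
#   $aggregate  : the forest's prediction (majority vote)
#   $individual : a matrix with one column of predicted labels per tree
pred <- predict(rf, newdata = test, predict.all = TRUE)

# Test-set error rate of each individual tree
# (the true labels are recycled down each column of the matrix)
tree_err <- colMeans(pred$individual != as.character(test$Species))

# Test-set error rate of the whole forest
forest_err <- mean(as.character(pred$aggregate) != as.character(test$Species))

# Mean and standard deviation over trees, next to the forest's error
c(mean = mean(tree_err), sd = sd(tree_err), forest = forest_err)
```

The forest's error is usually lower than the mean per-tree error, which is
why the two should not be confused.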


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] confusion matrix in randomForest

2008-07-20 Thread Miklos Kiss

I have a question on the output generated by randomForest in classification
mode, specifically, the confusion matrix.  The confusion matrix lists the
various classes and how the forest classified each one, plus the
classification error.  Are these numbers essentially averages over all the
trees in the forest?  If so, is there a way I can get the standard deviation
values out of the randomForest, or do I have to evaluate each tree
individually?  By way of illustration, let me show the confusion matrix
using the iris data.  The output below shows that the forest correctly
classified 47 versicolor irises, but this is the result for the entire
forest.  I'd like to know if every tree will have 47 correctly classified
versicolor irises, but I don't think it will.  Same for the class.error
value.  Not every tree will have those exact same values, right?

But this raises another question.  For this example, I used the entire data
set to generate the forest, so I assume that the confusion matrix is based
on OOB data.  If I instead created a training set and evaluated the trees
individually on a test set, could I get averages and standard deviations of
the error rate?

Any thoughts?  Thanks in advance.

-Miklos Z. Kiss

> print(iris.rf)

Call:
 randomForest(formula = Species ~ ., data = iris, importance = TRUE,
     keep.forest = TRUE)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 5.33%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          5        45        0.10
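
The assumption that the confusion matrix is based on OOB data can be
checked directly. The following is a rough sketch, not something from the
original post: it refits the same model with an arbitrary seed (so the
counts may differ slightly from those printed above) and compares the
stored confusion matrix with a table built from the OOB predictions kept
in the fitted object. The name oob_tab is made up for illustration.

```r
library(randomForest)

set.seed(1)  # arbitrary seed; counts may differ from the output above
iris.rf <- randomForest(Species ~ ., data = iris,
                        importance = TRUE, keep.forest = TRUE)

# OOB class votes for each observation (fractions by default)
head(iris.rf$votes)

# iris.rf$predicted holds the OOB prediction for each row: the majority
# vote among only those trees that did not see the row during training
oob_tab <- table(observed = iris$Species, predicted = iris.rf$predicted)

# This should agree with the first three columns of the confusion matrix
all(oob_tab == iris.rf$confusion[, 1:3])

# Number of trees for which each row was out of bag
summary(iris.rf$oob.times)
```

Each row is therefore classified once, by a vote of roughly a third of the
trees, rather than once per tree.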
