Hello,

I am exploring the use of random forests for classification and am not
certain about the interpretation of the importance measures.

With the option "importance = T" in the randomForest call, the
resulting 'importance' element is a matrix with four columns, headed as
follows:

0 - mean raw importance score of variable x for class 0 (where
importance is the difference between the permuted-data error and the
original test set error)

1 - mean raw importance score of variable x for class 1

MeanDecreaseAccuracy : average lowering of the margin across all cases
(where the margin is the proportion of votes for the true class minus
the maximum proportion of votes for the other classes)

MeanDecreaseGini : sum of the Gini decreases over all trees in the
forest
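
For concreteness, the quantities as described above can be sketched in
a few lines of Python. This is a minimal illustration of those verbal
definitions only, not the randomForest package's implementation (whose
computation may differ, e.g. in scaling). The helper names are
hypothetical, votes are assumed to be per-class vote counts for each
case, and the size-weighted Gini decrease is one common convention
assumed here.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers that follow the verbal definitions in this message.
# They are NOT the randomForest package's internal code.


def margin(votes: Sequence[int], true_class: int) -> float:
    """Proportion of votes for the true class minus the maximum
    proportion of votes for any other class."""
    total = sum(votes)
    props = [v / total for v in votes]
    best_other = max(p for c, p in enumerate(props) if c != true_class)
    return props[true_class] - best_other


def mean_margin_decrease(votes_orig: List[Sequence[int]],
                         votes_perm: List[Sequence[int]],
                         y_true: Sequence[int]) -> float:
    """Average lowering of the margin across all cases after permuting x."""
    drops = [margin(vo, y) - margin(vp, y)
             for vo, vp, y in zip(votes_orig, votes_perm, y_true)]
    return sum(drops) / len(drops)


def class_error(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """Error rate restricted to the cases whose true class is `cls`."""
    idx = [i for i, y in enumerate(y_true) if y == cls]
    wrong = sum(1 for i in idx if y_pred[i] != cls)
    return wrong / len(idx)


def raw_importance_by_class(y_true: Sequence[int],
                            pred_orig: Sequence[int],
                            pred_perm: Sequence[int],
                            classes: Sequence[int] = (0, 1)) -> Dict[int, float]:
    """Permuted-data error minus original error, computed separately per class."""
    return {c: class_error(y_true, pred_perm, c) - class_error(y_true, pred_orig, c)
            for c in classes}


def gini(counts: Sequence[int]) -> float:
    """Gini impurity, 1 - sum(p_k^2), of a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((k / n) ** 2 for k in counts)


def gini_decrease(parent: Sequence[int],
                  left: Sequence[int],
                  right: Sequence[int]) -> float:
    """Impurity decrease from one split, children weighted by their share
    of the parent's cases (an assumed convention)."""
    n = sum(parent)
    return (gini(parent)
            - sum(left) / n * gini(left)
            - sum(right) / n * gini(right))


if __name__ == "__main__":
    # Permuting x flips one class-0 case to class 1.
    print(raw_importance_by_class([0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]))
    # prints {0: 0.5, 1: 0.0}
```

Under these definitions, MeanDecreaseGini for x would be the sum of
gini_decrease over every split on x in every tree of the forest.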

 

Are these definitions correct? Why is the raw importance score
calculated separately for each class? Could one simply average the raw
importance scores for classes 0 and 1 to get a composite importance
score?

When the option is "importance = F" in the randomForest call, the
'importance' element is instead a vector. What values does it contain?

Thank you in advance for any input you may have.

Best,
Ewy

Ewy Mathe, Ph. D.
Laboratory of Human Carcinogenesis
National Cancer Institute, NIH
37 Convent Drive
Building 37, Room 3068
Bethesda, MD  20892-4255
Tel: 301-496-5835
Fax: 301-496-0497

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.