Hi All,

I posted the same question on StackExchange 
(https://stats.stackexchange.com/questions/431777/class-weight-in-random-forest-vs-breimans-weighted-random-forest)
but am also circulating it here in case someone knows :)


I am unsure whether the "class_weight" parameter of scikit-learn's 
RandomForestClassifier 
(https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
is equivalent to Chen/Breiman's notion of "Weighted Random Forest" described 
in Section 2.3 of their technical report 
(https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf). In 
short, "Weighted Random Forest" will "...assign a weight to each class, with 
the minority class given larger weight (i.e., higher misclassification cost). 
The class weights are incorporated into the RF algorithm in two places. In the 
tree induction procedure, class weights are used to weight the Gini criterion 
for finding splits. In the terminal nodes of each tree, class weights are again 
taken into consideration. The class prediction of each terminal node is 
determined by “weighted majority vote”; i.e., the weighted vote of a class is 
the weight for that class times the number of cases for that class at the 
terminal node. The final class prediction for RF is then determined by 
aggregating the weighted vote from each individual tree, where the weights are 
average weights in the terminal nodes."
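
To make the quoted description concrete, here is a minimal sketch of how I read 
the two places where the weights enter: the Gini criterion and the terminal-node 
vote. The function names and the toy counts are my own and do not come from the 
paper or from sklearn, so treat this as an illustration of my understanding only.

```python
import numpy as np


def weighted_gini(class_counts, class_weights):
    """Gini impurity computed on class counts scaled by per-class weights."""
    weighted = np.asarray(class_counts, dtype=float) * np.asarray(class_weights, dtype=float)
    total = weighted.sum()
    if total == 0:
        return 0.0
    p = weighted / total
    return float(1.0 - np.sum(p ** 2))


def weighted_majority_vote(class_counts, class_weights):
    """Index of the class with the largest (weight * count) in a terminal node."""
    weighted = np.asarray(class_counts, dtype=float) * np.asarray(class_weights, dtype=float)
    return int(np.argmax(weighted))


# Toy node (hypothetical numbers): 90 majority-class cases, 10 minority-class cases.
counts = [90, 10]

gini_plain = weighted_gini(counts, [1, 1])     # ordinary Gini on raw counts
gini_weighted = weighted_gini(counts, [1, 9])  # minority weight 9 balances the node

vote_plain = weighted_majority_vote(counts, [1, 1])      # majority class wins
vote_weighted = weighted_majority_vote(counts, [1, 20])  # 10 * 20 > 90 * 1

print(gini_plain, gini_weighted, vote_plain, vote_weighted)
```

With equal weights the node looks fairly pure and votes for the majority class; 
with a large enough minority weight the same node looks impure and votes for the 
minority class.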

Question: I can't tell from the Python source code for RandomForestClassifier 
whether class_weight is used to weight the Gini criterion when finding splits. 
Is it? And if not, can anyone recommend code that implements Weighted Random 
Forest?
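
One empirical check I can think of, short of reading the source, is to compare 
the root-node impurity stored on a fitted tree with and without class_weight. 
This is only a sketch: it assumes that tree_.impurity holds the impurity the 
splitter works with, and it sets bootstrap=False so every tree sees the full 
toy data set. The data and the helper name root_gini are made up for the 
example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy imbalanced data (hypothetical): 90 cases of class 0, 10 cases of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)


def root_gini(class_weight):
    """Fit a small forest and return the root-node impurity of its first tree."""
    forest = RandomForestClassifier(
        n_estimators=5,
        max_depth=2,
        bootstrap=False,  # each tree is fitted on all 100 rows
        class_weight=class_weight,
        random_state=0,
    )
    forest.fit(X, y)
    return float(forest.estimators_[0].tree_.impurity[0])


root_plain = root_gini(None)              # no class weights
root_weighted = root_gini({0: 1, 1: 9})   # minority class weighted 9x

print(root_plain, root_weighted)
```

If the two numbers differ, the class weights are reaching the impurity 
computation. That would cover the split criterion, although it would not by 
itself tell me whether the voting step matches the paper's description.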

Thanks!
Kristen
http://kaltenburger.github.io/
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
