Hi,
When using a RandomForestClassifier (RFC) on a multiclass problem with a large
number of trees, would you expect the prediction for a given sample to match
its OOB decision function? That is, should the prediction match the class with
the highest OOB value for that sample when n_estimators is large?
On my 3-class problem, the oob_decision_function_ for a given sample is
[ 0.31091392  0.2982096   0.39087648]

but the prediction for that sample is the middle class (OOB = 0.298), whereas I
thought it should have been the last class, which has the highest OOB value
(0.391).
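A rough way to see how often this happens across a whole training set is to
compare, row by row, the argmax of oob_decision_function_ with predict() on the
same samples. The sketch below assumes a recent scikit-learn and uses noisy
synthetic data from make_classification as a stand-in for the real dataset; the
sizes, flip_y and random_state values are arbitrary choices. Note that
predict() on training rows averages over all trees, including those whose
bootstrap sample contained the row, whereas each row of oob_decision_function_
only uses the trees for which that row was out-of-bag.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, noisy 3-class data standing in for the real training set.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           n_classes=3, flip_y=0.2, random_state=1)

# oob_score=True (with the default bootstrap=True) makes the forest
# store oob_decision_function_ after fitting.
forest = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=1).fit(X, y)

# Class with the highest OOB value for each training sample.
oob_pred = forest.classes_[forest.oob_decision_function_.argmax(axis=1)]

# Prediction for the same samples, averaged over every tree in the forest.
full_pred = forest.predict(X)

# Indices of training samples where the two disagree.
mismatch = np.flatnonzero(oob_pred != full_pred)
print(f"{mismatch.size} of {len(y)} samples disagree")
for i in mismatch[:5]:
    print(i, forest.oob_decision_function_[i].round(3), "->", full_pred[i])
```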
According to the docs:
1. The ensemble prediction is an average of the predictions from the
   individual trees: "In contrast to the original publication [B2001], the
   scikit-learn implementation combines classifiers by averaging their
   probabilistic prediction, instead of letting each classifier vote for a
   single class." (taken from section 1.11.2.1 in "1.11. Ensemble methods",
   scikit-learn 0.19.1 documentation)
2. The OOB values for a given sample are the fraction of out-of-bag
   predictions for each class (see
   http://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html)
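Point 1 can be checked on a toy problem: the sketch below averages each tree's
predict_proba output by hand and compares it with the forest's own
predict_proba and predict. It assumes a recent scikit-learn and uses synthetic
data from make_classification in place of the real dataset; the dataset sizes
and random_state values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 3-class data (an assumption; stands in for the real dataset).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Stack the per-tree class probabilities: shape (n_trees, n_samples, n_classes).
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Plain (unweighted) average over trees, i.e. "soft voting".
averaged = per_tree.mean(axis=0)

# The forest's own probabilities should match the manual average.
print(np.allclose(averaged, forest.predict_proba(X)))

# predict() picks the class with the highest averaged probability.
manual_pred = forest.classes_[averaged.argmax(axis=1)]
print((manual_pred == forest.predict(X)).all())
```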
I thought the prediction for a given sample would converge to the class with
the highest OOB value as the number of trees increases, and consequently that
I could interpret the OOB values for a given sample as the probabilities of
that sample belonging to each class. Is this incorrect?
Regards,
Steve

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
