Hi,

When using RandomForestClassifier (RFC) on a multiclass problem with a large number of trees, would you expect the prediction for a given sample to match the OOB decision function? That is, should the prediction be the class with the highest OOB value for that sample when n_estimators is large? On my 3-class problem, the oob_decision_function_ for a given sample is [0.31091392 0.2982096 0.39087648],
but the prediction for that sample is the middle class (OOB value 0.298), whereas I expected the last class, which has the highest OOB value (0.391). According to the docs:

1. The ensemble prediction is an average of the probabilistic predictions of the individual trees: "In contrast to the original publication [B2001], the scikit-learn implementation combines classifiers by averaging their probabilistic prediction, instead of letting each classifier vote for a single class." (Section 1.11.2.1 of "1.11. Ensemble methods", scikit-learn 0.19.1 documentation.)

2. The OOB values for a given sample are the fraction of out-of-bag predictions for each class (see http://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html).

I therefore thought the prediction for a given sample would converge to the class with the highest OOB value as the number of trees increases, and consequently that I could interpret a sample's OOB values as the probabilities of that sample belonging to each class. Is this incorrect?

Regards,
Steve
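Both points can be checked empirically with a minimal sketch like the one below. It assumes a synthetic 3-class dataset from make_classification as a stand-in for the original data; the dataset, n_estimators=500 and the random_state values are illustrative choices, not taken from the post. It first confirms that predict() returns the argmax of predict_proba(), and that predict_proba() is the plain (unweighted) mean of the per-tree probabilities. It then counts the training samples for which predict() disagrees with the argmax of oob_decision_function_. Note that predict(X) on a training sample uses every tree, including the trees whose bootstrap sample contained that sample, whereas oob_decision_function_ averages only over the trees for which the sample was out of bag.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 3-class data (assumption: stands in for the poster's dataset).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# oob_score=True makes the forest compute oob_decision_function_ (needs bootstrap=True, the default).
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)

# Point 1: the forest probability is the unweighted mean of the per-tree probabilities...
mean_tree_proba = np.mean([tree.predict_proba(X) for tree in rf.estimators_], axis=0)
assert np.allclose(mean_tree_proba, rf.predict_proba(X))

# ...and predict() returns the class with the highest averaged probability.
pred = rf.predict(X)
assert np.array_equal(pred, rf.classes_[np.argmax(rf.predict_proba(X), axis=1)])

# Point 2: oob_decision_function_ uses only the trees for which each sample was out of bag,
# while predict(X) above used all trees, including in-bag ones.
oob_pred = rf.classes_[np.argmax(rf.oob_decision_function_, axis=1)]
disagree = np.flatnonzero(pred != oob_pred)
print(f"{disagree.size} of {len(y)} training samples: predict() differs from argmax of OOB values")
```

The comparison is deliberately made on the training set, because out-of-bag values only exist for samples the forest was fitted on.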
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn