Dear Scikit-learn gurus,

Sorry to spam the whole list, but I couldn't find a better email for my question regarding the results of the predict_proba method in the Random Forest classifier.
I tried to reproduce the output of this method by following the description given in the documentation. That is, I averaged the class probabilities over all the trees in the forest. I computed the class probabilities for each tree, for each object in my test data, by first determining in which leaf of the tree my test datum landed. Then I set the class probabilities equal to the fraction of objects in each class in the training data that also landed in the same leaf.

For example, if my test datum landed in node 55 of tree #0, and supposing that 10 objects from my training data also landed in node 55 of tree #0, with 4 objects in the first class and 6 in the second, then the probabilities for that tree would be [0.4, 0.6]. (I then average these probabilities over all the trees in the forest.)

Unfortunately, the probabilities I get from the above algorithm don't agree with the results of predict_proba. For example, for 4 objects in my test data I get the following probabilities:

[ 0.99718369 0.00281631]
[ 0.99711619 0.00288381]
[ 0.99680974 0.00319026]
[ 0.55153962 0.44846038]

but predict_proba gives

[1.0 0.0]
[1.0 0.0]
[1.0 0.0]
[0.4 0.6]

Can anyone please tell me what I am doing wrong? I have checked the source code and the averaging step seems to be correct. I must be misinterpreting how to compute the class probabilities.

Thanks,
Eve

***************************************************************
Eve Kovacs
Argonne National Laboratory,
Room L-177, Bldg. 360, HEP
9700 S. Cass Ave.
Argonne, IL 60439 USA
Phone: (630)-252-6208
Fax: (630)-252-5047
email: kov...@anl.gov
***************************************************************
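For reference, the procedure described above (find the leaf each test datum lands in, take the class fractions of the training data in that leaf, average over trees) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code actually used: the synthetic data, the helper name manual_predict_proba and all parameter values are made up for the example. In particular the sketch sets bootstrap=False so that every tree is fit on the full training set. With the default bootstrap=True each tree is fit on a resampled copy of the training data, so fractions recounted from the full training set need not match what a tree stored for its leaves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data (an assumption; it stands in for the real data set).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, y_train = X[:200], y[:200]
X_test = X[200:]

# bootstrap=False is an assumption of this sketch: each tree then sees the
# full training set, so leaf contents can be recounted from X_train.
forest = RandomForestClassifier(
    n_estimators=10,
    bootstrap=False,
    max_features="sqrt",
    min_samples_leaf=5,
    random_state=0,
)
forest.fit(X_train, y_train)


def manual_predict_proba(forest, X_train, y_train, X_test):
    """Hypothetical helper: average per-leaf class fractions over all trees."""
    classes = forest.classes_
    total = np.zeros((X_test.shape[0], len(classes)))
    for tree in forest.estimators_:
        # Leaf index reached by every training and test sample in this tree.
        train_leaves = tree.apply(X_train)
        test_leaves = tree.apply(X_test)
        for i, leaf in enumerate(test_leaves):
            # Labels of the training samples that share the test sample's leaf.
            in_leaf = y_train[train_leaves == leaf]
            counts = np.array([(in_leaf == c).sum() for c in classes])
            total[i] += counts / counts.sum()
    return total / len(forest.estimators_)


manual = manual_predict_proba(forest, X_train, y_train, X_test)
# Largest absolute difference between the manual result and predict_proba.
print(np.abs(manual - forest.predict_proba(X_test)).max())
```

Switching bootstrap back to True in this sketch is one way to check whether resampling accounts for a mismatch like the one above.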