Dear Scikit-learn gurus,
Sorry to spam the whole list, but I couldn't find a better address for my
question about the results of the predict_proba method in the Random
Forest classifier.

I tried to reproduce the output of this method by following the description
given in the documentation: that is, I averaged the class probabilities over
all the trees in the forest. For each tree and each object in my test data,
I computed the class probabilities by first determining which leaf of the
tree the test datum landed in. I then set the class probabilities equal to
the fraction of training objects in each class that landed in the same leaf.

For example, if my test datum landed in node 55 of tree #0,
and supposing that 10 objects from my training data also landed in node 55 of 
tree #0, with 4 objects in the first class and 6 in the second, then the 
probabilities for that tree would be [0.4, 0.6]. (And then I average these 
probabilities over all the trees in the forest.)
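
To make the calculation above concrete, here is a minimal sketch of the two steps in plain Python: converting the class counts in a leaf to fractions, then averaging those fractions over trees. The function names are hypothetical, the counts for the second tree are made up purely for illustration, and this is only the procedure as described in this message, not scikit-learn's own implementation.

```python
def leaf_class_fractions(class_counts):
    """Turn per-class training counts in one leaf into class probabilities."""
    total = sum(class_counts)
    return [count / total for count in class_counts]


def average_over_trees(per_tree_probabilities):
    """Average the per-tree class probabilities, class by class."""
    n_trees = len(per_tree_probabilities)
    n_classes = len(per_tree_probabilities[0])
    return [
        sum(probs[k] for probs in per_tree_probabilities) / n_trees
        for k in range(n_classes)
    ]


# Leaf reached in tree #0: 4 training objects in the first class, 6 in the second.
tree0 = leaf_class_fractions([4, 6])
# Hypothetical leaf reached in a second tree (made-up counts, illustration only).
tree1 = leaf_class_fractions([9, 1])

forest_estimate = average_over_trees([tree0, tree1])
print(tree0)
print(forest_estimate)
```

In the real calculation the counts would come from finding, for each tree, which training objects share a leaf with the test datum.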

Unfortunately, the probabilities that I get from the above algorithm don't
agree with the results of predict_proba.
For example, for 4 objects in my test data I get the following probabilities:

[ 0.99718369  0.00281631]
[ 0.99711619  0.00288381]
[ 0.99680974  0.00319026]
[ 0.55153962  0.44846038]

but predict_proba gives

[1.0 0.0]
[1.0 0.0]
[1.0 0.0]
[0.4 0.6]

Can anyone please tell me what I am doing wrong? I have checked the source code 
and the averaging step seems to be correct. I must be misinterpreting how to 
compute the class probabilities.

Thanks
Eve

***************************************************************
Eve Kovacs
Argonne National Laboratory,
Room L-177, Bldg. 360, HEP
9700 S. Cass Ave.
Argonne, IL 60439 USA
Phone: (630)-252-6208
Fax:   (630)-252-5047
email: kov...@anl.gov
***************************************************************

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general