Hi all,
I have trained a Random Forest with 10K trees, but I am not sure this is
the right number of trees for my dataset.
I'd like to tune the number of trees using the OOB score. However, in the
fitted model I see that oob_score_ is just a single float, which I guess is
the average of the out-of-bag scores of all 10K trees. What I'd like to do
is plot the OOB score against the number of trees.
Also, I see that the fitted model holds a list of 10K DecisionTreeClassifier
objects. Each of them has a field named indices_, which I guess is the set
of indices of the samples from the original dataset that were used to build
that tree.
So, if my guesses are correct, I am going to use these indices to determine
the out-of-bag samples for each tree and calculate the OOB score for each
tree. Then I can average the first n scores to get the OOB score for the
first n trees and plot my figure.
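
To make the plan concrete, here is a minimal sketch in plain Python. It is
only an illustration under assumptions: the function name and its inputs
(in_bag, tree_preds) are hypothetical and would have to be filled from the
per-tree indices and per-tree predictions. It also uses a variation on the
averaging idea: instead of averaging per-tree scores, each sample collects
votes only from the trees for which it is out of bag, and the majority vote
is scored after each added tree.

```python
from collections import Counter


def cumulative_oob_accuracy(in_bag, tree_preds, y_true):
    """OOB accuracy of the first n trees, for n = 1..len(tree_preds).

    in_bag[t]     -- indices of the samples used to fit tree t (hypothetical input)
    tree_preds[t] -- tree t's predicted label for every sample (hypothetical input)
    y_true        -- true label of every sample
    """
    n_samples = len(y_true)
    votes = [Counter() for _ in range(n_samples)]  # running OOB votes per sample
    curve = []
    for bag, preds in zip(in_bag, tree_preds):
        bag = set(bag)
        for i in range(n_samples):
            if i not in bag:  # sample i is out of bag for this tree
                votes[i][preds[i]] += 1
        # Score only the samples that have at least one OOB vote so far.
        scored = [i for i in range(n_samples) if votes[i]]
        if not scored:
            curve.append(float("nan"))
            continue
        correct = sum(votes[i].most_common(1)[0][0] == y_true[i] for i in scored)
        curve.append(correct / len(scored))
    return curve


# Toy data: 4 samples, 3 trees (made up purely for illustration).
y_true = [0, 1, 1, 0]
in_bag = [[0, 0, 1, 2], [1, 2, 3, 3], [0, 1, 1, 3]]
tree_preds = [[0, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0]]

curve = cumulative_oob_accuracy(in_bag, tree_preds, y_true)
print(curve)  # OOB accuracy after 1, 2 and 3 trees: 0.0, 0.5, then 2/3
```

Plotting curve against the tree count 1..n would then give the figure.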
Can anyone please confirm that I am doing this right? Is there an easier way
to plot my figure?
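
One possibly easier route, sketched here under assumptions: newer
scikit-learn releases let RandomForestClassifier be grown incrementally with
warm_start=True, so oob_score_ can be read after each increase of
n_estimators. This option may not exist in older releases, and the data
below is a synthetic stand-in from make_classification, not my dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption); replace with the real X, y.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

forest = RandomForestClassifier(
    warm_start=True,  # keep already-fitted trees and only add new ones on refit
    oob_score=True,   # recompute oob_score_ whenever trees are added
    bootstrap=True,
    random_state=0,
)

tree_counts = list(range(20, 101, 20))  # 20, 40, ..., 100 trees
oob_scores = []
for n in tree_counts:
    forest.set_params(n_estimators=n)
    forest.fit(X, y)  # fits only the trees that are still missing
    oob_scores.append(forest.oob_score_)

for n, score in zip(tree_counts, oob_scores):
    print(n, round(score, 3))
```

The pairs in tree_counts and oob_scores can be handed to any plotting
library. With very few trees some samples may never be out of bag, so it
seems safer to start the curve at a few dozen trees.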
Thanks,
Hossein
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general