Variable importance in scikit-learn's random forest implementation is based on the proportion of samples whose classification involved a given feature at some point during the evaluation of one of the decision trees.
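To make sure I am reading the description correctly, here is a minimal sketch of the measure as I understand it. This is not scikit-learn's code: the toy tree, the sample points, and the helper name sample_fraction_importance are all hypothetical, and it only counts the fraction of samples routed through a node that splits on each feature.

```python
from collections import defaultdict

# Hypothetical toy tree (not a scikit-learn structure): each internal node
# tests one feature against a threshold; leaves carry a class label.
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": "a"},
    "right": {
        "feature": 1, "threshold": 0.5,
        "left": {"leaf": "b"},
        "right": {"leaf": "c"},
    },
}


def sample_fraction_importance(tree, samples):
    """Return, per feature, the fraction of samples that pass through
    at least one node splitting on that feature."""
    counts = defaultdict(int)
    for x in samples:
        node = tree
        seen = set()  # features tested on this sample's path
        while "leaf" not in node:
            seen.add(node["feature"])
            if x[node["feature"]] <= node["threshold"]:
                node = node["left"]
            else:
                node = node["right"]
        for f in seen:
            counts[f] += 1
    n = len(samples)
    return {f: c / n for f, c in counts.items()}


# Made-up sample points, purely for illustration.
samples = [(0.1, 0.9), (0.2, 0.1), (0.8, 0.3), (0.9, 0.7)]
importances = sample_fraction_importance(TREE, samples)
print(importances)
```

For a forest, I would assume these per-tree values are then averaged over all trees, but that part is my guess and not something I have checked in the source.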
The scikit-learn documentation describes the measure here: http://scikit-learn.org/stable/modules/ensemble.html#feature-importance-evaluation

This method seems different from the OOB-based method of Breiman 2001 (section 10): http://www.stat.berkeley.edu/~breiman/randomforest2001.pdf

Is there any reference for the method implemented in scikit-learn?

Here is the original Stack Overflow question: http://stackoverflow.com/questions/15810339/how-are-feature-importances-in-randomforestclassifier-determined/15811003?noredirect=1#comment22487062_15811003

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
