Hi Olivier,

There are indeed several ways to get feature "importances". As is often the
case, there is no strict consensus about what this word means.

In our case, we implement the importance as described in [1] (often cited,
but unfortunately rarely read...). It is sometimes called "gini importance"
or "mean decrease impurity" and is defined as the total decrease in node
impurity over the nodes that split on the feature, weighted by the
probability of reaching each node (approximated by the proportion of samples
reaching it), averaged over all trees of the ensemble.
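For illustration, here is a minimal sketch that recomputes this quantity by hand from a fitted forest. It assumes a recent scikit-learn, which exposes each fitted tree's structure through `tree_` (`children_left`, `children_right`, `feature`, `impurity`, `weighted_n_node_samples`). For a node t that splits on feature X_j, the contribution to X_j is p(t) * (i(t) - p_L i(t_L) - p_R i(t_R)), where p(t) is the proportion of samples reaching t and p_L, p_R are the proportions of t's samples sent left and right. The helper names `tree_mdi` and `forest_mdi` are hypothetical. The per-tree normalization to sum to one is assumed to mirror recent scikit-learn versions and may differ from the 2013 code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

TREE_LEAF = -1  # value scikit-learn stores in children_left/right for leaves


def tree_mdi(tree, n_features):
    """Mean decrease impurity of each feature for one fitted decision tree."""
    t = tree.tree_
    w = t.weighted_n_node_samples  # (weighted) number of samples reaching each node
    importances = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == TREE_LEAF:
            continue  # leaves do not split, so they decrease nothing
        # w(t) i(t) - w(t_L) i(t_L) - w(t_R) i(t_R), credited to the split feature
        importances[t.feature[node]] += (
            w[node] * t.impurity[node]
            - w[left] * t.impurity[left]
            - w[right] * t.impurity[right]
        )
    # dividing by the root weight turns counts into p(t),
    # the proportion of samples reaching each node
    importances /= w[0]
    total = importances.sum()
    # normalize to sum to one (assumed to mirror recent scikit-learn)
    return importances / total if total > 0 else importances


def forest_mdi(forest, n_features):
    """Average the per-tree importances over the ensemble."""
    # single-node trees (no split at all) are skipped
    per_tree = [tree_mdi(est, n_features) for est in forest.estimators_
                if est.tree_.node_count > 1]
    mean = np.mean(per_tree, axis=0)
    return mean / mean.sum()


X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
mdi = forest_mdi(forest, X.shape[1])
print(np.round(mdi, 3))
print(np.allclose(mdi, forest.feature_importances_))
```

Everything here comes from the fitted trees themselves, so no extra pass over the data is needed. The comparison with `feature_importances_` is only expected to hold for the scikit-learn versions the sketch assumes.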

The other measure is the one you describe. It is sometimes called "mean
decrease accuracy". It is more computationally intensive since it requires
(repeated) random permutations of each feature. It also works only with
bootstrapping, since the drop in accuracy is measured on the out-of-bag
samples of each tree.
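A minimal sketch of this permutation-based measure follows. It assumes trees grown by hand on bootstrap samples with scikit-learn's `DecisionTreeClassifier`, so that each tree's out-of-bag (OOB) samples are known. For each tree, accuracy on its OOB samples is compared with accuracy after randomly permuting one feature column, and the drops are averaged over trees. The function `oob_permutation_importance` and its parameters `n_trees`, `n_repeats` and `seed` are hypothetical. Breiman's original code and the randomForest package may differ in details, for example by scaling the mean drop by its standard error.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def oob_permutation_importance(X, y, n_trees=50, n_repeats=3, seed=0):
    """Mean decrease accuracy: average drop in out-of-bag accuracy when
    one feature is randomly permuted (raw, unscaled drop)."""
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    per_tree = []
    for _t in range(n_trees):
        # bootstrap sample; samples never drawn form this tree's OOB set
        boot = rng.randint(0, n_samples, n_samples)
        oob = np.setdiff1d(np.arange(n_samples), boot)
        if oob.size == 0:
            continue  # no OOB samples, nothing to evaluate on
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=rng.randint(2**31 - 1))
        tree.fit(X[boot], y[boot])
        X_oob, y_oob = X[oob], y[oob]
        base_acc = np.mean(tree.predict(X_oob) == y_oob)
        drops = np.zeros(n_features)
        for j in range(n_features):
            for _ in range(n_repeats):
                X_perm = X_oob.copy()
                # shuffle feature j only, breaking its link with y
                X_perm[:, j] = rng.permutation(X_perm[:, j])
                drops[j] += base_acc - np.mean(tree.predict(X_perm) == y_oob)
        per_tree.append(drops / n_repeats)
    return np.mean(per_tree, axis=0)


X, y = load_iris(return_X_y=True)
mda = oob_permutation_importance(X, y)
print(np.round(mda, 3))
```

Every tree is re-evaluated once per feature and per permutation, so the cost is roughly n_trees × n_features × n_repeats extra predictions. This is why the measure is more expensive than the impurity-based one. Recent scikit-learn versions also provide `sklearn.inspection.permutation_importance`, which permutes features on a dataset you pass in (e.g. a held-out set) rather than on OOB samples.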

Note that both measures are available in the randomForest R package.

[1]: Breiman, Friedman, Olshen, Stone, "Classification and Regression Trees", 1984.

I'll reply on SO as well.

Hope this helps,

Gilles



On 4 April 2013 21:35, Peter Prettenhofer <[email protected]> wrote:

> I posted a brief description of the algorithm. The method that we
> implement is briefly described in ESLII (The Elements of Statistical
> Learning, 2nd edition). Gilles is the expert here; he can give more
> details on the issue.
>
>
> 2013/4/4 Olivier Grisel <[email protected]>
>
>> The variable importance in scikit-learn's implementation of random
>> forest is based on the proportion of samples that were classified by
>> the feature at some point in the evaluation of one of the decision trees.
>>
>>
>> http://scikit-learn.org/stable/modules/ensemble.html#feature-importance-evaluation
>>
>> This method seems different from the OOB based method of Breiman 2001
>> (section 10):
>>
>> http://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
>>
>> Is there any reference for the method implemented in the scikit?
>>
>> Here is the original Stack Overflow question:
>>
>>
>> http://stackoverflow.com/questions/15810339/how-are-feature-importances-in-randomforestclassifier-determined/15811003?noredirect=1#comment22487062_15811003
>>
>> --
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>
>>
>> ------------------------------------------------------------------------------
>> Minimize network downtime and maximize team effectiveness.
>> Reduce network management and security costs.Learn how to hire
>> the most talented Cisco Certified professionals. Visit the
>> Employer Resources Portal
>> http://www.cisco.com/web/learning/employer_resources/index.html
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
>
> --
> Peter Prettenhofer
>
>