> Do you recommend using max_features=1 with ExtraTrees?

If what you want are feature importances that reflect, without "bias", the
mutual information of each variable (alone or in combination with others)
with Y, then yes. Bonus points if you set min_impurity_decrease > 0, to
avoid splitting on noise and collecting that as part of the importance
scores. The resulting forest will not be optimal with respect to
classification/regression performance, though.
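To make the recommendation concrete, here is a minimal sketch of such a
forest; the synthetic dataset, n_estimators, and the particular
min_impurity_decrease value are illustrative assumptions, not prescriptions
from the thread:

# Minimal sketch (not from the original thread): totally randomized trees
# whose MDI importances avoid the bias effects discussed above.
# Dataset, n_estimators and the min_impurity_decrease value are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           random_state=0)

forest = ExtraTreesClassifier(
    n_estimators=500,
    max_features=1,               # pick the split feature completely at random
    min_impurity_decrease=0.001,  # do not split on (and credit importance to) noise
    random_state=0,
).fit(X, y)

print(forest.feature_importances_)  # MDI importances, averaged over the trees

As the message above says, a forest configured this way trades predictive
accuracy for importance scores that are easier to interpret.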
On Wed, 16 May 2018 at 19:29, Andreas Mueller <t3k...@gmail.com> wrote:
> I don't think that's how most people use the trees, though.
> Probably not even the ExtraTrees.
> I really need to get around to reading your thesis :-/
>
> Do you recommend using max_features=1 with ExtraTrees?
>
> On 05/05/2018 05:21 AM, Gilles Louppe wrote:
> > Hi,
> >
> > See also chapters 6 and 7 of http://arxiv.org/abs/1407.7502 for another
> > point of view regarding the "issue" with feature importances. TL;DR:
> > feature importances as we have them in scikit-learn (i.e. MDI) are
> > provably **not** biased, provided the trees are built totally at random
> > (as in ExtraTrees with max_features=1) and the depth is controlled by
> > min_samples_split (to avoid splitting on noise). On the other hand, it
> > is not always clear what you actually compute with MDA
> > (permutation-based importances), since it is conditioned on the model
> > you use.
> >
> > Gilles
> >
> > On Sat, 5 May 2018 at 10:36, Guillaume Lemaître <g.lemaitr...@gmail.com> wrote:
> >> +1 on the post pointed out by Jeremiah.
> >>
> >> On 5 May 2018 at 02:08, Johnson, Jeremiah <jeremiah.john...@unh.edu> wrote:
> >>> Faraz, take a look at the discussion of this issue here:
> >>> http://parrt.cs.usfca.edu/doc/rf-importance/index.html
> >>>
> >>> Best,
> >>> Jeremiah
> >>> =========================================
> >>> Jeremiah W. Johnson, Ph.D.
> >>> Asst. Professor of Data Science
> >>> Program Coordinator, B.S. in Analytics & Data Science
> >>> University of New Hampshire
> >>> Manchester, NH 03101
> >>> https://www.linkedin.com/in/jwjohnson314
> >>>
> >>> From: scikit-learn <scikit-learn-bounces+jeremiah.johnson=unh....@python.org>
> >>>   on behalf of "Niyaghi, Faraz" <niyag...@oregonstate.edu>
> >>> Reply-To: Scikit-learn mailing list <scikit-learn@python.org>
> >>> Date: Friday, May 4, 2018 at 7:10 PM
> >>> To: "scikit-learn@python.org" <scikit-learn@python.org>
> >>> Subject: [scikit-learn] Breiman vs. scikit-learn definition of Feature Importance
> >>>
> >>> Greetings,
> >>>
> >>> This is Faraz Niyaghi from Oregon State University. I research variable
> >>> selection using random forests. To the best of my knowledge, there is a
> >>> difference between scikit-learn's and Breiman's definitions of feature
> >>> importance. Breiman uses out-of-bag (oob) cases to calculate feature
> >>> importance, but scikit-learn doesn't. I was wondering: 1) why are they
> >>> different? 2) can they result in very different rankings of features?
> >>>
> >>> Here are the definitions I found on the web:
> >>>
> >>> Breiman: "In every tree grown in the forest, put down the oob cases and
> >>> count the number of votes cast for the correct class. Now randomly
> >>> permute the values of variable m in the oob cases and put these cases
> >>> down the tree. Subtract the number of votes for the correct class in
> >>> the variable-m-permuted oob data from the number of votes for the
> >>> correct class in the untouched oob data. The average of this number
> >>> over all trees in the forest is the raw importance score for variable m."
> >>> Link: https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
> >>>
> >>> scikit-learn: "The relative rank (i.e. depth) of a feature used as a
> >>> decision node in a tree can be used to assess the relative importance
> >>> of that feature with respect to the predictability of the target
> >>> variable.
> >>> Features used at the top of the tree contribute to the final
> >>> prediction decision of a larger fraction of the input samples. The
> >>> expected fraction of the samples they contribute to can thus be used
> >>> as an estimate of the relative importance of the features."
> >>> Link: http://scikit-learn.org/stable/modules/ensemble.html
> >>>
> >>> Thank you for reading this email. Please let me know your thoughts.
> >>>
> >>> Cheers,
> >>> Faraz.
> >>>
> >>> Faraz Niyaghi
> >>> Ph.D. Candidate, Department of Statistics
> >>> Oregon State University
> >>> Corvallis, OR
> >>
> >> --
> >> Guillaume Lemaitre
> >> INRIA Saclay - Parietal team
> >> Center for Data Science Paris-Saclay
> >> https://glemaitre.github.io/
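For contrast with the MDI scores above, here is a minimal sketch of the
permutation (MDA) idea the thread attributes to Breiman. One simplification
is assumed: Breiman permutes each variable in the per-tree out-of-bag cases,
while this sketch permutes on a single held-out set; the dataset and model
settings are again illustrative.

# Sketch of permutation importance (MDA): measure how much the score drops
# when a feature's values are shuffled, breaking its association with y.
# Simplification: a held-out set is used instead of Breiman's per-tree oob cases.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

rng = np.random.RandomState(0)
drops = []
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle feature j only
    drops.append(baseline - model.score(X_perm, y_test))

print(drops)  # accuracy drop per feature; a larger drop means a more important feature

More recent scikit-learn releases expose this repeated-shuffling procedure as
sklearn.inspection.permutation_importance, evaluated on whatever set you pass
it rather than on out-of-bag samples.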