Re: [scikit-learn] Breiman vs. scikit-learn definition of Feature Importance

2018-05-16 Thread Gilles Louppe
> Do you recommend using max_features=1 with ExtraTrees? If what you want are feature importances that reflect, without 'bias', the mutual information of each variable (alone or in combination with others) with Y, then yes. Bonus points if you set min_impurity_decrease > 0, to avoid splitting on noise.
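
As a rough sketch of that setting (the toy dataset and the min_impurity_decrease value below are illustrative choices, not from the thread):

# Sketch: totally randomized trees for MDI feature importances.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, random_state=0)

# max_features=1: the split variable is picked completely at random,
# i.e. the "totally randomized trees" regime discussed in the thread.
# min_impurity_decrease > 0 stops splits that barely reduce impurity,
# which avoids splitting on noise deep in the trees.
model = ExtraTreesClassifier(n_estimators=500,
                             max_features=1,
                             min_impurity_decrease=1e-4,
                             random_state=0)
model.fit(X, y)

# MDI (mean decrease in impurity) importances, computed during training.
print(model.feature_importances_)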

Re: [scikit-learn] Breiman vs. scikit-learn definition of Feature Importance

2018-05-16 Thread Andreas Mueller
I don't think that's how most people use the trees, though. Probably not even the ExtraTrees. I really need to get around to reading your thesis :-/ Do you recommend using max_features=1 with ExtraTrees? On 05/05/2018 05:21 AM, Gilles Louppe wrote: Hi, See also chapters 6 and 7 of http://arxiv.org/abs/1407.7502

Re: [scikit-learn] Breiman vs. scikit-learn definition of Feature Importance

2018-05-05 Thread Gilles Louppe
Hi, See also chapters 6 and 7 of http://arxiv.org/abs/1407.7502 for another point of view regarding the "issue" with feature importances. TLDR: Feature importances as we have them in scikit-learn (i.e. MDI) are provably **not** biased, provided trees are built totally at random (as in ExtraTrees with max_features=1).
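
A rough side-by-side sketch of the two regimes (the dataset construction is an illustrative assumption, not taken from the thesis or the thread):

# Sketch: MDI from greedy forests vs. totally randomized trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

X, y = make_classification(n_samples=2000, n_features=8, n_informative=3,
                           n_redundant=2, random_state=0)

greedy = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
randomized = ExtraTreesClassifier(n_estimators=500, max_features=1,
                                  random_state=0).fit(X, y)

# Greedy splitting can let one of several redundant features absorb most of
# the importance; with max_features=1 the MDI is spread according to the
# information each feature carries, alone or jointly with the others.
print("greedy MDI:    ", np.round(greedy.feature_importances_, 3))
print("randomized MDI:", np.round(randomized.feature_importances_, 3))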

Re: [scikit-learn] Breiman vs. scikit-learn definition of Feature Importance

2018-05-05 Thread Guillaume LemaƮtre
+1 on the post pointed out by Jeremiah. On 5 May 2018 at 02:08, Johnson, Jeremiah wrote: > Faraz, take a look at the discussion of this issue here: > http://parrt.cs.usfca.edu/doc/rf-importance/index.html > > Best, > Jeremiah > = > Jeremiah W. Johnson, Ph.

Re: [scikit-learn] Breiman vs. scikit-learn definition of Feature Importance

2018-05-04 Thread Johnson, Jeremiah
Faraz, take a look at the discussion of this issue here: http://parrt.cs.usfca.edu/doc/rf-importance/index.html Best, Jeremiah = Jeremiah W. Johnson, Ph.D Asst. Professor of Data Science Program Coordinator, B.S. in Analytics & Data Science University of Ne

Re: [scikit-learn] Breiman vs. scikit-learn definition of Feature Importance

2018-05-04 Thread Sebastian Raschka
Not sure how it compares in practice, but it's certainly more efficient to rank the features by impurity decrease rather than by OOB permutation performance: you wouldn't need to a) compute the OOB performance (an extra inference pass) or b) permute a feature column and do another inference pass.
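
For reference, a minimal sketch of that permutation loop (using a held-out split in place of Breiman's OOB samples, with an illustrative dataset and model, so it only mirrors steps a) and b), not the exact OOB procedure):

# Sketch: permutation importance requires extra inference passes,
# unlike MDI which falls out of training for free.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# a) one inference pass to get the baseline score
baseline = model.score(X_test, y_test)

rng = np.random.RandomState(0)
importances = []
for j in range(X_test.shape[1]):
    # b) permute one feature column and do another inference pass
    X_perm = X_test.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(baseline - model.score(X_perm, y_test))

print(np.round(importances, 3))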