> Do you recommend using max_features=1 with ExtraTrees?
If what you want are feature importances that reflect, without 'bias', the
mutual information of each variable (alone or in combination with others)
with Y, then yes. Bonus points if you set min_impurity_decrease > 0, to
avoid splitting on noise.
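For concreteness, a minimal sketch of the setting described above, using
scikit-learn's ExtraTreesClassifier; the dataset and the impurity threshold
are illustrative placeholders, not values from the thread:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier

    # Toy data; any supervised dataset would do here.
    X, y = make_classification(n_samples=1000, n_features=10,
                               n_informative=4, random_state=0)

    # Totally randomized trees: one candidate feature per split
    # (max_features=1), plus a small impurity threshold so the trees stop
    # splitting once the decrease is negligible.
    model = ExtraTreesClassifier(
        n_estimators=500,
        max_features=1,
        min_impurity_decrease=1e-4,  # illustrative value only
        random_state=0,
    )
    model.fit(X, y)

    # MDI importances, computed during training and normalised to sum to 1.
    print(model.feature_importances_)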
I don't think that's how most people use the trees, though.
Probably not even the ExtraTrees.
I really need to get around to reading your thesis :-/
Do you recommend using max_features=1 with ExtraTrees?
On 05/05/2018 05:21 AM, Gilles Louppe wrote:
Hi,
See also chapters 6 and 7 of http://arxiv.org/abs/1407.7502 for another
point of view regarding the "issue" with feature importances. TLDR: Feature
importances as we have them in scikit-learn (i.e. MDI) are provably **not**
biased, provided trees are built totally at random (as in ExtraTrees with
max_features=1).
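One hypothetical way to play with the contrast discussed in those chapters is
to append a pure-noise column and compare the MDI importances of a greedy
forest against totally randomized trees; whether and how strongly the
difference shows up depends on the data, so this is only a sandbox, not a
proof:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

    rng = np.random.RandomState(0)
    X, y = make_classification(n_samples=1000, n_features=5,
                               n_informative=3, random_state=0)
    # Append an uninformative continuous noise feature (last column).
    X = np.column_stack([X, rng.rand(len(X))])

    # Greedy splits vs. totally random splits on a single candidate feature.
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    et = ExtraTreesClassifier(n_estimators=500, max_features=1,
                              random_state=0).fit(X, y)

    print("RandomForest MDI:", rf.feature_importances_.round(3))
    print("ExtraTrees   MDI:", et.feature_importances_.round(3))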
+1 on the post pointed out by Jeremiah.
On 5 May 2018 at 02:08, Johnson, Jeremiah wrote:
Faraz, take a look at the discussion of this issue here:
http://parrt.cs.usfca.edu/doc/rf-importance/index.html
Best,
Jeremiah
Jeremiah W. Johnson, Ph.D.
Asst. Professor of Data Science
Program Coordinator, B.S. in Analytics & Data Science
University of Ne
Not sure how it compares in practice, but it's certainly more efficient to rank
the features by impurity decrease rather than by OOB permutation performance,
since you wouldn't need to
a) compute the OOB performance (an extra inference pass)
b) permute a feature column and do another inference pass
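The extra inference work is easy to see if you write both out. The sketch below
uses a held-out test set with sklearn.inspection.permutation_importance
(available in recent scikit-learn versions) rather than OOB samples, which would
have to be handled by hand:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)

    # (a) Impurity-decrease (MDI) ranking: a by-product of training,
    #     no extra passes over the data.
    print(clf.feature_importances_)

    # (b) Permutation importance: one baseline scoring pass plus
    #     n_features * n_repeats additional scoring passes.
    result = permutation_importance(clf, X_test, y_test,
                                    n_repeats=10, random_state=0)
    print(result.importances_mean)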