The Stack Exchange post from Matthew Drury sums it up well. The feature importance in sklearn is indeed the weighted reduction in node impurity (the Gini impurity in the classification case).
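For the regression trees inside GradientBoostingRegressor the per-node impurity is the (Friedman) MSE criterion rather than Gini, but the accumulation is the same: every split adds its weighted impurity decrease to the feature it splits on. Here is a minimal sketch (my own helper names, not the Cython code in _tree.pyx) that reproduces each tree's feature_importances_ from the fitted tree arrays:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = GradientBoostingRegressor(n_estimators=20, random_state=0).fit(X, y)

def tree_importances(tree):
    # Sum of weighted impurity decreases per feature, normalized to 1.
    t = tree.tree_
    imp = np.zeros(t.n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf, no split here
            continue
        n = t.weighted_n_node_samples
        decrease = (n[node] * t.impurity[node]
                    - n[left] * t.impurity[left]
                    - n[right] * t.impurity[right])
        imp[t.feature[node]] += decrease
    return imp / imp.sum()

# Each individual tree's feature_importances_ is exactly this quantity.
for est in model.estimators_.ravel():
    assert np.allclose(tree_importances(est), est.feature_importances_)

The ensemble-level model.feature_importances_ then averages these per-tree vectors over the boosting stages (the exact normalization has varied between sklearn versions).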
On Tue, May 9, 2017 at 8:34 AM, Olga Lyashevska <o.lyashevsk...@gmail.com> wrote:
> Hi all,
>
> I am trying to understand the differences in feature importance plots obtained
> with the R package gbm and sklearn. Having compared both implementations side by
> side, it seems that the models are fairly similar; however, the feature
> importance plots are rather distinct.
>
> R uses the empirical improvement in squared error as described in
> Friedman's "Greedy Function Approximation" paper (eq. 44, 45).
>
> sklearn (as far as I could see in the code) uses the weighted reduction
> in node purity. How exactly is this calculated? Is it a Gini index? Is
> there a reference?
>
> I found this, but I find it hard to follow:
> https://github.com/scikit-learn/scikit-learn/blob/fc2f24927fc37d7e42917369f17de045b14c59b5/sklearn/tree/_tree.pyx#L1056
>
> I have also seen a post by Matthew Drury on Stack Exchange:
> https://stats.stackexchange.com/questions/162162/relative-variable-importance-for-boosting
>
> Many thanks,
> Olga