Hi Ken,
On Sun, Sep 21, 2014 at 4:16 AM, c TAKES <ctakesli...@gmail.com> wrote:
>
> Understandable that scikit-learn wants to focus on more mature algorithms,
> so perhaps I'll spend my efforts more on writing a python wrapper for
> Johnson and Zhang's implementation of RGF, at least for now. Personally I
> do think it is fairly well proven though, as Johnson and Zhang have been
> quite successful with it in competitions. To a lesser extent so have I for
> that matter :).
>
I am not saying that their method isn't useful, just that it is too early
to add it to scikit-learn. Anyway, let us know if you come up with a
scikit-learn-compatible implementation or wrapper :)
> One question I do have - you said that Option 3 is the one scikit-learn
> implements (setting one separate weight per leaf node of the last
> estimator, by line search). Perhaps I'm overlooking something, but to me
> the code looks like it is implementing Option 1 (using the same constant
> value (`learning_rate`) for all estimators).
>
I've just checked, and the optimal step size for the squared loss is:
step_size = \sum_i (y_i - f_i) h_i / \sum_i h_i^2
where y_i is the correct label of instance i, f_i is the prediction of the
ensemble so far, and h_i is the prediction of the model to be added to the
ensemble. Recall that the tree is fitted against the residuals y_i - f_i.
So if the tree fits the residuals perfectly, y_i - f_i = h_i and the
optimal step size is 1.0. This may not be the case if the tree is too
shallow to fit the data, though. This is Option 2. For Option 3, both sums
are taken over only the instances that fall into a particular leaf.
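To make this concrete, here is a quick numerical check of the formula (a
toy sketch with made-up data, not code from scikit-learn):

import numpy as np

# Toy data: y = true labels, f = current ensemble predictions.
y = np.array([3.0, -1.0, 2.0, 0.5])
f = np.array([2.0, -0.5, 1.0, 1.0])
residuals = y - f

# A tree that fits the residuals perfectly: the optimal step is 1.0.
h = residuals.copy()
print(np.sum(residuals * h) / np.sum(h ** 2))   # -> 1.0

# A tree that recovers the residuals only up to a factor of 0.5, as a
# too-shallow tree might: the line search rescales it accordingly.
h = 0.5 * residuals
print(np.sum(residuals * h) / np.sum(h ** 2))   # -> 2.0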
If you have a look at the other loss functions, it is clear that
scikit-learn implements Option 3.
Here's the loop over the leaves:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L191
Here's the line search for one leaf in the case of the log loss:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L377
To be more precise, scikit-learn finds the step size by taking a single
Newton-Raphson step, as suggested in the original gradient boosting paper.
So the line search is only approximate.
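For illustration, a per-leaf Newton step for the log loss looks roughly
like this (a simplified sketch of the idea, not the actual scikit-learn
code; the function and variable names are my own):

import numpy as np

def leaf_newton_steps(leaf_ids, y, raw_pred):
    # One Newton-Raphson step per leaf for the log loss (Option 3).
    # leaf_ids: index of the leaf each training instance falls into
    # y:        binary labels in {0, 1}
    # raw_pred: current ensemble predictions on the log-odds scale
    p = 1.0 / (1.0 + np.exp(-raw_pred))   # predicted probabilities
    residual = y - p                      # negative gradient of the log loss
    steps = {}
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        # Newton step = (sum of gradients) / (sum of Hessians) in the leaf.
        num = residual[mask].sum()
        den = (p[mask] * (1.0 - p[mask])).sum()
        steps[leaf] = num / den if den != 0.0 else 0.0
    return steps

Each leaf of the newly fitted tree then gets its own value (scaled by the
learning rate) instead of a single global step size.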
Mathieu