Mathieu - thanks so much for this in-depth response.  It clarified a fair
amount of what I was seeing in the gradient boosting source code that
differs from my prior understanding of standard GBMs.  It also became clear
from reviewing the code that to implement RGF I would have to modify
tree.py and _tree.pyx as well, to accomplish what you described as the
interleaving of tree induction and boosting.  I would need to make it so
that a DecisionTreeRegressor - or perhaps a separate class implementing
BaseDecisionTree - can add one new split at a time, and to add a function
that returns the loss reduction each candidate split would yield.
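
To make that second piece concrete, here is a rough, self-contained sketch
(not scikit-learn's actual API - the function name, data, and node
representation are all made up for illustration) of scoring every candidate
split of a node by the squared-error reduction it would give, so that an
outer loop could greedily apply one split at a time:

import numpy as np


def candidate_split_gains(X, y):
    """For each feature, return (feature, threshold, gain) of its best split,
    where gain is the reduction in total squared error over the node."""
    n_samples, n_features = X.shape
    parent_sse = np.sum((y - y.mean()) ** 2)
    results = []
    for j in range(n_features):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        best_gain, best_thr = 0.0, None
        for i in range(1, n_samples):
            if xs[i] == xs[i - 1]:
                continue  # no threshold separates identical feature values
            left, right = ys[:i], ys[i:]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            gain = parent_sse - sse
            if gain > best_gain:
                best_gain, best_thr = gain, 0.5 * (xs[i] + xs[i - 1])
        results.append((j, best_thr, best_gain))
    return results


# Greedily pick the single most beneficial split, as interleaved induction
# would do one node at a time on synthetic data.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = (X[:, 0] > 0.5).astype(float) + 0.1 * rng.randn(200)
print(max(candidate_split_gains(X, y), key=lambda t: t[2]))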

Understandable that scikit-learn wants to focus on more mature algorithms,
so perhaps, at least for now, I'll spend my efforts on writing a Python
wrapper for Johnson and Zhang's implementation of RGF.  Personally I do
think it is fairly well proven, though, as Johnson and Zhang have been
quite successful with it in competitions.  To a lesser extent, so have I
for that matter :).

One question I do have - you said that Option 3 is the one scikit-learn
implements (setting one separate weight per leaf node of the last
estimator, by line search).  Perhaps I'm overlooking something, but to me
the code looks like it is implementing Option 1 (using the same constant
value (`learning_rate`) for all estimators).

Take the loss function LeastSquaresError for simplicity.  Unless I'm
mistaken, the predictions are simply updated by this statement:

y_pred[:, k] += learning_rate * tree.predict(X).ravel()

where learning_rate is a constant, and that's it.  Isn't that Option 1?  I
don't see any line search.
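
For comparison, here is a minimal, self-contained sketch (not the actual
scikit-learn source; the data and variable names are mine) of what one
boosting iteration with a per-leaf update - Option 3 - could look like for
an L1 loss, where each leaf value is re-set by a closed-form "line search"
to the median of the residuals falling in that leaf:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = np.sin(4 * X[:, 0]) + 0.3 * rng.randn(200)
y_pred = np.full_like(y, np.median(y))     # initial constant prediction
learning_rate = 0.1

residual = np.sign(y - y_pred)             # negative gradient of |y - F(x)|
tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
leaves = tree.apply(X)                     # leaf id for every training sample
for leaf in np.unique(leaves):
    mask = leaves == leaf
    # Per-leaf "line search": overwrite the value fit on the surrogate target
    # with the constant that minimizes the actual L1 loss (the median).
    tree.tree_.value[leaf, 0, 0] = np.median(y[mask] - y_pred[mask])
y_pred += learning_rate * tree.tree_.value[leaves, 0, 0]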

Best,
Ken

On Sat, Sep 20, 2014 at 10:04 AM, Mathieu Blondel <math...@mblondel.org>
wrote:

> I read the RGF paper. Interesting method but definitely too early to
> include it in scikit-learn (we focus on mature algorithms). This also looks
> more complicated to implement than gradient boosting, since tree induction
> and boosting are interleaved.
>
> The paper also clarified what "fully corrective" means, thanks. To
> summarize, here are different strategies for setting the weights in
> gradient boosting:
> 1. using the same constant value (`learning_rate`) for all estimators
> 2. setting the weight of the last estimator by line search (other weights
> are kept fixed)
> 3. setting one separate weight per leaf node of the last estimator, by
> line search
> 4. refitting all estimators weights (including the past ones)
> 5. refitting all leaf nodes of all estimators?
>
> Some authors [1] argued that option 1 is sufficient in practice to get
> good performance since our goal is to do well on test data, not training
> data.
> Option 3 is what scikit-learn implements. Option 4 is the fully corrective
> case. It basically amounts to least squares for the square loss or to
> logistic regression for the log loss (using each estimator's predictions as
> features). One thing, though, is that our implementation doesn't store the
> estimator weights explicitly (leaves are updated directly). Setting a
> penalty (l1 or l2) on the estimator weights (i.e., on the functional)
> should prevent overfitting. Option 5 sounds pretty computationally
> expensive, although the update doesn't need to be done at every iteration.
>
> I recently re-implemented gradient boosting [2]. One difference in my
> implementation is that it is possible to use any base learner (not just
> trees). So my implementation stores estimator weights explicitly and uses
> option 2 above. The fully corrective updates (option 4) might be easier to
> add to my implementation.
>
> BTW, the notion of fully corrective updates is also present in the
> matching pursuit (-> orthogonal matching pursuit) and Frank-Wolfe
> literatures.
>
> Mathieu
>
> [1] "Boosting Algorithms: Regularization, Prediction and Model Fitting",
> Peter B ̈uhlmann and Torsten Hothorn (thanks to Peter for telling me about
> this paper)
>
> [2]
> https://github.com/mblondel/ivalice/blob/master/ivalice/impl/gradient_boosting.py
>
> On Wed, Sep 17, 2014 at 4:02 AM, c TAKES <ctakesli...@gmail.com> wrote:
>
>> Yes - in fact my real goal is ultimately to implement RGF, though I had
>> considered building/forking off the current GradientBoostingRegressor
>> package as a starting point, (a) because I'm new to developing for
>> scikit-learn and (b) to maintain as much consistency as possible with the
>> rest of the package.
>>
>> Upon further review, though, I don't think there's much point in adding
>> fully corrective updates to GBR, because without the regularization (the
>> rest of RGF) it is probably useless and leads to overfitting, as per
>> http://stat.rutgers.edu/home/tzhang/software/rgf/tpami14-rgf.pdf.
>>
>> So it would likely be more useful to go ahead and create RGF as an
>> entirely separate module.  But that could take some time, of course.
>>
>> Is anyone working on RGF for sklearn?
>>
>> Arnaud, thanks for directing me to your sparse implementation; I will
>> have a look!  It would certainly be excellent to have this work for all
>> decision tree algorithms.
>>
>> Ken
>>
>>
>> On Tue, Sep 16, 2014 at 11:16 AM, Peter Prettenhofer <
>> peter.prettenho...@gmail.com> wrote:
>>
>>> The only reference I know is the Regularized Greedy Forest paper by
>>> Johnson and Zhang [1].
>>> I haven't read the primary source (by Zhang as well).
>>>
>>> [1] http://arxiv.org/abs/1109.0887
>>>
>>> 2014-09-16 15:15 GMT+02:00 Mathieu Blondel <math...@mblondel.org>:
>>>
>>>> Could you give a reference for gradient boosting with fully corrective
>>>> updates?
>>>>
>>>> Since the philosophy of gradient boosting is to fit each tree against
>>>> the residuals (or negative gradient) so far, I am wondering how such a
>>>> fully corrective update would work...
>>>>
>>>> Mathieu
>>>>
>>>> On Tue, Sep 16, 2014 at 9:16 AM, c TAKES <ctakesli...@gmail.com> wrote:
>>>>
>>>>> Is anyone working on making Gradient Boosting Regressor work with
>>>>> sparse matrices?
>>>>>
>>>>> Or is anyone working on adding an option for fully corrective gradient
>>>>> boosting, i.e. all trees in the ensemble are re-weighted at each
>>>>> iteration?
>>>>>
>>>>> These are things I would like to see and may be able to help with if
>>>>> no one is currently working on them.
>>>>>
>>>
>>>
>>> --
>>> Peter Prettenhofer
>>
>
