> Is there any other way through which I can train GradientBoostingRegressor
> for this dataset?

No, not yet.

However, our implementation of gradient boosting has a `subsample` option
for using a subset of the data when building each tree (this is called
stochastic gradient boosting in the literature). While this is currently
only used for preventing overfitting, we could use a similar strategy to
implement `partial_fit`. The idea would be to require the user to call
`partial_fit` `n_estimators` times, each time with a different subsample.
How to subsample would be left entirely to the user.
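
To make the idea concrete, here is a minimal sketch, not an existing
scikit-learn API. The class name `ChunkedBoostingRegressor` and its attributes
are hypothetical, and it assumes squared-error loss and a constant step size.
Each `partial_fit` call adds one tree fitted to the residuals of the current
ensemble on whatever chunk the user passes in.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class ChunkedBoostingRegressor:
    """Hypothetical sketch, not a scikit-learn estimator.

    Each partial_fit call adds one tree, fitted to the residuals of the
    current ensemble on the chunk passed in (squared-error loss).
    """

    def __init__(self, learning_rate=0.1, max_depth=3):
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.init_ = None   # constant initial prediction
        self.trees_ = []    # one tree per partial_fit call

    def predict(self, X):
        pred = np.full(X.shape[0], self.init_, dtype=float)
        for tree in self.trees_:
            pred += self.learning_rate * tree.predict(X)
        return pred

    def partial_fit(self, X, y):
        if self.init_ is None:
            # Initialise with the mean target of the first chunk.
            self.init_ = float(np.mean(y))
        # For squared error, the negative gradient is the residual.
        residuals = y - self.predict(X)
        tree = DecisionTreeRegressor(max_depth=self.max_depth)
        tree.fit(X, residuals)
        self.trees_.append(tree)
        return self


# Synthetic data standing in for a dataset that would be read in chunks.
rng = np.random.RandomState(0)
X_all = rng.uniform(-3, 3, size=(2000, 1))
y_all = np.sin(X_all[:, 0]) + 0.1 * rng.normal(size=2000)

model = ChunkedBoostingRegressor(learning_rate=0.1, max_depth=3)
n_estimators = 100
for _ in range(n_estimators):
    # The user decides how to subsample; here, a random chunk of 200 rows.
    idx = rng.choice(len(X_all), size=200, replace=False)
    model.partial_fit(X_all[idx], y_all[idx])

mse = float(np.mean((model.predict(X_all) - y_all) ** 2))
```

In a real out-of-core setting, the loop would read each chunk from disk
instead of indexing an in-memory array.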

Currently, when adding a tree to the ensemble, scikit-learn uses a step
size which is the product of a learning rate (a constant fixed ahead of
time) and a value found by line search (data-dependent). The value found by
line search may be quite bad if we only use the small subset of the data
passed to `partial_fit`. Furthermore, some authors have argued that we
don't need the line search at all, as the goal is to do well on test data,
not training data. Thus I think we should add a `step_size` option which
would take either "line_search" or "constant" as its value. The latter
would be more appropriate in the `partial_fit` case.
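
As a rough illustration of why the line search can be unreliable on small
chunks, the sketch below compares the two modes. The helper `step_size` is
hypothetical and only mirrors the proposed option. It assumes squared-error
loss and a base learner output `h` that is not rescaled to the residuals,
which is a simplification and not how scikit-learn computes its updates.

```python
import numpy as np


def step_size(residuals, h, learning_rate=0.1, mode="constant"):
    """Hypothetical helper mirroring the proposed `step_size` option."""
    if mode == "constant":
        return learning_rate
    if mode == "line_search":
        # Closed-form minimiser of sum((residuals - rho * h) ** 2) over rho.
        rho = float(np.dot(residuals, h) / np.dot(h, h))
        return learning_rate * rho
    raise ValueError("mode must be 'constant' or 'line_search'")


rng = np.random.RandomState(0)
n = 100000
h = rng.normal(size=n)                          # base learner output
residuals = 2.0 * h + 5.0 * rng.normal(size=n)  # best rho on average is 2

# Line search on the full data: a stable estimate close to 0.1 * 2.
full = step_size(residuals, h, mode="line_search")

# Line search on many small chunks of 20 rows: the estimate varies a lot.
small = np.array([
    step_size(residuals[idx], h[idx], mode="line_search")
    for idx in (rng.choice(n, size=20, replace=False) for _ in range(200))
])

# Constant mode ignores the data entirely.
constant = step_size(residuals, h, mode="constant")

print("full-data step:", full)
print("small-chunk steps: mean", small.mean(), "std", small.std())
print("constant step:", constant)
```

The spread of the small-chunk estimates is the data dependence described
above; the constant mode avoids it by construction.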

A similar approach could be taken to implement partial_fit in random
forests. However, random forests are embarrassingly parallel so we might
want to somehow add support for out-of-core learning in the future.

Mathieu


On Sat, Aug 23, 2014 at 6:10 PM, Mahendra Kariya <
geek3142-skle...@yahoo.co.in> wrote:

> Hello All,
>
> I have a 12G dataset on which I want to run GradientBoostingRegressor. But
> loading such a large dataset in memory is practically impossible. I can
> load it in chunks and train the model in batch mode, but I don't see any
> partial_fit method in gradient boosting.
>
> Is there any other way through which I can train GradientBoostingRegressor
> for this dataset?
>
>
>
> Thanks,
> Mahendra
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds.  Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>