On Thu, Oct 01, 2015 at 11:10:51AM +0200, Maryam Tavakol wrote:
> My problem however is the size of data in terms of number of samples.
> The features are engineered and are only 80. I wanted to try training
> on bigger set of data for improvement.

I would use the BIRCH clustering method in an online fashion (via
partial_fit) to build a coreset: a reduced set of data points that best
represents the original samples, each with an associated weight (the
number of original data points in its cluster). I would then train the
gradient boosted classifier on the reduced data points, passing those
weights as sample weights.
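
A minimal sketch of how this could look, not a definitive recipe. Several
things here are assumptions of the sketch rather than part of the suggestion
above: the helper name build_coreset is hypothetical, one BIRCH tree is fitted
per class so that every centroid carries a single label, the weights are
obtained by a second pass that assigns each sample to its nearest subcluster
center, and the threshold value and synthetic data are purely illustrative.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def build_coreset(batches, threshold=0.5):
    """Compress (X, y) batches into weighted BIRCH subcluster centers.

    `batches` is a list of (X, y) chunks and is traversed twice.
    Hypothetical helper, written for illustration only.
    """
    models = {}  # one BIRCH tree per class (assumption of this sketch)
    # Pass 1: grow the trees incrementally, one chunk at a time.
    for X, y in batches:
        for label in np.unique(y):
            # n_clusters=None skips the final global clustering step,
            # so the raw subclusters are kept as they are.
            model = models.setdefault(
                label, Birch(threshold=threshold, n_clusters=None)
            )
            model.partial_fit(X[y == label])

    # Pass 2: count the original points nearest to each subcluster center.
    counts = {
        label: np.zeros(len(model.subcluster_centers_))
        for label, model in models.items()
    }
    for X, y in batches:
        for label in np.unique(y):
            nearest = models[label].predict(X[y == label])
            counts[label] += np.bincount(nearest, minlength=len(counts[label]))

    labels = sorted(models)
    X_core = np.vstack([models[l].subcluster_centers_ for l in labels])
    y_core = np.concatenate([np.full(len(counts[l]), l) for l in labels])
    w_core = np.concatenate([counts[l] for l in labels])
    keep = w_core > 0  # drop centers that attracted no sample in pass 2
    return X_core[keep], y_core[keep], w_core[keep]


# Synthetic stand-in for the real data: 80 features, fed in chunks of 500.
X, y = make_classification(n_samples=2000, n_features=80, random_state=0)
batches = [(X[i:i + 500], y[i:i + 500]) for i in range(0, len(X), 500)]

# The threshold is illustrative and needs tuning on real data.
X_core, y_core, w_core = build_coreset(batches, threshold=8.0)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_core, y_core, sample_weight=w_core)
```

The threshold controls the trade-off: a larger value merges more samples
into each subcluster and gives a smaller coreset, at the price of a coarser
summary of the data. Features on very different scales would probably need
standardizing first, since BIRCH relies on Euclidean distances.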

Gaël

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
