**Kernel SVMs are not scalable** to a large or even medium number of
samples, as the training complexity is quadratic (or worse) in the
number of samples. You should try to:

- learn independent SVR models on partitions of the data (e.g. 10
models trained on 5000 samples each) and then use the mean of the
predictions of the 10 models as the final prediction. The aggregate
training complexity should be much lower: 10 * (5000 ** 2) << (10 *
5000) ** 2, and furthermore the 10 SVR models can be trained
independently in parallel. Also, the grid search for the best
hyperparameters can be done only once on 5000 random samples, and the
optimal parameters can be reused to train the 9 remaining models.

- perform a feature expansion of the data, for instance with the
Nystroem method, and then fit a linear model on the resulting dataset
(LinearSVC in the classification case; for a regression task, the
analogous linear regression model). You can use a Pipeline object to
combine the two steps so that C and gamma can be grid searched
together, see:
http://scikit-learn.org/stable/modules/kernel_approximation.html#nystroem-kernel-approx

- investigate other non-linear regression models such as GBRT
regressors (see:
http://scikit-learn.org/stable/modules/ensemble.html#regression),
AdaBoost (with decision stumps as the base learner, only available in
the master branch:
http://scikit-learn.org/dev/modules/ensemble.html#adaboost) and
ExtraTreesRegressor (see:
http://scikit-learn.org/stable/modules/ensemble.html#extremely-randomized-trees).
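A minimal sketch of the Nystroem suggestion, assuming a recent scikit-learn release (GridSearchCV imported from sklearn.model_selection). Because this is a regression problem, the sketch uses LinearSVR, the regression counterpart of LinearSVC in recent releases. The synthetic make_regression data, n_components=300 and the grid values are illustrative placeholders, not tuned settings.

```python
from sklearn.datasets import make_regression
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

# Synthetic stand-in for the real dataset (assumption for illustration).
X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)

# Approximate the RBF kernel feature map with 300 Nystroem components,
# then fit a linear SVR in that expanded feature space.
pipeline = make_pipeline(
    Nystroem(kernel="rbf", n_components=300, random_state=0),
    LinearSVR(max_iter=10000, random_state=0),
)

# make_pipeline names each step after its lowercased class name, so the
# kernel width (gamma) and the regularization strength (C) are addressed as:
param_grid = {
    "nystroem__gamma": [0.01, 0.1, 1.0],
    "linearsvr__C": [0.1, 1.0, 10.0],
}

# Grid search C and gamma together over the whole pipeline.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

Replacing LinearSVR with another linear regressor (and C with that model's own regularization parameter in the grid) would follow the same pattern.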
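A sketch comparing the three tree-based alternatives listed above with 3-fold cross-validation, assuming a recent scikit-learn release where AdaBoostRegressor is part of the stable API. The synthetic data and the n_estimators / max_depth values are placeholder assumptions; on real data these would need tuning as well.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real dataset (assumption for illustration).
X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)

models = {
    # GBRT: gradient boosted regression trees.
    "gbrt": GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0),
    # AdaBoost with decision stumps (depth-1 trees) as the base learner.
    "adaboost_stumps": AdaBoostRegressor(
        DecisionTreeRegressor(max_depth=1), n_estimators=200, random_state=0
    ),
    # Extremely randomized trees; n_jobs=-1 builds the trees in parallel.
    "extra_trees": ExtraTreesRegressor(n_estimators=100, n_jobs=-1, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean R^2 over 3 folds (R^2 is the default score for regressors).
    scores[name] = cross_val_score(model, X, y, cv=3).mean()
    print(f"{name}: {scores[name]:.3f}")
```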

Note that the partitioning trick suggested for SVR might also work to
speed up the training of the other models.
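The partitioning trick can be sketched as follows. The arithmetic behind it: 10 * 5000 ** 2 = 2.5e8 while (10 * 5000) ** 2 = 2.5e9, so for a learner whose cost grows quadratically, splitting the data into k partitions divides the total work by k. The helper names (fit_partitioned, predict_mean, _fit_one) are hypothetical, joblib is assumed for the parallel fits, and the sizes are scaled down (10 partitions of 500 samples) so the example runs quickly.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR


def _fit_one(estimator, X_part, y_part):
    # Each worker fits a fresh, unfitted copy on its own chunk of the data.
    return clone(estimator).fit(X_part, y_part)


def fit_partitioned(estimator, X, y, partitions, n_jobs=-1):
    """Hypothetical helper: one independent model per index partition, in parallel."""
    return Parallel(n_jobs=n_jobs)(
        delayed(_fit_one)(estimator, X[idx], y[idx]) for idx in partitions
    )


def predict_mean(models, X):
    """Hypothetical helper: average the predictions of the partition models."""
    return np.mean([model.predict(X) for model in models], axis=0)


# Synthetic stand-in for the real dataset (assumption for illustration).
X, y = make_regression(n_samples=5000, n_features=10, noise=10.0, random_state=0)

# Shuffle the sample indices and split them into 10 disjoint partitions.
rng = np.random.RandomState(0)
partitions = np.array_split(rng.permutation(len(X)), 10)

# Run the (expensive) grid search only once, on the first partition.
first = partitions[0]
param_grid = {"C": [1.0, 10.0, 100.0], "gamma": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=3).fit(X[first], y[first])

# Reuse the best parameters for the 9 remaining partitions; the refitted
# best_estimator_ already covers the first partition.
others = fit_partitioned(SVR(kernel="rbf", **search.best_params_), X, y, partitions[1:])
models = [search.best_estimator_] + others

# Final prediction: the mean of the 10 models' predictions.
y_pred = predict_mean(models, X[:5])
print(y_pred)
```

The helpers themselves are model-agnostic: passing, say, an ExtraTreesRegressor to fit_partitioned instead of SVR is one way to try the same trick on the other models.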

Also, scikit-learn master has an implementation of RandomizedSearchCV,
a much faster (yet approximate) alternative to GridSearchCV. Beware
that its cv_scores_ attribute is currently wrong (but the best_params_
attribute is correct). This bug is fixed in this PR:
https://github.com/scikit-learn/scikit-learn/pull/2042
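A minimal RandomizedSearchCV sketch, assuming a recent scikit-learn release and SciPy >= 1.4 (for scipy.stats.loguniform). In current releases the per-candidate scores are exposed through cv_results_ rather than cv_scores_, while best_params_ works as before. The search ranges, n_iter=10 and the synthetic data are illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for the real dataset (assumption for illustration).
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

# Sample C and gamma from log-uniform distributions instead of a fixed grid.
param_distributions = {
    "C": loguniform(1e-1, 1e3),
    "gamma": loguniform(1e-3, 1e0),
}

# Only n_iter=10 candidates are evaluated, however fine-grained the
# distributions are; this is what makes it cheaper than GridSearchCV.
search = RandomizedSearchCV(
    SVR(kernel="rbf"), param_distributions, n_iter=10, cv=3, random_state=0
)
search.fit(X, y)
print(search.best_params_)
# Mean cross-validated score of each of the 10 sampled candidates.
print(search.cv_results_["mean_test_score"])
```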

--
Olivier

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general