Re: [Scikit-learn-general] SVM, appropriate size of training set

Olivier Grisel Mon, 24 Feb 2014 01:06:24 -0800

Is your dataset balanced (roughly as many positive as negative)?

Kernel SVMs as implemented in scikit-learn do not scale well with the
number of samples: the computational cost is more than quadratic with
respect to n_samples.
Either subsample (especially if you have a large imbalance), use an
approximation such as the Nystroem [1] feature expansion followed by a
linear model, or use a more scalable non-linear algorithm such as
RandomForestClassifier.
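
As a rough illustration of the last two options, here is a minimal
sketch. It assumes a recent scikit-learn release and uses a synthetic
dataset from make_classification as a stand-in for your data. The
gamma, n_components and n_estimators values are arbitrary placeholders
that you would need to tune, and SGDClassifier with hinge loss is just
one possible choice of linear model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: binary labels).
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Option 1: approximate an RBF kernel with a Nystroem feature map,
# then fit a linear model on the expanded features. Training cost is
# driven by n_components rather than by the full kernel matrix.
nystroem_model = make_pipeline(
    StandardScaler(),
    Nystroem(kernel="rbf", gamma=0.05, n_components=300, random_state=0),
    SGDClassifier(loss="hinge", random_state=0),
)
nystroem_model.fit(X_train, y_train)
nystroem_acc = nystroem_model.score(X_test, y_test)

# Option 2: a non-linear model that handles large n_samples more easily.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_acc = forest.score(X_test, y_test)

print("Nystroem + linear model accuracy:", nystroem_acc)
print("Random forest accuracy:", forest_acc)
```

Increasing n_components generally brings the approximation closer to
the exact kernel at the price of more memory and compute, so it is
worth checking the validation score for a few values.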


[1] http://scikit-learn.org/stable/modules/kernel_approximation.html

-- 
Olivier
