With millions of samples, LinearSVC or SGDClassifier is more appropriate
than SVC, whose training time scales at least quadratically with the
number of samples. However, both only fit linear models. Since you have
only 5 features, I think it would be worth trying non-linear features.
You can try the kernel approximation module [1] and PolynomialFeatures [2].

[1] http://scikit-learn.org/dev/modules/kernel_approximation.html#kernel-approximation
[2] https://github.com/scikit-learn/scikit-learn/blob/master/examples/linear_model/plot_polynomial_interpolation.py
    (only in master)
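
To make the two suggestions concrete, here is a minimal sketch, assuming a
recent scikit-learn release (module paths differ in older versions). The
synthetic data from make_classification is only a stand-in for the real
5-feature problem, and the degree, gamma and n_components values are
illustrative guesses, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data: 5 features, binary labels (assumption).
X, y = make_classification(n_samples=5000, n_features=5, n_informative=4,
                           n_redundant=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Option 1: explicit polynomial features, then a linear SVM.
# dual=False is usually the better choice when n_samples >> n_features.
poly_svm = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=3),
    LinearSVC(dual=False),
)
poly_svm.fit(X_train, y_train)

# Option 2: approximate RBF kernel feature map, then a linear SVM
# (hinge loss) trained with stochastic gradient descent.
rbf_sgd = make_pipeline(
    StandardScaler(),
    RBFSampler(gamma=0.2, n_components=200, random_state=0),
    SGDClassifier(loss="hinge", random_state=0),
)
rbf_sgd.fit(X_train, y_train)

poly_acc = poly_svm.score(X_test, y_test)
rbf_acc = rbf_sgd.score(X_test, y_test)
print("PolynomialFeatures + LinearSVC accuracy:", poly_acc)
print("RBFSampler + SGDClassifier accuracy:", rbf_acc)
```

Both pipelines train a linear model, so the cost of fitting stays roughly
linear in the number of samples; the non-linearity comes only from the
feature map. Which map works better on the real data would need to be
checked with cross-validation.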
HTH,
Mathieu
On Thu, Feb 20, 2014 at 9:20 PM, Tommy Carstensen <t...@sanger.ac.uk> wrote:
> To scikit-learn-general,
>
> I am trying to do a binary classification (true/false) of millions of
> samples across 5 features with SVM. How many samples should I use for
> building my model? I tried using svm.SVC().fit() on hundreds of
> thousands of samples, but it ran for more than 12 hours. I am quite new
> to machine learning, so any help provided will be much appreciated.
> Thank you.
>
> P.S. I am not sure, if this is the appropriate forum. Please ignore my
> question, if it does not belong on this mailing list.
>
> Best wishes,
> Tommy
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
>
>
> ------------------------------------------------------------------------------
> Managing the Performance of Cloud-Based Applications
> Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
> Read the Whitepaper.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=121054471&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>