Re: [Scikit-learn-general] Memory consumption of LinearSVC.fit

Andreas Mueller Tue, 06 Dec 2011 00:58:11 -0800

On 12/06/2011 04:55 AM, Gael Varoquaux wrote:
> On Mon, Dec 05, 2011 at 10:54:42PM +0100, Olivier Grisel wrote:
>> - libsvm uses SMO (a dual solver) and supports non-linear kernels and
>> has complexity ~ n_samples^3 hence cannot scale to large n_samples
>> (e.g. more than 50k).
>> - liblinear uses some kind of fancy coordinate descent (primal or dual
>> solvers) optimized for regularized linear models, provides more
>> regularization / loss function options such as l1 penalty and can
>> scale to large n_samples (as long as the sparse internal
>> representation of the data fits in memory).
>>> By the way, I suggest someone update the documentation to specify what
>>> the consequences of using the different SVM classes are. Currently
>>> LinearSVC is recommended "for huge datasets", not "for huge sparse
>>> datasets." That is on this page:
>>> http://scikit-learn.sourceforge.net/dev/modules/generated/sklearn.svm.LinearSVC.html
>> For huge dense data, the only viable option is SGDClassifier on memory
>> mapped arrays (double precision).
> The full content of the above paragraphs should be pasted in the docs
> (with a little bit of rewording).
>
+1
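
For the docs, the SGDClassifier-on-memmap suggestion could be illustrated
with something like the sketch below. It is only a sketch under stated
assumptions: the file name, array shape, batch size and number of passes are
made up for illustration, and it assumes a scikit-learn release that provides
SGDClassifier.partial_fit. Each batch is copied into RAM before fitting, so
only one slice of the on-disk array is resident at a time.

```python
import os
import tempfile

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
n_samples, n_features = 2000, 20  # toy sizes, chosen only for illustration

# Write a dense double-precision dataset to disk (stand-in for real data).
path = os.path.join(tempfile.mkdtemp(), "X.dat")  # hypothetical file name
X_disk = np.memmap(path, dtype=np.float64, mode="w+",
                   shape=(n_samples, n_features))
X_disk[:] = rng.randn(n_samples, n_features)
w_true = rng.randn(n_features)
y = (X_disk @ w_true > 0).astype(np.int64)  # linearly separable labels
X_disk.flush()
del X_disk

# Reopen read-only: the data stays on disk and is paged in on access.
X = np.memmap(path, dtype=np.float64, mode="r",
              shape=(n_samples, n_features))

clf = SGDClassifier(loss="hinge", random_state=0)  # linear SVM objective
classes = np.unique(y)
batch_size = 500

for epoch in range(5):
    for start in range(0, n_samples, batch_size):
        stop = start + batch_size
        # np.array copies just this batch into memory before fitting.
        clf.partial_fit(np.array(X[start:stop]), y[start:stop],
                        classes=classes)

accuracy = clf.score(X, y)
print("training accuracy: %.3f" % accuracy)
```

In a real setting the batches would normally be shuffled between passes, and
the labels would come from disk as well.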


