2012/11/6 Mathieu Blondel <[email protected]>:
> On Tue, Nov 6, 2012 at 9:33 AM, Abhi <[email protected]> wrote:
>>
>> Hello,
>> I have been reading and testing examples around the sklearn
>> documentation and am not too clear on few things and would appreciate
>> any help regarding the following questions:
>> 1) What would be the advantage of training LogisticRegression vs
>> OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
>> the latter would basically train n_classes classifiers).
>
> They actually do the same. liblinear uses one-vs-rest everywhere except for
> the crammer-singer SVM formulation.
> I wonder why we keep getting this question.

Indeed Abhi, which specific section of the documentation (or docstring)
led you to ask this question? The note on this page is pretty explicit:

http://scikit-learn.org/dev/modules/multiclass.html

Along with the docstring:

http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

Maybe the docstring could be made more consistent and use the
one-vs-rest notation instead of one-vs-all (which is a synonym).

>> 2) Isnt SGDClassifier(loss='log') better than LogisticRegression for large
>> sparse datasets? If so, why?
>
> It's faster to train *once* you chose the learning rate, which is usually a
> pain. You can also try LogisticRegression(tol=1e-2) or
> LogisticRegression(tol=1e-1).

Actually the default learning rate schedule of scikit-learn almost
always works, but you have to adjust `n_iter`, which is an additional
parameter w.r.t. LogisticRegression.

Also, SGDClassifier can spare a dataset memory copy if your data can be
natively loaded as a scipy Compressed Sparse Rows (CSR) matrix.

Also, if the data does not fit in memory, you can load it as CSR chunks
(e.g. from a set of svmlight files on the filesystem, from a database,
or vectorized on the fly from text content using a pre-fitted text
vectorizer), and the model can be learned incrementally using sequential
calls to the partial_fit method.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
