2012/11/6 Mathieu Blondel <[email protected]>:
> On Tue, Nov 6, 2012 at 9:33 AM, Abhi <[email protected]> wrote:
>>
>> Hello,
>>    I have been reading and testing examples around the sklearn
>> documentation and am not too clear on few things and would appreciate
>> any help regarding the following questions:
>> 1) What would be the advantage of training LogisticRegression vs
>> OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
>> the latter would basically train n_classes classifiers).
>
>
> They actually do the same. liblinear uses one-vs-rest everywhere except for
> the crammer-singer SVM formulation.
> I wonder why we keep getting this question.

Indeed. Abhi, which specific section of the documentation (or
docstring) led you to ask this question?

The note on this page is pretty explicit:

http://scikit-learn.org/dev/modules/multiclass.html

Along with the docstring:

http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

Maybe the docstring could be made more consistent and use the
one-vs-rest notation instead of one-vs-all (which is a synonym).

>> 2) Isnt SGDClassifier(loss='log') better than LogisticRegression for large
>> sparse datasets? If so, why?
>
> It's faster to train *once* you chose the learning rate, which is usually a
> pain. You can also try LogisticRegression(tol=1e-2) or
> LogisticRegression(tol=1e-1).

Actually the default learning rate schedule of scikit-learn pretty much
always works, but you have to adjust `n_iter`, which is an additional
parameter w.r.t. LogisticRegression.

Also SGDClassifier can spare a dataset memory copy if your data can
be natively loaded as a scipy Compressed Sparse Row (CSR) matrix. And
if the data does not fit in memory, you can load it as CSR chunks (e.g.
from a set of svmlight files on the filesystem, from a database, or
vectorized on the fly from text content using a pre-fitted text
vectorizer) and the model can be learned incrementally using
sequential calls to the partial_fit method.
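
To make the chunked pattern concrete, here is a rough pure-Python
sketch of the idea. It is not scikit-learn code: `TinySGDLogistic` and
`iter_chunks` are hypothetical names made up for illustration, the
chunks are synthetic rather than read from svmlight files, and a fixed
learning rate is assumed instead of scikit-learn's schedule. With
SGDClassifier you would call its own partial_fit on each CSR chunk in
the same kind of loop.

```python
import math
import random


class TinySGDLogistic:
    """Hypothetical stand-in for an SGD-trained binary logistic regression."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.learning_rate = learning_rate  # fixed rate: an assumption

    def _prob(self, row):
        # A row is a sparse sample stored as {feature_index: value}.
        z = self.b + sum(self.w[j] * v for j, v in row.items())
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, rows, labels):
        # One SGD pass over this chunk only; weights persist between calls.
        for row, y in zip(rows, labels):
            err = self._prob(row) - y  # gradient of the log loss w.r.t. z
            for j, v in row.items():
                self.w[j] -= self.learning_rate * err * v
            self.b -= self.learning_rate * err
        return self

    def predict(self, rows):
        return [1 if self._prob(r) >= 0.5 else 0 for r in rows]


def iter_chunks(n_chunks, chunk_size, seed=0):
    """Hypothetical chunk source standing in for files read one at a time."""
    rng = random.Random(seed)
    for _ in range(n_chunks):
        rows, labels = [], []
        for _ in range(chunk_size):
            y = rng.randint(0, 1)
            # Feature 0 is active for class 1, feature 1 for class 0.
            rows.append({0: 1.0} if y else {1: 1.0})
            labels.append(y)
        yield rows, labels


model = TinySGDLogistic(n_features=2)
for rows, labels in iter_chunks(n_chunks=5, chunk_size=50):
    model.partial_fit(rows, labels)  # only one chunk is held in memory

predictions = model.predict([{0: 1.0}, {1: 1.0}])
```

The point of the design is that the model state (weights and bias) is
the only thing carried from one chunk to the next, so peak memory is
bounded by the chunk size rather than by the size of the dataset.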

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
