Hello,

I have been reading and testing examples from the sklearn documentation, but I am not clear on a few things and would appreciate any help with the following questions:

1) What is the advantage of training LogisticRegression vs. OneVsRestClassifier(LogisticRegression()) for multiclass problems? (I understand the latter basically trains n_classes classifiers.)
2) Isn't SGDClassifier(loss='log') better than LogisticRegression for large sparse datasets? If so, why?
3) If I need predict_proba for just the best class match from the multiclass classifier, can I use OneVsRestClassifier(SGDClassifier())?
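
To make the third question concrete, here is a minimal sketch of what I have in mind. It runs on a small synthetic sparse matrix (a stand-in, not my real data), and it assumes the log loss from the second question, since as far as I understand predict_proba is not available with the default hinge loss. The version check is also an assumption on my side: newer scikit-learn releases name this loss 'log_loss' rather than 'log'.

```python
import numpy as np
import scipy.sparse as sp
import sklearn
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the real data: small sparse matrix, 3 classes.
rng = np.random.RandomState(0)
X = sp.random(300, 1000, density=0.01, format="csr", random_state=rng)
W = rng.randn(1000, 3)                      # hypothetical "true" weights
y = np.asarray(X.dot(W)).argmax(axis=1)     # labels derived from W

# The logistic loss is called 'log' in older releases and 'log_loss' in
# newer ones (assumption: the rename happened in version 1.1).
major_minor = tuple(int(p) for p in sklearn.__version__.split(".")[:2])
log_loss_name = "log_loss" if major_minor >= (1, 1) else "log"

# One binary SGD logistic model per class.
ovr = OneVsRestClassifier(SGDClassifier(loss=log_loss_name, random_state=0))
ovr.fit(X, y)

# Per-class probabilities, then keep only the best match for each sample.
proba = ovr.predict_proba(X[:5])            # shape (5, n_classes)
best_class = ovr.classes_[proba.argmax(axis=1)]
best_proba = proba.max(axis=1)
```

Here best_class and best_proba hold the top class and its probability for each of the first five rows, which is all I would need from the classifier.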

I tested on empirical data and got approximately similar results with LogisticRegression, SGDClassifier and LinearSVC (for now, data.shape = (10000, 400000)). However, in the future the data might grow to a much larger number of training samples and features, so I wanted to get a clearer idea of which approach to choose.

Thanks,
A
