On 01/23/2012 09:11 PM, Olivier Grisel wrote:
> 2012/1/23 Dimitrios Pritsos<[email protected]>:
>> However, when I do the same test using partial_fit() for the same
>> sub-set of my Data Set (see above) I am getting ~20%.
>>
>> Any Suggestions?
> Do a grid search to find the best alpha on SGDClassifier (and on C for
> the LinearSVC classifier). For instance:
>
>>>> from sklearn.grid_search import GridSearchCV
>>>> from sklearn.linear_model import SGDClassifier
>>>> from sklearn.datasets import fetch_20newsgroups_vectorized
>>>> twenty = fetch_20newsgroups_vectorized()
>>>> param_grid = {'alpha': [1e-3, 1e-4, 1e-5]}
>>>> gs = GridSearchCV(SGDClassifier(), param_grid).fit(twenty.data,
>>>> twenty.target)
>>>> gs.best_estimator_
> SGDClassifier(alpha=0.0001, class_weight=None, eta0=0.0, fit_intercept=True,
> learning_rate='optimal', loss='hinge', n_iter=5, n_jobs=1,
> penalty='l2', power_t=0.5, rho=0.85, seed=0, shuffle=False,
> verbose=0, warm_start=False)
>>>> gs.best_score_
> 0.8575220898001239
>
> You can also include 'n_iter': [5, 10, 50] and 'class_weight':
> ['auto', None] in the param_grid but beware of the combinatorial
> explosion in computation time.
>
> Don't worry about partial_fit as your data will fit in memory with the
> CSR format.
>
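For readers on a current scikit-learn release: the `sklearn.grid_search` module used in the quoted session has since been removed in favor of `sklearn.model_selection`, `SGDClassifier`'s `n_iter` was replaced by `max_iter`, and `class_weight='auto'` became `'balanced'`. The following is a rough modern equivalent of the quoted grid search, assuming scikit-learn 0.22 or later. It uses synthetic data from `make_classification` (an assumption, standing in for the 20 newsgroups corpus to avoid a download), so its scores will not match the numbers quoted above.

```python
# A minimal sketch assuming scikit-learn >= 0.22: GridSearchCV now lives in
# sklearn.model_selection, and SGDClassifier takes max_iter instead of n_iter.
# make_classification is a stand-in for the real (sparse, CSR) text features.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# Same alpha grid as in the quoted example.
param_grid = {"alpha": [1e-3, 1e-4, 1e-5]}

# tol=None makes SGD run exactly max_iter epochs over the training folds.
base = SGDClassifier(max_iter=50, tol=None, random_state=0)
gs = GridSearchCV(base, param_grid, cv=3)
gs.fit(X, y)

print(gs.best_params_)  # alpha value with the highest mean cross-validated accuracy
print(gs.best_score_)   # mean cross-validated accuracy of that setting
```

Adding `"max_iter"` or `"class_weight": ["balanced", None]` to `param_grid` works the same way, with the same warning about the number of combinations growing multiplicatively.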
Thank you very much for the advice. I will try this too (today!). However, it seems that I might need to use partial_fit() in the near future, once I collect/crawl a new corpus. So my question is: was my result (20%) due to some sort of bug in partial_fit(), or to my incorrect use of this function?

Best Regards,

Dimitrios

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
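The thread does not show how partial_fit() was called, so the 20% result cannot be diagnosed from it. Common usage pitfalls with incremental SGD include not passing the full set of labels via `classes` on the first call, making only a single pass over the data (each `partial_fit` call performs one epoch over the chunk it receives), and feeding chunks in a fixed order such as documents sorted by category, which biases the model toward the classes seen last. The following sketch assumes a recent scikit-learn and synthetic data; `iter_minibatches` is a hypothetical helper standing in for whatever reads chunks of a crawled corpus from disk.

```python
# A minimal out-of-core training sketch with SGDClassifier.partial_fit,
# assuming a recent scikit-learn. Data is synthetic; iter_minibatches is a
# hypothetical chunk reader, not a scikit-learn function.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]

# A single chunk may not contain every label, so the complete set of
# classes must be given on the first partial_fit call.
all_classes = np.unique(y_train)


def iter_minibatches(X, y, batch_size, rng):
    """Hypothetical chunk reader: yields shuffled mini-batches of (X, y)."""
    order = rng.permutation(len(y))  # shuffle so batches mix all classes
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


clf = SGDClassifier(alpha=1e-4, random_state=0)
rng = np.random.RandomState(0)
n_epochs = 5  # each partial_fit call is one SGD pass over its chunk
for epoch in range(n_epochs):
    for X_batch, y_batch in iter_minibatches(X_train, y_train, 100, rng):
        clf.partial_fit(X_batch, y_batch, classes=all_classes)

acc = clf.score(X_test, y_test)  # mean accuracy on held-out data
print(acc)
```

Passing `classes` on every call is harmless as long as it is always the same array; it is only required on the first one. Comparing `acc` against `fit()` on the same in-memory subset, with the same `alpha`, is one way to tell whether a gap comes from the incremental setup rather than from the data.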
