2012/2/2 Gael Varoquaux <[email protected]>:
> On Thu, Feb 02, 2012 at 12:45:04AM -0800, adnan rajper wrote:
>> I tried "parameter tuning using grid search", but it gets too slow. Both
>> classifiers (multinomial and LinearSVC) give 75% accuracy. My problem is
>> that I want to improve the accuracy, for instance I want to make it more
>> than 80%. Is there anyway to do it through scikit.
>
> Did you normalize your features?

In the tutorial, TF-IDF normalization is applied automatically when
extracting the features, so that should be fine.

Adnan, you should try linear_model.Perceptron (on master only),
naive_bayes.MultinomialNB or linear_model.SGDClassifier instead of the
LinearSVC model. They should be faster to train and hence allow you to
perform a finer grid search on their parameters (read the documentation
and examples to understand how the parameters of each model work).

In your case I would try to extract bigrams, use the elasticnet penalty
of SGDClassifier, and do a grid search on alpha (and maybe rho too).

Then, if you still can't reach 80%, I would advise you to find more
training data. That's probably the easiest way to improve your
classification accuracy.

If you have more negative than positive examples, you can also try to
set class_weight="auto" for the classifiers that support it.

You should also have a look at the text of some badly classified samples
to gain some insight into why the classifier is failing on those
examples. That can tell you what kind of manually extracted features
would be beneficial to add to your feature extraction layer.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
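
The suggestions above (bigrams, elasticnet SGDClassifier, grid search on alpha and rho, class weighting, inspecting badly classified samples) could be sketched roughly as follows. This is only an illustration under several assumptions: it targets a recent scikit-learn release, where the rho parameter is exposed as l1_ratio, class_weight="auto" has been replaced by class_weight="balanced", and GridSearchCV lives in sklearn.model_selection. The toy corpus, the grid values and the variable names are made up for the example and are not taken from the thread.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; replace with your own documents and labels.
docs = [
    "great movie loved it", "wonderful acting great plot",
    "loved the soundtrack", "great fun wonderful cast",
    "terrible movie hated it", "awful acting boring plot",
    "hated the soundtrack", "boring awful waste of time",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    # Unigrams and bigrams, with TF-IDF weighting and normalization.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # Linear model trained by SGD with an elasticnet penalty.
    # class_weight="balanced" is the current spelling of the old "auto".
    ("clf", SGDClassifier(penalty="elasticnet",
                          class_weight="balanced",
                          random_state=0)),
])

# Illustrative grid: alpha is the regularization strength, l1_ratio
# (called rho in older releases) is the L1/L2 mix of the elasticnet.
param_grid = {
    "clf__alpha": [1e-5, 1e-4, 1e-3],
    "clf__l1_ratio": [0.15, 0.5],
}

# cv=2 only because the toy corpus is tiny; use more folds on real data.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(docs, labels)
print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)

# Out-of-fold predictions, so each document is scored by a model
# that did not see it during training.
predictions = cross_val_predict(search.best_estimator_, docs, labels, cv=2)
misclassified = [
    (doc, true, pred)
    for doc, true, pred in zip(docs, labels, predictions)
    if true != pred
]
for doc, true, pred in misclassified:
    print("true=%d predicted=%d text=%r" % (true, pred, doc))
```

Reading through the printed misclassified documents is the step that tends to suggest which extra features are worth extracting by hand.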
