2012/2/2 Gael Varoquaux <[email protected]>:
> On Thu, Feb 02, 2012 at 12:45:04AM -0800, adnan rajper wrote:
>>    I tried "parameter tuning using grid search",  but it gets too slow. Both
>>    classifiers (multinomial and LinearSVC) give 75% accuracy. My problem is
>>    that I want to improve the accuracy, for instance I want to make it more
>>    than 80%. Is there anyway to do it through scikit.
>
> Did you normalize your features?

In the tutorial, TF-IDF normalization is applied automatically when
extracting the features, so that should be fine.

Adnan, you should try linear_model.Perceptron (on master only),
naive_bayes.MultinomialNB or linear_model.SGDClassifier instead of the
LinearSVC model. They should be faster to train and hence allow you to
perform a finer grid search on their parameters (read the
documentation and examples to understand how the parameters of each
model work).
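
A minimal sketch of trying the three suggested models on the same
TF-IDF features, assuming a recent scikit-learn release (where
Perceptron is part of the released package). The toy corpus, labels
and variable names are made up for illustration; training accuracy on
six documents says nothing about real performance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus; substitute your own documents and labels.
docs = [
    "great movie, loved it",
    "wonderful acting and plot",
    "loved the wonderful cast",
    "terrible film, hated it",
    "boring plot and bad acting",
    "hated the boring script",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features, as in the tutorial.
X = TfidfVectorizer().fit_transform(docs)

candidates = {
    "Perceptron": Perceptron(random_state=0),
    "MultinomialNB": MultinomialNB(),
    "SGDClassifier": SGDClassifier(random_state=0),
}

# Fit each model on the same features and record its training accuracy.
train_accuracy = {}
for name, clf in candidates.items():
    clf.fit(X, labels)
    train_accuracy[name] = clf.score(X, labels)
    print(name, train_accuracy[name])
```

On real data you would compare cross-validated scores rather than
training accuracy.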

In your case I would try to extract bigrams and use the elasticnet
penalty of SGDClassifier and do grid search on alpha (and maybe rho
too).
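
The bigram plus elasticnet suggestion could be sketched roughly as
follows, assuming a recent scikit-learn release. Note that recent
releases name the elasticnet mixing parameter l1_ratio rather than
rho. The toy corpus and the grid values are placeholders, not
recommendations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; substitute your own documents and labels.
docs = [
    "great movie, loved it", "wonderful acting and plot",
    "loved the wonderful cast", "a great and moving story",
    "really loved this film", "wonderful story, great cast",
    "terrible film, hated it", "boring plot and bad acting",
    "hated the boring script", "a terrible and dull story",
    "really hated this film", "boring story, bad cast",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

pipeline = Pipeline([
    # ngram_range=(1, 2) extracts both unigrams and bigrams.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", SGDClassifier(penalty="elasticnet", random_state=0)),
])

# Search over the regularization strength and the L1/L2 mixing ratio.
param_grid = {
    "clf__alpha": [1e-5, 1e-4, 1e-3],
    "clf__l1_ratio": [0.05, 0.15, 0.5],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(docs, labels)
print(search.best_params_, search.best_score_)
```

Putting the vectorizer inside the pipeline means the vocabulary is
rebuilt on each training fold, so the cross-validated score is not
inflated by features seen only in the held-out fold.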

Then if you still can't reach 80%, I would advise you to try to find
more training data. That's probably the easiest way to improve your
classification accuracy.

If you have more negative than positive examples you can also try to
set class_weight="auto" for classifiers that support it.
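
A small sketch of class reweighting, assuming a recent scikit-learn
release, where the option is spelled class_weight="balanced" (the
"auto" spelling used above is no longer accepted there). The
imbalanced labels and random features are synthetic and purely
illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced labels: 8 negative examples, 2 positive ones.
y = np.array([0] * 8 + [1] * 2)

# "balanced" weights are n_samples / (n_classes * count_per_class),
# so the rarer class receives the larger weight.
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)
print(dict(zip([0, 1], weights)))

# Synthetic features, shifted for the positive class.
rng = np.random.RandomState(0)
X = rng.randn(10, 3)
X[y == 1] += 2.0

# The same weighting is applied internally when passed to the estimator.
clf = SGDClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```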

Also, you should have a look at the text of some badly classified
samples to gain some insight into why the classifier is failing on
those examples. That can tell you what kind of manually extracted
features would be beneficial to add to your feature extraction layer.
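
One possible way to collect the badly classified samples, assuming a
recent scikit-learn release: take out-of-fold predictions with
cross_val_predict and print the documents whose prediction disagrees
with the label. The toy corpus and variable names are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; substitute your own documents and labels.
docs = [
    "great movie, loved it", "wonderful acting and plot",
    "loved the wonderful cast", "a great and moving story",
    "really loved this film", "wonderful story, great cast",
    "terrible film, hated it", "boring plot and bad acting",
    "hated the boring script", "a terrible and dull story",
    "really hated this film", "boring story, bad cast",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

model = make_pipeline(TfidfVectorizer(), SGDClassifier(random_state=0))

# Each document is predicted by a model that did not see it in training.
predicted = cross_val_predict(model, docs, labels, cv=3)

# Keep only the documents where the prediction disagrees with the label.
misclassified = [
    (doc, true, pred)
    for doc, true, pred in zip(docs, labels, predicted)
    if true != pred
]
for doc, true, pred in misclassified:
    print(f"true={true} predicted={pred}: {doc}")
```

Reading through that output by hand is where ideas for extra features
usually come from.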

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
