Hi there. I'm not sure whether you have been answered yet, so perhaps I can
help.

MultinomialNB has a parameter called `class_weight` (named `class_prior` in
current scikit-learn releases) which you can set at initialization.

 |  class_weight : array-like, size=[n_classes,]
 |      Prior probabilities of the classes. If specified the priors are not
 |      adjusted according to the data.

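So I believe you should be able to say, for example with a 0/1
classification problem:
pipeline = Pipeline([('tfidf', TfidfTransformer()),
                     ('chi2', SelectKBest(chi2, k=1000)),
                     ('nb', MultinomialNB(class_weight=[0.9, 0.1]))])
so class 0 would have a prior of 0.9 and class 1 a prior of 0.1. The
docstring asks for an array-like with one entry per class, in sorted label
order, rather than a dict. Note that this sets class priors; it does not
weight individual samples.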
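
To make the two options concrete, here is a minimal sketch on made-up count
data (not the movie review corpus). It assumes a recent scikit-learn, where
the prior parameter is spelled `class_prior`, and it relies on Pipeline
forwarding fit parameters written as `<step name>__<parameter>` to the named
step. The toy matrix, the 0.5/1.0 weights and the helper `build_pipeline` are
only illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical term-count matrix: 6 documents, 4 terms.
X = np.array([[3, 0, 1, 0],
              [2, 1, 0, 0],
              [4, 0, 2, 1],
              [0, 3, 0, 2],
              [1, 2, 0, 3],
              [0, 4, 1, 2]])
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = negative, 1 = positive


def build_pipeline(**nb_kwargs):
    """Illustrative helper: same steps as the pipeline above, smaller k."""
    return Pipeline([('tfidf', TfidfTransformer()),
                     ('chi2', SelectKBest(chi2, k=2)),
                     ('nb', MultinomialNB(**nb_kwargs))])


# Option 1: fix the class priors at construction time.
# Entries follow the sorted class labels, here [0, 1].
prior_pipe = build_pipeline(class_prior=[0.9, 0.1]).fit(X, y)

# Option 2: weight individual samples at fit time.
# The 'nb__' prefix routes sample_weight to the step named 'nb'.
weights = np.where(y == 1, 1.0, 0.5)
weighted_pipe = build_pipeline().fit(X, y, nb__sample_weight=weights)

print(np.exp(prior_pipe.named_steps['nb'].class_log_prior_))
print(weighted_pipe.named_steps['nb'].class_count_)
```

The first option only changes the prior term of the model, while the second
changes how much each document contributes to the class and feature counts,
which is closer to what the original question asks for.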

Hope this helps
Regards,
Jaques



2012/12/23 Jieyun Fu <jieyu...@gmail.com>

> Hi all,
>
> I found this piece of code (from 
> here<http://stackoverflow.com/questions/10098533/implementing-bag-of-words-naive-bayes-classifier-in-nltk>),
> which basically tries to classify movie reviews into positive and negative.
> Now I need to put in weights for positive and negative reviews (for
> example, negative reviews have a weight of 0.5 and positive reviews have a
> weight of 1). Is there a way to do it in the Pipeline class?
> MultinomialNB.fit() has a sample_weight parameter, but I can't "set"
> sample_weight anywhere in Pipeline (or can I?)
>
> Sorry if this is a dumb question. I am quite new to this functionality of
> sklearn.
>
> import numpy as np
> from nltk.probability import FreqDist
> from nltk.classify import SklearnClassifier
> from sklearn.feature_extraction.text import TfidfTransformer
> from sklearn.feature_selection import SelectKBest, chi2
> from sklearn.naive_bayes import MultinomialNB
> from sklearn.pipeline import Pipeline
>
> pipeline = Pipeline([('tfidf', TfidfTransformer()),
>                      ('chi2', SelectKBest(chi2, k=1000)),
>                      ('nb', MultinomialNB())])
> classif = SklearnClassifier(pipeline)
>
> from nltk.corpus import movie_reviews
> pos = [FreqDist(movie_reviews.words(i))
>        for i in movie_reviews.fileids('pos')]
> neg = [FreqDist(movie_reviews.words(i))
>        for i in movie_reviews.fileids('neg')]
> add_label = lambda lst, lab: [(x, lab) for x in lst]
> classif.train(add_label(pos[:100], 'pos') + add_label(neg[:100], 'neg'))
>
> # classify_many replaces batch_classify, which was removed in NLTK 3
> l_pos = np.array(classif.classify_many(pos[100:]))
> l_neg = np.array(classif.classify_many(neg[100:]))
> print("Confusion matrix:\n%d\t%d\n%d\t%d" % (
>           (l_pos == 'pos').sum(), (l_pos == 'neg').sum(),
>           (l_neg == 'pos').sum(), (l_neg == 'neg').sum()))
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>