Hi Andy,

Thanks for your response. I will read those documents.

There are lots of things I have to learn about Naive Bayes.

Toyoharu



2014-06-22 6:15 GMT+09:00 Andrew Palumbo <[email protected]>:

> Hi Toyoharu,
>
> Mahout Naive Bayes uses Laplace smoothing (the alpha_I parameter with
> default=1) to deal with terms unseen by the training set. See Rennie et al.
> sec. 2.3 [1].
>
> Your modification will certainly work, and may in fact give better results
> for the problem that you're working on.
>
> You could also look at optimizing the Laplacian [2].
>
> [1] http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
> [2] http://www.stat.yale.edu/~lc436/papers/temp/Zhang_Oles_2001.pdf
>
>
> Andy
>
> > Date: Sun, 22 Jun 2014 00:41:51 +0900
> > Subject: Naive Bayes Classifier Bug ?
> > From: [email protected]
> > To: [email protected]
> >
> > Hi Mahout,
> >
> > In Naive Bayes, I think that a term that does not exist in the training
> > data should not affect the score.
> > What do you think?
> >
> >   org.apache.mahout.classifier.naivebayes.AbstractNaiveBayesClassifier
> >
> >  Before:
> >   protected double getScoreForLabelInstance(int label, Vector instance) {
> >     double result = 0.0;
> >     for (Element e : instance.nonZeroes()) {
> >       result += e.get() * getScoreForLabelFeature(label, e.index());
> >     }
> >     return result;
> >   }
> >
> >  After:
> >   protected double getScoreForLabelInstance(int label, Vector instance) {
> >     double result = 0.0;
> >     for (Element e : instance.nonZeroes()) {
> >       int index = e.index();
> >       double featureWeight = model.featureWeight(index);
> >       if (featureWeight != 0) {
> >         result += e.get() * getScoreForLabelFeature(label, index);
> >       }
> >     }
> >     return result;
> >   }
> >
> > Thanks,
> > Toyoharu
>
>
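For reference, the trade-off discussed in this thread can be sketched in a few lines. This is a minimal illustration of plain multinomial Naive Bayes with Laplace smoothing, not Mahout's actual implementation: the class LaplaceSmoothingSketch, its methods, the vocabulary and the toy counts are all hypothetical. It assumes the per-term log-probability log((count + alpha) / (labelTotal + alpha * vocabSize)) and shows that a term absent from the training vocabulary still adds a label-dependent amount to the score unless it is skipped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LaplaceSmoothingSketch {

    // Laplace-smoothed log-probability of a term given a label.
    // With alpha > 0 this is finite even when count == 0.
    static double smoothedLogProb(double count, double labelTotal,
                                  double alpha, int vocabSize) {
        return Math.log((count + alpha) / (labelTotal + alpha * vocabSize));
    }

    // Score of a document (term -> frequency) for one label.
    // If skipUnseen is true, terms outside the training vocabulary are
    // ignored, which mirrors the idea of the modification proposed above.
    static double score(Map<String, Double> doc,
                        Map<String, Double> labelCounts,
                        Set<String> vocabulary,
                        double alpha,
                        boolean skipUnseen) {
        double labelTotal = 0.0;
        for (double c : labelCounts.values()) {
            labelTotal += c;
        }
        double result = 0.0;
        for (Map.Entry<String, Double> e : doc.entrySet()) {
            if (skipUnseen && !vocabulary.contains(e.getKey())) {
                continue; // term never appeared in the training data
            }
            double count = labelCounts.getOrDefault(e.getKey(), 0.0);
            result += e.getValue()
                    * smoothedLogProb(count, labelTotal, alpha, vocabulary.size());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical training vocabulary and per-label term counts.
        Set<String> vocab = new HashSet<>(Arrays.asList("a", "b", "c"));
        Map<String, Double> labelX = new HashMap<>();
        labelX.put("a", 3.0);
        labelX.put("b", 1.0);
        Map<String, Double> labelY = new HashMap<>();
        labelY.put("a", 1.0);
        labelY.put("c", 1.0);

        // Document containing one known term and one unseen term "z".
        Map<String, Double> doc = new HashMap<>();
        doc.put("a", 1.0);
        doc.put("z", 2.0);

        for (boolean skip : new boolean[] {false, true}) {
            System.out.println("skipUnseen=" + skip
                    + " X=" + score(doc, labelX, vocab, 1.0, skip)
                    + " Y=" + score(doc, labelY, vocab, 1.0, skip));
        }
    }
}
```

In this sketch the unseen term costs a label with more training mass a larger penalty, because labelTotal appears in the denominator. That is the effect the proposed change removes, while smoothing still handles terms that are in the vocabulary but missing for a particular label.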
