Hi Toyoharu, 

Mahout Naive Bayes uses Laplace smoothing (the alpha_I parameter, default 1)
to handle terms that do not appear in the training set. See Rennie et al.,
Sec. 2.3 [1].
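
For illustration, here is a minimal Java sketch of the standard Laplace-smoothed
multinomial term weight, log((termCount + alpha_I) / (labelTotal + alpha_I * vocabularySize)).
The class and method names are hypothetical, not Mahout's API. I'm assuming Mahout's
standard classifier computes something equivalent. The sketch shows that a term never
seen in training still gets a finite weight, and that this weight differs between labels
that saw different amounts of training text:

```java
/**
 * Hypothetical sketch of a Laplace-smoothed multinomial Naive Bayes term weight.
 * Not Mahout's API; names and numbers are for illustration only.
 */
public class LaplaceSmoothingSketch {

  // log P(term | label) with additive (Laplace) smoothing:
  // (count of term in label + alphaI) / (total term count in label + alphaI * vocabularySize)
  static double termWeight(double termCountInLabel, double totalCountInLabel,
                           double alphaI, int vocabularySize) {
    return Math.log((termCountInLabel + alphaI)
        / (totalCountInLabel + alphaI * vocabularySize));
  }

  public static void main(String[] args) {
    int vocabularySize = 1000;   // assumed vocabulary size
    double alphaI = 1.0;         // the default of 1 mentioned above
    // An unseen term has count 0 under every label, but its smoothed
    // weight still depends on how much text each label saw in training.
    double unseenInSmallLabel = termWeight(0, 500, alphaI, vocabularySize);
    double unseenInLargeLabel = termWeight(0, 50_000, alphaI, vocabularySize);
    System.out.printf("small label: %.4f%n", unseenInSmallLabel);
    System.out.printf("large label: %.4f%n", unseenInLargeLabel);
  }
}
```

Under this formula an unseen term is not neutral: it lowers the score of the label
with more training text more than the other, which is the effect your change removes.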

Your modification will certainly work, and may in fact give better results for
the problem that you're working on.

You could also look at optimizing the Laplace smoothing parameter [2].
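
One simple way to tune alpha_I is a grid search scored on held-out documents. The
sketch below is a generic tuning approach, not the method from [2] and not Mahout's
API; the toy data, class, and method names are hypothetical, and a uniform label
prior is assumed:

```java
import java.util.Arrays;

/** Hypothetical sketch: pick alphaI by held-out accuracy. Not Mahout's API. */
public class AlphaGridSearchSketch {

  // Sum term counts per label: counts[label][term]
  static double[][] termCounts(int[][] docs, int[] labels, int numLabels, int vocab) {
    double[][] counts = new double[numLabels][vocab];
    for (int d = 0; d < docs.length; d++) {
      for (int t = 0; t < vocab; t++) {
        counts[labels[d]][t] += docs[d][t];
      }
    }
    return counts;
  }

  // Return the label with the highest Laplace-smoothed log-likelihood (uniform prior).
  static int predict(double[][] counts, int[] doc, double alphaI) {
    int vocab = doc.length;
    int best = 0;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int label = 0; label < counts.length; label++) {
      double total = Arrays.stream(counts[label]).sum();
      double score = 0.0;
      for (int t = 0; t < vocab; t++) {
        if (doc[t] != 0) {
          score += doc[t] * Math.log((counts[label][t] + alphaI) / (total + alphaI * vocab));
        }
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }

  // Return the candidate with the most correct held-out predictions (first one wins ties).
  static double bestAlpha(double[][] counts, int[][] heldOut, int[] heldOutLabels,
                          double[] candidates) {
    double chosen = candidates[0];
    int bestCorrect = -1;
    for (double alpha : candidates) {
      int correct = 0;
      for (int d = 0; d < heldOut.length; d++) {
        if (predict(counts, heldOut[d], alpha) == heldOutLabels[d]) {
          correct++;
        }
      }
      if (correct > bestCorrect) {
        bestCorrect = correct;
        chosen = alpha;
      }
    }
    return chosen;
  }

  public static void main(String[] args) {
    // Toy data: vocabulary of 4 terms, 2 labels.
    int[][] train = {{3, 1, 0, 0}, {2, 2, 0, 0}, {0, 0, 2, 3}, {0, 1, 3, 1}};
    int[] trainLabels = {0, 0, 1, 1};
    int[][] heldOut = {{1, 1, 0, 0}, {0, 0, 1, 2}};
    int[] heldOutLabels = {0, 1};
    double[][] counts = termCounts(train, trainLabels, 2, 4);
    double alpha = bestAlpha(counts, heldOut, heldOutLabels,
        new double[] {0.01, 0.1, 0.5, 1.0, 2.0});
    System.out.println("chosen alphaI = " + alpha);
  }
}
```

On real data you would use a larger held-out set (or cross-validation) and a finer
grid; accuracy is coarse, so held-out log-likelihood is another reasonable criterion.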

[1] http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf    
[2] http://www.stat.yale.edu/~lc436/papers/temp/Zhang_Oles_2001.pdf


Andy

> Date: Sun, 22 Jun 2014 00:41:51 +0900
> Subject: Naive Bayes Classifier Bug ?
> From: [email protected]
> To: [email protected]
> 
> Hi Mahout,
> 
> In Naive Bayes, I think that a term that does not exist in the training data
> should not affect the score.
> What do you think?
> 
>   org.apache.mahout.classifier.naivebayes.AbstractNaiveBayesClassifier
> 
>  Before:
>   protected double getScoreForLabelInstance(int label, Vector instance) {
>     double result = 0.0;
>     for (Element e : instance.nonZeroes()) {
>       result += e.get() * getScoreForLabelFeature(label, e.index());
>     }
>     return result;
>   }
> 
>  After:
>   protected double getScoreForLabelInstance(int label, Vector instance) {
>     double result = 0.0;
>     for (Element e : instance.nonZeroes()) {
>       int index = e.index();
>       // Skip terms that never occurred in the training data (total weight 0).
>       double featureWeight = model.featureWeight(index);
>       if (featureWeight != 0) {
>         result += e.get() * getScoreForLabelFeature(label, index);
>       }
>     }
>     return result;
>   }
> 
> Thanks,
> Toyoharu