Hi Andy,

Thanks for your response. I must read those documents.
There are lots of things I have to learn about Naive Bayes.

Toyoharu

2014-06-22 6:15 GMT+09:00 Andrew Palumbo <[email protected]>:

> Hi Toyoharu,
>
> Mahout Naive Bayes uses Laplace smoothing (the alpha_I parameter with
> default=1) to deal with terms unseen by the training set. See Rennie et al.
> sec. 2.3 [1].
>
> Your modification will certainly work, and may in fact give better results
> for the problem that you're working on.
>
> You could also look at optimizing the Laplacian [2].
>
> [1] http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
> [2] http://www.stat.yale.edu/~lc436/papers/temp/Zhang_Oles_2001.pdf
>
>
> Andy
>
> > Date: Sun, 22 Jun 2014 00:41:51 +0900
> > Subject: Naive Bayes Classifier Bug ?
> > From: [email protected]
> > To: [email protected]
> >
> > Hi Mahout,
> >
> > In Naive Bayes, I think that a term that does not exist in the training
> > data should not affect a score.
> > What do you think?
> >
> > org.apache.mahout.classifier.naivebayes.AbstractNaiveBayesClassifier
> >
> > Before:
> > protected double getScoreForLabelInstance(int label, Vector instance) {
> >   double result = 0.0;
> >   for (Element e : instance.nonZeroes()) {
> >     result += e.get() * getScoreForLabelFeature(label, e.index());
> >   }
> >   return result;
> > }
> >
> > After:
> > protected double getScoreForLabelInstance(int label, Vector instance) {
> >   double result = 0.0;
> >   for (Element e : instance.nonZeroes()) {
> >     int index = e.index();
> >     // Total training weight of this feature; zero means the term was never seen.
> >     double featureWeight = model.featureWeight(index);
> >     if (featureWeight != 0) {
> >       result += e.get() * getScoreForLabelFeature(label, index);
> >     }
> >   }
> >   return result;
> > }
> >
> > Thanks,
> > Toyoharu
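
For illustration, here is a minimal sketch of why a term never seen in training can still shift the scores under Laplace smoothing, and what skipping such terms changes. This is not Mahout's implementation: the class name, method names, and toy counts are all hypothetical, and it assumes a plain multinomial Naive Bayes log-likelihood with alpha = 1.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not Mahout code.
class LaplaceSmoothingSketch {

    // log P(term | label) with additive (Laplace) smoothing.
    static double smoothedLogProb(double termCount, double labelTotal,
                                  int vocabSize, double alpha) {
        return Math.log((termCount + alpha) / (labelTotal + alpha * vocabSize));
    }

    // Sum of tf * log P(term | label) over the document's terms.
    // If skipUnseen is true, terms absent from the training vocabulary are ignored.
    static double score(Map<String, Double> doc, Map<String, Double> labelCounts,
                        Set<String> vocab, double alpha, boolean skipUnseen) {
        double labelTotal = 0.0;
        for (double c : labelCounts.values()) {
            labelTotal += c;
        }
        double result = 0.0;
        for (Map.Entry<String, Double> e : doc.entrySet()) {
            if (skipUnseen && !vocab.contains(e.getKey())) {
                continue; // unseen term contributes nothing
            }
            double count = labelCounts.getOrDefault(e.getKey(), 0.0);
            result += e.getValue()
                    * smoothedLogProb(count, labelTotal, vocab.size(), alpha);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> vocab = Set.of("a", "b");
        Map<String, Double> labelX = Map.of("a", 3.0, "b", 1.0);   // small class
        Map<String, Double> labelY = Map.of("a", 10.0, "b", 30.0); // large class
        Map<String, Double> doc = Map.of("a", 1.0, "zzz", 1.0);    // "zzz" is unseen

        double smoothedGap = score(doc, labelX, vocab, 1.0, false)
                           - score(doc, labelY, vocab, 1.0, false);
        double skippedGap = score(doc, labelX, vocab, 1.0, true)
                          - score(doc, labelY, vocab, 1.0, true);

        System.out.println("X - Y with smoothing on unseen terms: " + smoothedGap);
        System.out.println("X - Y with unseen terms skipped:      " + skippedGap);
    }
}
```

In this toy setup the unseen term gets probability alpha / (labelTotal + alpha * |V|), which depends on the label's total count, so smoothing favours the label with less training mass. Skipping the term removes that shift. Which behaviour gives better accuracy depends on the data.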
