Hi Mahout,
In Naive Bayes, I think that a term that does not appear in the training data
should not affect the score.
What do you think?
org.apache.mahout.classifier.naivebayes.AbstractNaiveBayesClassifier
Before:
protected double getScoreForLabelInstance(int label, Vector instance) {
  double result = 0.0;
  for (Element e : instance.nonZeroes()) {
    result += e.get() * getScoreForLabelFeature(label, e.index());
  }
  return result;
}
After:
protected double getScoreForLabelInstance(int label, Vector instance) {
  double result = 0.0;
  for (Element e : instance.nonZeroes()) {
    int index = e.index();
    // Total weight of this feature across all labels; 0 means the term
    // never appeared in the training data.
    double featureWeight = model.featureWeight(index);
    if (featureWeight != 0) {
      result += e.get() * getScoreForLabelFeature(label, index);
    }
  }
  return result;
}
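To show why I think this matters, here is a minimal standalone sketch. It does not use Mahout at all: the class UnseenTermSketch, its methods, and the toy counts are all hypothetical, and I am assuming a Laplace-smoothed log weight of the form log((count + alpha) / (labelTotal + alpha * numFeatures)), which may differ from what a given classifier actually computes.

```java
public class UnseenTermSketch {

    // Assumed smoothed log weight, for illustration only (not Mahout's API).
    static double smoothedLogWeight(double count, double labelTotal,
                                    double alpha, int numFeatures) {
        return Math.log((count + alpha) / (labelTotal + alpha * numFeatures));
    }

    // Scores one instance for one label. When skipUnseen is true, features
    // whose total training weight is 0 are ignored, as in the "After" code.
    static double score(double[] instance, double[] labelCounts,
                        double[] featureTotals, double alpha,
                        boolean skipUnseen) {
        double labelTotal = 0.0;
        for (double c : labelCounts) {
            labelTotal += c;
        }
        double result = 0.0;
        for (int i = 0; i < instance.length; i++) {
            if (instance[i] == 0.0) {
                continue; // mimic iterating over non-zero elements only
            }
            if (skipUnseen && featureTotals[i] == 0.0) {
                continue; // term never seen in training
            }
            result += instance[i]
                * smoothedLogWeight(labelCounts[i], labelTotal, alpha, instance.length);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data: feature 2 never occurs in training for either label.
        double[] countsA = {8, 2, 0};
        double[] countsB = {1, 1, 0};
        double[] featureTotals = {9, 3, 0};
        double[] instance = {0, 0, 1}; // contains only the unseen term

        System.out.println("A, unseen counted: " + score(instance, countsA, featureTotals, 1.0, false));
        System.out.println("B, unseen counted: " + score(instance, countsB, featureTotals, 1.0, false));
        System.out.println("A, unseen skipped: " + score(instance, countsA, featureTotals, 1.0, true));
        System.out.println("B, unseen skipped: " + score(instance, countsB, featureTotals, 1.0, true));
    }
}
```

In this toy data the instance contains only the unseen term, yet without the check label B gets a higher score than label A simply because B has less total training weight. With the check, both labels score 0 and the unseen term has no influence.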
Thanks,
Toyoharu