Hello,

I'm training a Bayes classifier on this data (6 records):

target, words
T A A A
T A A A
T A A A
T A A B
T A A B
F A A B

with the command:

    ./mahout trainclassifier -i /mnt/hgfs/C/daniel/my_fav_data/test -o model -type bayes -ng 1 -source hdfs

Then I test this classifier against the same data with:

    ./mahout testclassifier -d /mnt/hgfs/C/daniel/my_fav_data/test -m model -type bayes -ng 1 -source hdfs -method sequential -v

and I'm getting classifications I cannot understand. All records are classified as F. Why is that? Shouldn't they all be classified as T?

12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 0 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 1 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 2 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 3 Line(30): T A A B Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 4 Line(30): T A A B Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 5 Line(30): F A A B Expected Label: F Classified Label: F Correct: true

My reasoning (no smoothing applied):

Prior:
P(T) = 5/6
P(F) = 1/6

Word likelihoods:
P(A|T) = 13/15
P(A|F) = 2/3
P(B|T) = 2/15
P(B|F) = 1/3

Then I calculate the posterior probability, e.g.

P(T|A,A,B) = P(T)*P(A|T)^2*P(B|T) / [P(T)*P(A|T)^2*P(B|T) + P(F)*P(A|F)^2*P(B|F)] = 0.7717

so the record should be classified as T.

What is the reasoning behind classifying all of the records above as F?

Any help much appreciated.

PS: I was using Mahout trunk from 16.01.2012.

Regards,
Daniel

--
Daniel Korzekwa
Software Engineer
priv: http://danmachine.com
blog: http://blog.danmachine.com
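To double-check the hand calculation, here is a minimal Python sketch of an unsmoothed multinomial naive Bayes over the six records. It assumes each record is a label followed by its word tokens, as written above. It only checks the arithmetic in this post and is not a reproduction of what Mahout's trainclassifier/testclassifier compute internally. The helper names train and posteriors are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# The six training records from the post: label first, then the words.
RECORDS = [
    "T A A A",
    "T A A A",
    "T A A A",
    "T A A B",
    "T A A B",
    "F A A B",
]


def train(records):
    """Hypothetical helper: count labels and per-label words (no smoothing)."""
    label_counts = Counter()
    word_counts = {}
    for line in records:
        label, *words = line.split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
    total_docs = sum(label_counts.values())
    # Prior = fraction of records carrying each label.
    priors = {lab: Fraction(n, total_docs) for lab, n in label_counts.items()}
    likelihoods = {}
    for lab, counts in word_counts.items():
        total_words = sum(counts.values())
        # P(word|label) = word count / total word count for that label.
        likelihoods[lab] = {w: Fraction(c, total_words) for w, c in counts.items()}
    return priors, likelihoods


def posteriors(words, priors, likelihoods):
    """Hypothetical helper: return P(label | words) for every label, exactly."""
    joint = {}
    for lab, prior in priors.items():
        p = prior
        for w in words:
            # Unseen words get probability 0 because no smoothing is applied.
            p *= likelihoods[lab].get(w, Fraction(0))
        joint[lab] = p
    evidence = sum(joint.values())
    return {lab: p / evidence for lab, p in joint.items()}


priors, likelihoods = train(RECORDS)
print("P(T) =", priors["T"], " P(A|T) =", likelihoods["T"]["A"], " P(B|T) =", likelihoods["T"]["B"])
for line in RECORDS:
    label, *words = line.split()
    post = posteriors(words, priors, likelihoods)
    best = max(post, key=post.get)  # pick the label with the highest posterior
    print(f"{line} -> {best}  P(T|words) = {float(post['T']):.4f}")
```

Under these assumptions the script reproduces the priors and likelihoods listed above, gives P(T|A,A,B) = 169/219 (about 0.7717), and assigns every one of the six records to T.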
