Hi,

I have a data file that I formatted in the same way as the
20newsgroups example I have seen. Here is a snippet of my fake data file
(format: key\tword1 word2 word3...\n):

spam    you need some viagra medication my friend
nonspam hi ryan my name is cassie and I am in your class
spam    aviator sunglasses with your name on them
nonspam hello ryan can you do me a favor
spam    free infertility medication here
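To rule out a separator problem (a TAB and a run of spaces look the same once pasted into an email), here is a small shell sketch that flags any line without a TAB between the label and the words. `check_format` is just a name I made up for this check; it is not part of Mahout, and it only verifies the key\tword1 word2... layout described above.

```shell
#!/bin/sh
# check_format FILE
# Hypothetical helper (not part of Mahout): reports every line that does not
# look like "label<TAB>word1 word2 ...", i.e. a non-empty label, a TAB, and
# something after it. Returns 1 if any line is malformed, 0 otherwise.
check_format() {
  awk -F'\t' '
    NF < 2 || $1 == "" || $2 == "" {
      printf "line %d: missing TAB-separated label: %s\n", NR, $0
      bad++
    }
    END { if (bad) exit 1 }
  ' "$1"
}
```

It can be run against a local copy of the training file, e.g. `check_format simple_spam`; any line it prints would be parsed differently from what I intended.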

I am trying to train and test the CBayes classifier. When I test the
classifier, I get the following nonsensical output:

INFO: =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :          0             �%
Incorrectly Classified Instances        :          0             �%
Total Classified Instances              :          0

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       <--Classified as
0       0        |  0           a     = spam
0       0        |  0           b     = nonspam
Default Category: unknown: 2


[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1 second
[INFO] Finished at: Mon Oct 04 18:13:51 PDT 2010
[INFO] Final Memory: 26M/360M
[INFO] ------------------------------------------------------------------------

I am using the following commands from the wiki to run the jobs:

mvn -e exec:java \
      -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
      -Dexec.args="-i simple_spam \
                   -o spam-model \
                   -type cbayes \
                   -ng 1 \
                   -source hdfs"

mvn -e exec:java \
      -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
      -Dexec.args="-m spam-model \
                   -d simple_spam \
                   -type cbayes \
                   -ng 1 \
                   -source hdfs \
                   -method sequential"

What might I be doing wrong? Let me know if you need more information.

Thanks,
Ryan

-- 
RRR