Hi,
I have a data file formatted the same way as the 20newsgroups example.
Here is a snippet of my fake data file, where each line has the form
(key\tword1 word2 word3...\n):
spam you need some viagra medication my friend
nonspam hi ryan my name is cassie and I am in your class
spam aviator sunglasses with your name on them
nonspam hello ryan can you do me a favor
spam free infertility medication here
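
In case it helps to rule out a formatting problem, here is a minimal sketch of a check for that layout. It assumes one example per line, with the label and the text separated by a single tab. The names parse_example and check_file are hypothetical helpers for illustration, not part of Mahout.

```python
def parse_example(line):
    """Split one 'label<TAB>word1 word2 ...' line into (label, tokens).

    Raises ValueError if the tab is missing or the label/text is empty.
    """
    label, sep, text = line.rstrip("\n").partition("\t")
    if not sep:
        raise ValueError("no tab separator: %r" % line)
    tokens = text.split()
    if not label or not tokens:
        raise ValueError("empty label or text: %r" % line)
    return label, tokens


def check_file(path):
    """Return a list of (line_number, message) for malformed lines."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                parse_example(line)
            except ValueError as err:
                problems.append((number, str(err)))
    return problems
```

Calling check_file on the input file returns an empty list when every non-blank line has a tab-separated label and at least one word. Any entries it returns point at lines where the label and text are separated by spaces, or where one of the two is missing.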
I am trying to train and test the CBayes classifier. When I test the
classifier, I get the following nonsensical output:
INFO: =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 0 �%
Incorrectly Classified Instances : 0 �%
Total Classified Instances : 0
=======================================================
Confusion Matrix
-------------------------------------------------------
a b <--Classified as
0 0 | 0 a = spam
0 0 | 0 b = nonspam
Default Category: unknown: 2
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1 second
[INFO] Finished at: Mon Oct 04 18:13:51 PDT 2010
[INFO] Final Memory: 26M/360M
[INFO] ------------------------------------------------------------------------
I am using the following commands from the wiki to run the jobs:
mvn -e exec:java \
-Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
-Dexec.args="-i simple_spam \
-o spam-model \
-type cbayes \
-ng 1 \
-source hdfs"
mvn -e exec:java \
-Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
-Dexec.args="-m spam-model \
-d simple_spam \
-type cbayes \
-ng 1 \
-source hdfs \
-method sequential"
What might I be doing wrong? Let me know if you need more information.
Thanks,
Ryan
--
RRR