Hi All,
I am using a very simple input file as the Bayes input (I also tried the 20 Newsgroups example and got the same result):
------
mahout Mahout's goal is to build scalable machine learning libraries. With scalable we mean: Scalable to reasonably large data sets. Our core algorithms for clustering, classfication and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However we do not restrict contributions to Hadoop based implementations: Contributions that run on
lucene All deprecations targeted to be removed in version 3.0 were removed. If you are upgrading from version 2.9.1 of Lucene, you have to fix all deprecation warnings in your code base to be able to recompile against this version. This is the first Lucene
spamassasin SpamAssassin is a mail filter to identify spam. It is an intelligent email filter which uses a diverse range of tests to identify unsolicited bulk email, more commonly known as Spam. These tests are applied to email headers and content to classify email using advanced statistical methods. In addition,
------
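One thing worth ruling out is the line layout of the input. My understanding (an assumption, not checked against the 0.5 source here) is that the Bayes trainer reads its input with Hadoop's KeyValueTextInputFormat, which splits each line at the first tab into label and text; a line where the label is followed by a space instead of a tab would then become one long key with empty text. The minimal Python sketch below uses hypothetical helper names (`write_bayes_input`, `check_bayes_input`): it writes one `label<TAB>text` document per line and reports lines that lack that layout.

```python
import os


def write_bayes_input(path, docs):
    """Write (label, text) pairs as one "label<TAB>text" document per line.

    The tab separator assumes the trainer reads input with Hadoop's
    KeyValueTextInputFormat (the first tab splits key from value).
    """
    with open(path, "w", encoding="utf-8") as out:
        for label, text in docs:
            # Collapse newlines, tabs and runs of spaces so each document
            # stays on a single line and the only tab is the separator.
            clean = " ".join(text.split())
            out.write(f"{label}\t{clean}\n")


def check_bayes_input(path):
    """Return 1-based numbers of lines missing a non-empty label, a tab, or text."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            label, sep, text = line.rstrip("\n").partition("\t")
            if not sep or not label or not text.strip():
                bad.append(lineno)
    return bad


if __name__ == "__main__":
    # Excerpts of the three documents above, with the same labels.
    docs = [
        ("mahout", "Mahout's goal is to build scalable machine learning libraries."),
        ("lucene", "All deprecations targeted to be removed in version 3.0 were removed."),
        ("spamassasin", "SpamAssassin is a mail filter to identify spam."),
    ]
    os.makedirs("bayes-input", exist_ok=True)
    target = os.path.join("bayes-input", "input.txt")
    write_bayes_input(target, docs)
    print(check_bayes_input(target))  # an empty list means every line parsed
```

Running `check_bayes_input` on the existing file in bayes-input (after copying it out of HDFS) would flag any line whose label is separated from the text by spaces only.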
I put the input file in a directory named bayes-input and ran the following command:
bin/mahout trainclassifier -i bayes-input -o bayes-model --classifierType bayes -ng 1 -source hdfs
----
After training finished, all files under the bayes-model path have size 0:
bin/hadoop fs -ls bayes-model
Found 5 items
-rw-r--r-- 3 hadoop supergroup 0 2011-10-17 10:16 /user/hadoop/bayes-model/_SUCCESS
drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:16 /user/hadoop/bayes-model/_logs
drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-tfIdf
drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-thetaNormalizer
drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:18 /user/hadoop/bayes-model/trainer-weights
----
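In this listing every entry except _SUCCESS is a directory (leading `d` in the permissions), and `hadoop fs -ls` prints 0 in the size column for directories, while _SUCCESS is an empty marker file by design. So the listing alone may not show whether the part files inside trainer-tfIdf, trainer-thetaNormalizer and trainer-weights are empty; a recursive listing (`bin/hadoop fs -lsr bayes-model`) would include them. The hypothetical Python helper below is a sketch that assumes the eight-column output shown above with one entry per line; it separates directories from files and their byte sizes.

```python
def summarize_listing(listing):
    """Split `hadoop fs -ls`/`-lsr` output into ({file_path: size_bytes}, [dir_paths])."""
    files, dirs = {}, []
    for line in listing.splitlines():
        fields = line.split()
        # Entries have 8 columns: perms, replication, owner, group, size,
        # date, time, path; skip the "Found N items" header and anything else.
        if len(fields) < 8 or not fields[4].isdigit():
            continue
        perms, size = fields[0], int(fields[4])
        path = " ".join(fields[7:])  # rejoin paths that contain single spaces
        if perms.startswith("d"):
            dirs.append(path)  # the size column is always 0 for directories
        else:
            files[path] = size
    return files, dirs
```

For example, pass the captured output of `bin/hadoop fs -lsr bayes-model` to `summarize_listing` and inspect the returned `files` dict; only if the part files there also report 0 bytes did the trainer actually write empty output.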
When I use this model to classify new data, every sample is classified as "unknown".
My environment:
1. OS: CentOS 5
2. Mahout: 0.5
3. Hadoop: 0.20.205
Thanks,
Wangda