Hi All,
I am using a very simple input file as the Bayes input (I also tried the 
20 Newsgroups example and got the same result):
------
mahout  Mahout's goal is to build scalable machine learning libraries. With 
scalable we mean: Scalable to reasonably large data sets. Our core algorithms 
for clustering, classfication and batch based collaborative filtering are 
implemented on top of Apache Hadoop using the map/reduce paradigm. However we 
do not restrict contributions to Hadoop based implementations: Contributions 
that run on
lucene  All deprecations targeted to be removed in version 3.0 were removed. If 
you are upgrading from version 2.9.1 of Lucene, you have to fix all deprecation 
warnings in your code base to be able to recompile against this version. This 
is the first Lucene
spamassasin SpamAssassin is a mail filter to identify spam. It is an 
intelligent email filter which uses a diverse range of tests to identify 
unsolicited bulk email, more commonly known as Spam. These tests are applied to 
email headers and content to classify email using advanced statistical methods. 
In addition,
------
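
For reference, a file like the sample above could be generated and sanity-checked with a short shell sketch. It assumes the trainer wants one document per line, with the label separated from the text by a tab; that is an assumption and is not confirmed anywhere in this post. The names make_input, count_bad_lines, and train.txt are hypothetical, and the texts are shortened.

```shell
#!/bin/sh
# Hypothetical helper: write one training document per line as "label<TAB>text".
# The tab separator is an ASSUMPTION about the expected input format.
make_input() {
    out_dir=$1
    mkdir -p "$out_dir"
    # printf reuses the format string for each (label, text) pair of arguments.
    printf '%s\t%s\n' \
        mahout "Mahout's goal is to build scalable machine learning libraries." \
        lucene "All deprecations targeted to be removed in version 3.0 were removed." \
        spamassasin "SpamAssassin is a mail filter to identify spam." \
        > "$out_dir/train.txt"
}

# Hypothetical check: print how many lines lack a non-empty label,
# a tab separator, or non-empty text.
count_bad_lines() {
    awk -F '\t' 'NF < 2 || $1 == "" || $2 == "" { bad++ } END { print bad + 0 }' "$1"
}
```

Running make_input bayes-input and then count_bad_lines bayes-input/train.txt prints the number of malformed lines. Because the training command reads from HDFS, the directory would then need to be uploaded, for example with bin/hadoop fs -put bayes-input bayes-input.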

I put the input in a directory named bayes-input and ran this command:
    bin/mahout trainclassifier -i bayes-input -o bayes-model --classifierType bayes -ng 1 -source hdfs
----
After training finished, every file under the bayes-model path has size 0:

bin/hadoop fs -ls bayes-model
Found 5 items
-rw-r--r--   3 hadoop supergroup          0 2011-10-17 10:16 /user/hadoop/bayes-model/_SUCCESS
drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:16 /user/hadoop/bayes-model/_logs
drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-tfIdf
drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-thetaNormalizer
drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:18 /user/hadoop/bayes-model/trainer-weights
----
When I use this model to classify new data, every sample is classified as 
"unknown".

My Environment:

 1.  OS     : CentOS 5
 2.  Mahout : 0.5
 3.  Hadoop : 0.20.205

Thanks,
Wangda
