Hi Wangda,

Can you include the logs that were spit out by Mahout?

On Oct 16, 2011, at 10:46 PM, <[email protected]> wrote:

> Hi All,
>
> I use a very simple input file as the Bayes input (I also tried the 20 Newsgroups
> example and got the same result):
> ------
> mahout Mahout's goal is to build scalable machine learning libraries. With
> scalable we mean: Scalable to reasonably large data sets. Our core algorithms
> for clustering, classification and batch based collaborative filtering are
> implemented on top of Apache Hadoop using the map/reduce paradigm. However we
> do not restrict contributions to Hadoop based implementations: Contributions
> that run on
> lucene All deprecations targeted to be removed in version 3.0 were removed.
> If you are upgrading from version 2.9.1 of Lucene, you have to fix all
> deprecation warnings in your code base to be able to recompile against this
> version. This is the first Lucene
> spamassasin SpamAssassin is a mail filter to identify spam. It is an
> intelligent email filter which uses a diverse range of tests to identify
> unsolicited bulk email, more commonly known as Spam. These tests are applied
> to email headers and content to classify email using advanced statistical
> methods. In addition,
> ------
>
> I put the input in a directory named bayes-input and ran this command:
>
> bin/mahout trainclassifier -i bayes-input -o bayes-model --classifierType bayes -ng 1 -source hdfs
> ----
> After training finished, every entry in the bayes-model path has size 0:
>
> bin/hadoop fs -ls bayes-model
> Found 5 items
> -rw-r--r-- 3 hadoop supergroup 0 2011-10-17 10:16 /user/hadoop/bayes-model/_SUCCESS
> drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:16 /user/hadoop/bayes-model/_logs
> drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-tfIdf
> drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-thetaNormalizer
> drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:18 /user/hadoop/bayes-model/trainer-weights
> ----
> When I use this model to classify new data, every sample is classified as
> "unknown".
>
> My environment:
>
> 1. OS: CentOS 5
> 2. Mahout: 0.5
> 3. Hadoop: 0.20.205
>
> Thanks,
> Wangda

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com
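A note on the quoted `hadoop fs -ls` listing: a leading `d` in the first column marks a directory, and four of the five entries are directories, which this listing shows with 0 in the size column. The only regular file is `_SUCCESS`, the zero-byte marker written when a job completes. So the listing by itself does not show that the model is empty. The sketch below splits the listing into files and directories. It is a hypothetical helper (`parse_ls` is not part of Hadoop or Mahout) and assumes the eight-column layout seen in the message.

```python
from typing import Dict, List

# The listing quoted in the message, copied verbatim.
LISTING = """\
Found 5 items
-rw-r--r-- 3 hadoop supergroup 0 2011-10-17 10:16 /user/hadoop/bayes-model/_SUCCESS
drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:16 /user/hadoop/bayes-model/_logs
drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-tfIdf
drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-thetaNormalizer
drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:18 /user/hadoop/bayes-model/trainer-weights
"""


def parse_ls(listing: str) -> List[Dict[str, object]]:
    """Parse `hadoop fs -ls` output into one dict per entry (hypothetical helper).

    Assumes the 8-column layout seen above: permissions, replication
    ("-" for directories), owner, group, size, date, time, path.
    """
    entries = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "Found N items" header and any line not matching the layout.
        if len(fields) != 8:
            continue
        perms, _repl, _owner, _group, size, _date, _time, path = fields
        entries.append({
            "path": path,
            "is_dir": perms.startswith("d"),  # a leading "d" marks a directory
            "size": int(size),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_ls(LISTING):
        kind = "dir " if entry["is_dir"] else "file"
        print(kind, entry["size"], entry["path"])
```

To check whether the part files inside the trainer-* directories are actually empty, list the model recursively, e.g. `bin/hadoop fs -lsr bayes-model`.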
