Hi Grant,

Thanks for your reply. The log from Mahout is attached.

I have also run into another problem: when I run this command in pseudo-distributed mode, the first job hangs for a very long time (10 minutes or more) after the mappers finish and before the reducer starts. The training set is very small (12 samples, 4 classes).

I also found reports of people using decision forest who get an EOF exception caused by the "_SUCCESS" file that map-reduce creates. I wonder whether the same file is causing the problem above.

Thanks
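
P.S. To make the "_SUCCESS" concern concrete, here is a minimal Python sketch (not Mahout or Hadoop code) of the usual Hadoop convention of skipping entries whose names start with "_" or "." when reading a job output directory. The function name visible_part_files and the use of a local directory are assumptions for illustration only. A reader that does not apply a filter like this could try to parse the empty "_SUCCESS" marker as data, which would be consistent with an EOF error, though I have not confirmed that this is what happens here.

```python
import os


def visible_part_files(model_dir):
    """Return data files under model_dir, skipping marker/bookkeeping entries.

    Entries whose names start with "_" or "." (for example "_SUCCESS",
    "_logs", or ".crc" checksum files) are not returned.
    """
    result = []
    for root, dirs, files in os.walk(model_dir):
        # Prune marker directories such as "_logs" in place so os.walk
        # does not descend into them.
        dirs[:] = sorted(d for d in dirs if not d.startswith(("_", ".")))
        for name in sorted(files):
            if name.startswith(("_", ".")):
                continue  # skip "_SUCCESS" and other hidden files
            result.append(os.path.join(root, name))
    return result
```

Run against a local copy of bayes-model, this would return only the files inside trainer-tfIdf, trainer-thetaNormalizer and trainer-weights, and ignore _SUCCESS and _logs.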

On 10/17/11 4:08 PM, "Grant Ingersoll" <[email protected]> wrote:

>Hi Wangda,
>
>Can you include the logs that were spit out by Mahout?
>
>On Oct 16, 2011, at 10:46 PM, <[email protected]> wrote:
>
>> Hi All,
>> I use a very simple input file as the bayes input (and I tried
>>20newspaper example, it will get same result):
>> ------
>> mahout Mahout's goal is to build scalable machine learning libraries.
>>With scalable we mean: Scalable to reasonably large data sets. Our core
>>algorithms for clustering, classfication and batch based collaborative
>>filtering are implemented on top of Apache Hadoop using the map/reduce
>>paradigm. However we do not restrict contributions to Hadoop based
>>implementations: Contributions that run on
>> lucene All deprecations targeted to be removed in version 3.0 were
>>removed. If you are upgrading from version 2.9.1 of Lucene, you have to
>>fix all deprecation warnings in your code base to be able to recompile
>>against this version. This is the first Lucene
>> spamassasin SpamAssassin is a mail filter to identify spam. It is an
>>intelligent email filter which uses a diverse range of tests to identify
>>unsolicited bulk email, more commonly known as Spam. These tests are
>>applied to email headers and content to classify email using advanced
>>statistical methods. In addition,
>> ------
>>
>> And I put the input to a directory named bayes-input, and run the
>>commandline:
>> bin/mahout trainclassifier -i bayes-input -o bayes-model
>>--classifierType bayes -ng 1 -source hdfs
>> ----
>> After finished training, in bayes-model path, all files' size == 0
>>
>> bin/hadoop fs -ls bayes-model
>> Found 5 items
>> -rw-r--r-- 3 hadoop supergroup 0 2011-10-17 10:16
>>/user/hadoop/bayes-model/_SUCCESS
>> drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:16
>>/user/hadoop/bayes-model/_logs
>> drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:19
>>/user/hadoop/bayes-model/trainer-tfIdf
>> drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:19
>>/user/hadoop/bayes-model/trainer-thetaNormalizer
>> drwxrwxrwx - hadoop supergroup 0 2011-10-17 10:18
>>/user/hadoop/bayes-model/trainer-weights
>> ----
>> And I use this model to classify new data, all sample will be
>>classified to "unknown"
>>
>> My Environment:
>>
>> 1. Os : cent-os 5
>> 2. Mahout : 0.5
>> 3. Hadoop : 0.20.205
>>
>> Thanks,
>> Wangda
>>
>
>--------------------------------------------
>Grant Ingersoll
>http://www.lucidimagination.com
>Lucene Eurocon 2011: http://www.lucene-eurocon.com
>
