Thanks, Drew. I could not find the file http://svn.apache.org/repos/asf/mahout/trunk/core/src/test/java/org/apache/mahout/classifier/bayes/BayesClassifierSelfTest.java
in my Mahout trunk. This is the directory I am looking in:

n...@neil-laptop:~/trunk/core/src/main/java/org/apache/mahout/classifier/bayes$ ll
total 92
drwxr-xr-x 11 neil neil  4096 2010-09-19 12:15 ./
drwxr-xr-x  7 neil neil  4096 2010-09-19 12:15 ../
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 algorithm/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 common/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 datastore/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 exceptions/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 interfaces/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 io/
drwxr-xr-x  6 neil neil  4096 2010-09-19 12:15 mapreduce/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 model/
-rw-r--r--  1 neil neil  8249 2010-09-19 12:15 MultipleOutputFormat.java
-rw-r--r--  1 neil neil  1441 2010-09-19 12:15 MultipleTextOutputFormat.java
-rw-r--r--  1 neil neil  4133 2010-09-19 12:15 package.html
drwxr-xr-x  6 neil neil  4096 2010-09-19 12:15 .svn/
-rw-r--r--  1 neil neil 13066 2010-09-19 12:15 TestClassifier.java
-rw-r--r--  1 neil neil  7660 2010-09-19 12:15 TrainClassifier.java

Am I looking at the correct directory? Is there any reference on how to run this?

On Thu, Sep 30, 2010 at 11:58 PM, Drew Farris <[email protected]> wrote:

> On Thu, Sep 30, 2010 at 10:00 AM, Neil Ghosh <[email protected]> wrote:
> >
> > My question is: if I want to test unknown documents, do I need them in a
> > specific format, or can I just keep them (as raw text) in the input folder
> > while testing?
>
> If I interpret your question correctly, you're saying "I've trained my
> classifier and tested it, now how do I use it in production?". I don't
> know that this is covered by the example.
>
> The unit test, in core/src/test/java --
> org.apache.mahout.classifier.bayes.BayesClassifierSelfTest provides a
> potentially useful example. Take a look at the testSelfTestBayes()
> method.
>
> In general, the operations involved include:
>
> - Create an instance of Algorithm and Datastore, configure as appropriate.
> - Create an instance of ClassifierContext (named classifier) using
>   the Algorithm and Datastore, calling initialize() upon the context.
> - Generate tokens from your input document (either individual words
>   or ngrams based on how the data used to train the model was
>   processed).
> - Call classifier.classifyDocument(String[] tokens, String
>   defaultCat); this will return a ClassifierResult containing the top
>   classifications for the input document (ranked by score).
>
> HTH,
>
> Drew
>

--
Thanks and Regards
Neil
http://neilghosh.com
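
For the token-generation step in the quoted list, a minimal plain-Java sketch might look like the following. Everything in it is an assumption made for illustration rather than Mahout API: the `TokenSketch` class, the `tokenize` helper, the lowercasing, the split pattern, and the `_` joiner for n-grams. Whatever is actually used has to match how the training data was processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Mahout. */
final class TokenSketch {

  private TokenSketch() {}

  /**
   * Lowercases the raw text, splits it on runs of characters that are neither
   * letters nor digits, and returns every n-gram of size 1..gramSize.
   * The '_' joiner is an assumption, not something taken from Mahout.
   */
  static String[] tokenize(String rawText, int gramSize) {
    if (gramSize < 1) {
      throw new IllegalArgumentException("gramSize must be at least 1");
    }
    List<String> words = new ArrayList<>();
    for (String w : rawText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!w.isEmpty()) { // split() can yield an empty leading element
        words.add(w);
      }
    }
    List<String> tokens = new ArrayList<>();
    for (int n = 1; n <= gramSize; n++) {
      for (int i = 0; i + n <= words.size(); i++) {
        tokens.add(String.join("_", words.subList(i, i + n)));
      }
    }
    return tokens.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Unigrams and bigrams for one raw-text document.
    String[] tokens = tokenize("Mahout classifies raw text", 2);
    System.out.println(String.join(" ", tokens));
  }
}
```

The resulting `String[]` is what would then be handed to `classifier.classifyDocument(tokens, defaultCat)` as described in the quoted message.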
