Re: unknown test data twenty-newsgroups example

Neil Ghosh Thu, 21 Oct 2010 09:52:16 -0700

Thanks, Drew.

I could not find the file

http://svn.apache.org/repos/asf/mahout/trunk/core/src/test/java/org/apache/mahout/classifier/bayes/BayesClassifierSelfTest.java

in my Mahout trunk. This is the directory I looked in:

n...@neil-laptop:~/trunk/core/src/main/java/org/apache/mahout/classifier/bayes$ ll
total 92
drwxr-xr-x 11 neil neil  4096 2010-09-19 12:15 ./
drwxr-xr-x  7 neil neil  4096 2010-09-19 12:15 ../
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 algorithm/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 common/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 datastore/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 exceptions/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 interfaces/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 io/
drwxr-xr-x  6 neil neil  4096 2010-09-19 12:15 mapreduce/
drwxr-xr-x  3 neil neil  4096 2010-09-19 12:15 model/
-rw-r--r--  1 neil neil  8249 2010-09-19 12:15 MultipleOutputFormat.java
-rw-r--r--  1 neil neil  1441 2010-09-19 12:15 MultipleTextOutputFormat.java
-rw-r--r--  1 neil neil  4133 2010-09-19 12:15 package.html
drwxr-xr-x  6 neil neil  4096 2010-09-19 12:15 .svn/
-rw-r--r--  1 neil neil 13066 2010-09-19 12:15 TestClassifier.java
-rw-r--r--  1 neil neil  7660 2010-09-19 12:15 TrainClassifier.java

Am I looking at the correct directory?
Is there any reference on how to run this test?

On Thu, Sep 30, 2010 at 11:58 PM, Drew Farris <[email protected]> wrote:

> On Thu, Sep 30, 2010 at 10:00 AM, Neil Ghosh <[email protected]> wrote:
> >
> > My question is: if I want to test unknown documents, do I need them in a
> > specific format, or can I just keep them (as raw text) in the input folder
> > while testing?
>
> If I interpret your question correctly, you're saying "I've trained my
> classifier and tested it, now how do I use it in production?". I don't
> know that this is covered by the example.
>
> The unit test, in core/src/test/java --
> org.apache.mahout.classifier.bayes.BayesClassifierSelfTest provides a
> potentially useful example. Take a look at the testSelfTestBayes()
> method.
>
> In general, the operations involved include:
>   Create an instance of Algorithm and Datastore, configure as appropriate.
>   Create an instance of ClassifierContext (named classifier) using
> the Algorithm and Datastore, calling initialize() upon the context.
>   Generate tokens from your input document (either individual words
> or ngrams based on how the data used to train the model was
> processed).
>   Call classifier.classifyDocument(String[] tokens, String
> defaultCat); this will return a ClassifierResult containing the top
> classifications for the input document (ranked by score).
>
> HTH,
>
> Drew
>
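
The token-generation step in the list above can be sketched in plain Java, without Mahout on the classpath. This is only an illustration under assumptions: the class name NgramTokens, the lowercasing, the split on non-alphanumeric characters, and the space-joined n-grams are all hypothetical choices, and they would have to be changed to match however the training data was actually tokenized. The call to classifyDocument appears only as a comment, since it needs an initialized ClassifierContext as described in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NgramTokens {

    /**
     * Turns raw text into a token array: every word, plus every run of up to
     * gramSize consecutive words joined by single spaces.
     * Assumption: lowercasing and splitting on non-alphanumerics mirror the
     * preprocessing used at training time; adjust if they do not.
     */
    public static String[] tokenize(String text, int gramSize) {
        // Collect non-empty words (split can yield an empty leading element).
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }

        // For each start position, emit the 1-gram, 2-gram, ... up to gramSize.
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            StringBuilder gram = new StringBuilder();
            for (int n = 0; n < gramSize && i + n < words.size(); n++) {
                if (n > 0) {
                    gram.append(' ');
                }
                gram.append(words.get(i + n));
                tokens.add(gram.toString());
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Unknown test data, kept as raw text.", 2);
        System.out.println(String.join(" | ", tokens));
        // With Mahout available and a ClassifierContext initialized, the array
        // would then be passed on as described in the thread:
        //   classifier.classifyDocument(tokens, defaultCat)
    }
}
```

With gramSize set to 1 the method returns individual words only, which corresponds to the "individual words" case mentioned in the list.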



-- 
Thanks and Regards
Neil
http://neilghosh.com
