On Thu, Sep 30, 2010 at 10:00 AM, Neil Ghosh <[email protected]> wrote:
>
> My question is: if I want to test unknown documents, do I need them in a
> specific format, or can I just keep them (as raw text) in the input folder
> while testing?
If I interpret your question correctly, you're asking "I've trained my classifier and tested it, now how do I use it in production?" I don't believe the example covers this.

The unit test org.apache.mahout.classifier.bayes.BayesClassifierSelfTest, in core/src/test/java, provides a potentially useful example. Take a look at the testSelfTestBayes() method.

In general, the operations involved include:

1. Create an instance of Algorithm and Datastore, and configure them as appropriate.
2. Create an instance of ClassifierContext (named classifier) using the Algorithm and Datastore, then call initialize() on the context.
3. Generate tokens from your input document (either individual words or n-grams, depending on how the data used to train the model was processed).
4. Call classifier.classifyDocument(String[] tokens, String defaultCat). This will return a ClassifierResult containing the top classifications for the input document, ranked by score.

HTH,
Drew
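
For the token-generation step, here is a minimal sketch in plain Java of turning raw document text into the String[] that classifyDocument() expects. It does not use Mahout at all: the class name TokenSketch, the tokenize() helper, the lowercasing, the split on non-alphanumeric characters, and the single-space join for n-grams are all assumptions made for illustration. Whatever you use in practice has to match how the training data was tokenized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Mahout: turns raw text into word and n-gram tokens.
class TokenSketch {

    // Returns all 1-grams, then 2-grams, and so on up to gramSize.
    // Assumption: words are lowercased and split on anything that is not a letter or digit.
    static String[] tokenize(String text, int gramSize) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {   // split() can yield an empty leading element
                words.add(w);
            }
        }

        List<String> tokens = new ArrayList<>();
        for (int n = 1; n <= gramSize; n++) {
            for (int i = 0; i + n <= words.size(); i++) {
                // Assumption: n-grams are joined with a single space.
                tokens.add(String.join(" ", words.subList(i, i + n)));
            }
        }
        return tokens.toArray(new String[0]);
    }
}
```

With gramSize set to 1 this yields individual words only. The resulting array would then be passed as the tokens argument of classifier.classifyDocument(String[] tokens, String defaultCat), along with whatever default category you want returned when nothing matches.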
