Hi guys,

   I was able to test this example.  But how do I use the actual classifier?
Once I have trained on the data and have the model, I want to use the model to
categorize a new set of data that has not been classified yet.

   Is there any straightforward way to do this with Mahout, or should I be
tweaking the code?

Regards,
~Vivek

On Fri, Nov 19, 2010 at 4:35 AM, JAGANADH G <[email protected]> wrote:

> On Fri, Nov 19, 2010 at 1:15 PM, Divya <[email protected]> wrote:
>
> > For my first question, you say we can put our own input documents in the
> > directory. Should those documents also be in a format similar to
> > bayes-train-input?
> > If yes, then I generated my input data using PrepareTwentyNewsgroups
> > and used that as my input for testclassifier,
> > but didn't get the expected results.
> > As far as I could observe, it didn't read the files in my input directory.
> > I tried replacing one of the files in the input directory with one of the
> > files from the train-input directory.
> > Still the same result.
> > Why is it not reading my files?
> >
> > Am I missing anything?
> >
> >
> I think something went wrong with your training.
> I trained on the 20 newsgroups data and tested it. My result is available at
> http://pastebin.com/kGY4LmW7 . Check it.
>
> The commands I used:
> 1) To prepare the data:
> bin/mahout prepare20newsgroups -p /home/jaganadhg/20news-bydate-train/ -o 20news -c UTF-8 -a org.apache.mahout.vectorizer.DefaultAnalyzer
> 2) To train:
> bin/mahout trainclassifier -i 20news/ -o 20cbayesn -type cbayes -a 1.0 -ng 2
> 3) To test:
> bin/mahout testclassifier -m 20bayes -d 20news -type bayes -ng 2 -method sequential
>
>
>
> >
> > Coming to my second question: that means we are testing the classifier
> > against our own inputs.
> > I still don't understand.
> > What I understood about classification is that we have a set of documents
> > which will act as the model for classifying new documents in the system.
> > Am I right?
> >
>
>
> The documents do not act as the model. Mahout's TrainClassifier creates a
> model from the documents provided for training.
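
To illustrate the distinction between the training documents and the model built from them, here is a minimal sketch in Python. It is a toy multinomial naive Bayes, not Mahout's implementation; the names train and classify, the dict-based model layout, and the sample documents are all made up for illustration. The alpha and default arguments only mirror the idea behind the -a and -default options discussed in this thread.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs, alpha=1.0):
    """Build a model (a plain dict of counts) from (label, text) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of training documents
    vocab = set()
    for label, text in labelled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return {
        "word_counts": dict(word_counts),
        "doc_counts": dict(doc_counts),
        "vocab": vocab,
        "alpha": alpha,  # smoothing value, kept with the model
    }


def classify(model, text, default="unknown"):
    """Score a new, unlabelled document using only the trained model."""
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    if not tokens:
        return default  # no word the model has seen before
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = default, -math.inf
    for label, counts in model["word_counts"].items():
        # log prior + sum of smoothed log likelihoods
        score = math.log(model["doc_counts"][label] / total_docs)
        denom = sum(counts.values()) + alpha * vocab_size
        for t in tokens:
            score += math.log((counts[t] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data; the raw documents are not needed after train().
training = [
    ("sport", "the team won the match"),
    ("sport", "a great goal in the final match"),
    ("tech", "the new laptop has a fast processor"),
    ("tech", "install the software on the laptop"),
]
model = train(training)

print(classify(model, "who won the final"))       # sport
print(classify(model, "laptop software update"))  # tech
print(classify(model, "zzz qqq"))                 # unknown
```

Once train() has run, classify() needs only the model, which is the point made above: the documents are the input to training, and the model is what gets applied to new, unlabelled data.
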
> The testclassifier command takes the following arguments:
> 1) A directory containing the model (specified after -m).
> 2) A directory containing the documents for testing the classifier
> (specified after -d). Documents in this directory should be formatted
> the same way we prepared the documents for training.
> 3) The type of classifier algorithm (specified after -type). Here I used
> bayes.
> 4) The default category name (specified after -default). You can set it to
> "unknown".
> 5) The value of alpha_i used in training (specified after -a). By default it
> is 1.0.
> 6) The source of the model directory (specified after -source). You can set
> it to hdfs.
> 7) The n-gram size (specified after -ng). It should be the same as the one
> you used in training.
>
> A sample command with all these parameters is shown below:
> bin/mahout testclassifier -d movie -m movie-model/ -type bayes -default unknown -a 1.0 -method sequential -source hdfs -e UTF-8 -ng 1
>
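
A rough sketch of why the n-gram size has to match between training and testing. It assumes an n-gram size of n means contiguous word n-grams of exactly that size; Mahout's own tokenisation may differ, and the ngrams helper and the sample sentence are hypothetical.

```python
def ngrams(text, n):
    """Return the contiguous word n-grams of exactly size n."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


doc = "the team won the match"

train_features = set(ngrams(doc, 2))  # features a model trained with n = 2 knows
test_unigrams = set(ngrams(doc, 1))   # features produced when testing with n = 1
test_bigrams = set(ngrams(doc, 2))    # features produced when testing with n = 2

# With mismatched sizes, the test features never overlap the trained ones.
print(train_features & test_unigrams)      # set()
print(len(train_features & test_bigrams))  # 4
```

Under this assumption, a model trained on bigrams would see nothing it recognises in a document tokenised into single words, even when the text is identical.
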
>
> > Doesn't Mahout work in the same way?
> >
> > Third question: yes, I am looking for Mahout's API for classification.
> >
>
> A sample program is given below
>
>
> http://bitbucket.org/jaganadhg/blog/src/995fa52d4fbc/bck9/java/src/org/bc/kl/ClassifierDemo.java
>
> To make it work in a real-time system you will have to do some more work. Find it :-)
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
>
