Robin,

This suggests that when the NB models are serialized using a simpler API,
some of these parameters could be carried along with the model.  At the
least, the model type should be there.  The n-grams setting may not be
appropriate since that affects input encoding, but if we had a general way
of expressing how input encoding should be done, it would be worth checking
whether the model is compatible with that encoding.  Doing this would require
that we extract a signature of the encoding method during encoding and
provide that signature to the model for checking.  The SGD models already do
a bit of this in that they give up if the feature vector is the wrong size,
but that is a pretty weak check.
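
To make the idea concrete, here is a minimal Java sketch. It assumes a serialized model carries a small header recording the model type and an encoding signature (analyzer name, n-gram size, feature count). At test time the header is compared against the requested settings, so a bayes-vs-cbayes or n-gram mismatch fails fast. This catches more than the feature-vector size check alone. EncodingSignature, ModelHeader and checkCompatible are hypothetical names for illustration only; they are not part of Mahout's API.

```java
/**
 * Hypothetical sketch: a header serialized alongside a naive Bayes model that
 * records the model type and a signature of the input encoding used in training.
 * None of these names exist in Mahout; they only illustrate the idea.
 */
public final class EncodingCheck {

  /** Hypothetical description of how input text was turned into features. */
  record EncodingSignature(String analyzer, int ngramSize, int featureCount) {}

  /** Hypothetical header stored with a serialized model. */
  record ModelHeader(String modelType, EncodingSignature encoding) {}

  /** Throws if test-time settings differ from those recorded at training time. */
  static void checkCompatible(ModelHeader model, String requestedType,
                              EncodingSignature testEncoding) {
    if (!model.modelType().equals(requestedType)) {
      throw new IllegalArgumentException(
          "Model was trained as '" + model.modelType()
              + "' but testing was requested as '" + requestedType + "'");
    }
    // Record equality compares every component: analyzer, n-gram size, feature count.
    if (!model.encoding().equals(testEncoding)) {
      throw new IllegalArgumentException(
          "Encoding mismatch: model expects " + model.encoding()
              + " but input was encoded as " + testEncoding);
    }
  }

  public static void main(String[] args) {
    EncodingSignature trainEnc = new EncodingSignature("whitespace", 3, 1 << 20);
    ModelHeader header = new ModelHeader("bayes", trainEnc);

    // Same settings as training: passes silently.
    checkCompatible(header, "bayes", new EncodingSignature("whitespace", 3, 1 << 20));
    System.out.println("matching settings: ok");

    // Trained as bayes, tested as cbayes: rejected before any classification happens.
    try {
      checkCompatible(header, "cbayes", trainEnc);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }

    // Same model type but a different n-gram size: also rejected.
    try {
      checkCompatible(header, "bayes", new EncodingSignature("whitespace", 1, 1 << 20));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Comparing the whole signature, rather than only the vector length, also catches encodings that happen to produce vectors of the same size but with different meanings per feature.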

On Tue, Nov 23, 2010 at 2:58 AM, Robin Anil <[email protected]> wrote:

> >
> >
> > But when I pass the other parameters like -type bayes -ng 3 -source hdfs
>
> The train options and test options have to match. You cannot train in bayes
> mode and test in cbayes mode.
>
> > I am not getting the expected results.
> > Can anyone please explain the reason behind it?
> >
> > Thanks
> > Regards,
> > Divya
> >
> >
> > -----Original Message-----
> > From: Divya [mailto:[email protected]]
> > Sent: Tuesday, November 23, 2010 1:40 PM
> > To: '[email protected]'
> > Subject: RE: classification example doubts
> >
> > I am following the same steps,
> > but with no success...
> >
> > -----Original Message-----
> > From: Sreejith S [mailto:[email protected]]
> > Sent: Friday, November 19, 2010 4:00 PM
> > To: [email protected]
> > Subject: Re: classification example doubts
> >
> > Step 1: You can provide your own sample data set using the prepare20news
> > example; just provide your input directory. This performs some
> > normalization on each file. This is a must.
> >
> > Step 2: Train the classifier with the normalized list of files.
> > You get a model directory in HDFS that contains the trained data set.
> >
> > Step 3: Test the classifier.
> > Using the trained model and sample input, you can test the classifier.
> >
> > Regards
> > Sreejith
> >
> >
> > On Fri, Nov 19, 2010 at 1:15 PM, Divya <[email protected]> wrote:
> >
> > > For my first question, you say we can put our own input documents in the
> > > directory. Should those documents also be in a format similar to
> > > bayes-train-input?
> > > If yes: I generated my input data using PrepareTwentyNewsgroups
> > > and used that as my input for testclassifier,
> > > but I didn't get the expected results.
> > > As I observed, it didn't read the files in my input directory.
> > > I tried replacing one of the files in the input directory with one of the
> > > files from the train-input directory.
> > > Still the same result.
> > > Why is it not reading my files?
> > >
> > > Results below:
> > >
> > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.sys.mac.hardware -121323.6282757108 547567.2698760114 -0.22156844455510052
> > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.space -189203.04544769705 547567.2698760114 -0.3455338838834164
> > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.motorcycles -138625.2628242977 547567.2698760114 -0.25316572127418674
> > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.autos -136935.18434679657 547567.2698760114 -0.25007919917821886
> > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.graphics -161979.38306986375 547567.2698760114 -0.29581640828631267
> > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: talk.politics.misc -159579.70032298338 547567.2698760114 -0.29143396455949216
> > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.med -183835.5334355675 547567.2698760114 -0.3357314133790253
> > > 10/11/19 10:45:12 INFO bayes.TestClassifier:
> > > =======================================================
> > > Summary
> > > -------------------------------------------------------
> > > Correctly Classified Instances          :          0             ?%
> > > Incorrectly Classified Instances        :          0             ?%
> > > Total Classified Instances              :          0
> > >
> > > =======================================================
> > > Confusion Matrix
> > > -------------------------------------------------------
> > > a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t   <--Classified as
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     a     = rec.sport.baseball
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     b     = sci.crypt
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     c     = rec.sport.hockey
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     d     = talk.politics.guns
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     e     = soc.religion.christian
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     f     = sci.electronics
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     g     = comp.os.ms-windows.misc
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     h     = misc.forsale
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     i     = talk.religion.misc
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     j     = alt.atheism
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     k     = comp.windows.x
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     l     = talk.politics.mideast
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     m     = comp.sys.ibm.pc.hardware
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     n     = comp.sys.mac.hardware
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     o     = sci.space
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     p     = rec.motorcycles
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     q     = rec.autos
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     r     = comp.graphics
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     s     = talk.politics.misc
> > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   |  0     t     = sci.med
> > > Default Category: unknown: 20
> > >
> > >
> > > 10/11/19 10:45:12 INFO driver.MahoutDriver: Program took 5485 ms
> > >
> > > Am I missing anything?
> > >
> > >
> > > Coming to my second question: that means we are testing the classifier
> > > against our own inputs.
> > > I still don't understand.
> > > What I understood about classification is that we have a set of documents
> > > that acts as a model for classifying new documents in the system.
> > > Am I right?
> > > Doesn't Mahout work the same way?
> > >
> > > Third question: yes, I am looking for Mahout's API for classification.
> > >
> > >
> > > @Jaganadh: thanks for clearing up my doubts.
> > >
> > > Regards,
> > > Divya
> > >
> > >
> > > -----Original Message-----
> > > From: JAGANADH G [mailto:[email protected]]
> > > Sent: Friday, November 19, 2010 3:09 PM
> > > To: [email protected]
> > > Subject: Re: classification example doubts
> > >
> > > >
> > > > 1) I want to know what should go in "bayes-test-input".
> > > >
> > > >
> > > After preparing the 20news-group data for training, you can set aside some
> > > documents for testing your classifier.
> > > These documents should go in "bayes-test-input".
> > >
> > > Or you can even put a new set of documents in the directory.
> > >
> > >
> > > > 2) If we take the Wikipedia example
> > > > https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
> > > >
> > > > To run trainclassifier, we used Wikipediainput to generate the model.
> > > >
> > > > To run testclassifier, we again used wikipediamodel as the model input
> > > > and the Wikipedia input as the test documents directory.
> > > >
> > > > I don't understand why we are doing so.
> > > >
> > > >
> > >
> > > We are testing the classifier against the development set we used.
> > >
> > >
> > >
> > > > 3) The last thing I want to know: when we run testclassifier from the
> > > > command line, we can see the output.
> > > >
> > > > How can we make use of this output?
> > > >
> > >
> > >
> > > Are you looking for Mahout API usage for classification?
> > >
> > > --
> > > **********************************
> > > JAGANADH G
> > > http://jaganadhg.freeflux.net/blog
> > >
> > >
> >
> >
>
