The encoder and trainer configurations should be carried over with the
model. I know the issue: I too have only weak checks to see whether a bayes
model is being used for cbayes classification (the reverse is actually
possible, so there is no check for it). It looks like more checks are needed.
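
A rough sketch of what such a check might look like, assuming the model is stored together with its type and a signature of the encoder settings. This is not Mahout code: every name here (COMPATIBLE, ModelMetadata, encoding_signature, check_compatible, the "ngram" key) is hypothetical, and the compatibility table only encodes the remark above that a cbayes model can serve bayes classification but not the other way round.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical table: trained model type -> classifier types it may be tested with.
# Assumption taken from the remark above: cbayes model -> bayes is possible,
# bayes model -> cbayes is not.
COMPATIBLE = {
    "bayes": {"bayes"},
    "cbayes": {"cbayes", "bayes"},
}


def encoding_signature(encoder_config: dict) -> str:
    """Hash the encoder settings in a canonical (key-sorted) form."""
    canonical = json.dumps(encoder_config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ModelMetadata:
    """Metadata that would be serialized next to the model (hypothetical)."""
    model_type: str
    encoding_sig: str


def check_compatible(meta: ModelMetadata, test_type: str, test_encoder_config: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if test_type not in COMPATIBLE.get(meta.model_type, set()):
        problems.append(
            f"model trained as '{meta.model_type}' cannot be tested as '{test_type}'"
        )
    if encoding_signature(test_encoder_config) != meta.encoding_sig:
        problems.append("test-time encoder settings differ from training")
    return problems


if __name__ == "__main__":
    # Model trained with -type bayes -ng 3, then tested as cbayes with -ng 1.
    meta = ModelMetadata("bayes", encoding_signature({"ngram": 3}))
    for problem in check_compatible(meta, "cbayes", {"ngram": 1}):
        print("WARN:", problem)
```

With something like this, a train/test mismatch would produce an explicit warning instead of silently giving odd results.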

On Tue, Nov 23, 2010 at 9:53 PM, Ted Dunning <[email protected]> wrote:

> Robin,
>
> This suggests that when the NB models are serialized using a simpler API,
> some of these parameters could be carried along with the model.  At the
> least, the model type should be there.  The ngrams feature may not be
> appropriate since that affects input encoding, but if we had a general way
> of expressing how input encoding should be done, it would be worth checking
> against whether the model was happy with that.  To do this would require
> that we extract a signature of the encoding method during encoding and
> provide that signature for checking to the model.  That occurs a bit with
> the SGD models in that they give up if the feature vector is the wrong
> size, but that is a pretty weak check.
>
> On Tue, Nov 23, 2010 at 2:58 AM, Robin Anil <[email protected]> wrote:
>
> > >
> > >
> > > But when I pass the other parameters like -type bayes -ng 3 -source hdfs
> > >
> > The train options and test options have to match. You cannot train in
> > bayes mode and test in cbayes mode.
> >
> >
> > > I am not getting the expected results.
> > > > Can anyone please explain the reason behind it?
> > >
> > > Thanks
> > > Regards,
> > > Divya
> > >
> > >
> > > -----Original Message-----
> > > From: Divya [mailto:[email protected]]
> > > Sent: Tuesday, November 23, 2010 1:40 PM
> > > To: '[email protected]'
> > > Subject: RE: classification example doubts
> > >
> > > I am following same steps
> > > But no success...
> > >
> > > -----Original Message-----
> > > From: Sreejith S [mailto:[email protected]]
> > > Sent: Friday, November 19, 2010 4:00 PM
> > > To: [email protected]
> > > Subject: Re: classification example doubts
> > >
> > > > step 1 : You can provide your own sample data set using the prepare20news
> > > > example; just provide your input dir. This performs some normalization on
> > > > each file. This is a must.
> > > >
> > > > step 2 : Train the classifier with the normalized list of files.
> > > > You get a model dir which contains the trained data set in hdfs.
> > > >
> > > > step 3 : Test the classifier.
> > > > Using the trained model and sample input, you can test the classifier.
> > >
> > > Regards
> > > Sreejith
> > >
> > >
> > > > On Fri, Nov 19, 2010 at 1:15 PM, Divya <[email protected]> wrote:
> > >
> > > > > For my first question, you say we can put our own input documents in the
> > > > > directory. Should those documents also be in a format similar to
> > > > > bayes-train-input?
> > > > > If yes, then I generated my input data using PrepareTwentyNewsgroups
> > > > > and used that as my input for testclassifier,
> > > > > but didn't get the expected results.
> > > > > As far as I can tell, it didn't read the files in my input directory.
> > > > > I tried replacing one of the files in the input directory with one of the
> > > > > files from the train-input directory.
> > > > > Still the same result.
> > > > > Why is it not reading my files?
> > > >
> > > > Results below :
> > > >
> > > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore:
> > > > comp.sys.mac.hardware -121323.6282757108 547567.2698760114
> > > > -0.2215684445551005
> > > > 2
> > > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.space
> > > > -189203.04544769705 547567.2698760114 -0.3455338838834164
> > > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore:
> > rec.motorcycles
> > > > -138625.2628242977 547567.2698760114 -0.25316572127418674
> > > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.autos
> > > > -136935.18434679657 547567.2698760114 -0.25007919917821886
> > > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore:
> comp.graphics
> > > > -161979.38306986375 547567.2698760114 -0.29581640828631267
> > > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore:
> > > talk.politics.misc
> > > > -159579.70032298338 547567.2698760114 -0.29143396455949216
> > > > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.med
> > > > -183835.5334355675 547567.2698760114 -0.3357314133790253
> > > > 10/11/19 10:45:12 INFO bayes.TestClassifier:
> > > > =======================================================
> > > > Summary
> > > > -------------------------------------------------------
> > > > Correctly Classified Instances          :          0             ?%
> > > > Incorrectly Classified Instances        :          0             ?%
> > > > Total Classified Instances              :          0
> > > >
> > > > =======================================================
> > > > Confusion Matrix
> > > > -------------------------------------------------------
> > > > > a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  <--Classified as
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  a = rec.sport.baseball
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  b = sci.crypt
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  c = rec.sport.hockey
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  d = talk.politics.guns
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  e = soc.religion.christian
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  f = sci.electronics
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  g = comp.os.ms-windows.misc
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  h = misc.forsale
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  i = talk.religion.misc
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  j = alt.atheism
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  k = comp.windows.x
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  l = talk.politics.mideast
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  m = comp.sys.ibm.pc.hardware
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  n = comp.sys.mac.hardware
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  o = sci.space
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  p = rec.motorcycles
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  q = rec.autos
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  r = comp.graphics
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  s = talk.politics.misc
> > > > > 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  t = sci.med
> > > > Default Category: unknown: 20
> > > >
> > > >
> > > > 10/11/19 10:45:12 INFO driver.MahoutDriver: Program took 5485 ms
> > > >
> > > > > Am I missing anything?
> > > >
> > > >
> > > > > Coming to my second question: that means we are testing the classifier
> > > > > against our own inputs.
> > > > > I still didn't understand.
> > > > > What I understood about classification is that we have a set of documents
> > > > > which will act as the model for classifying new documents in the system.
> > > > > Am I right?
> > > > > Doesn't Mahout work in the same way?
> > > > >
> > > > > Third question: yes, I am looking for Mahout's API for classification.
> > > >
> > > >
> > > > @ Jaganadh - Thanks for clearing my doubts
> > > >
> > > > Regards,
> > > > Divya
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: JAGANADH G [mailto:[email protected]]
> > > > Sent: Friday, November 19, 2010 3:09 PM
> > > > To: [email protected]
> > > > Subject: Re: classification example doubts
> > > >
> > > > >
> > > > > 1)      I want to  know what should go in "bayes-test-input".
> > > > >
> > > > >
> > > > > After preparing the 20news-group data for training you can separate some
> > > > > documents for testing your classifier.
> > > > > These documents should go to "bayes-test-input".
> > > > >
> > > > > Or you can even put a new set of documents in the directory.
> > > >
> > > >
> > > > > 2)      If we take Wikipedia example
> > > > > https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
> > > > >
> > > > >
> > > > >
> > > > > > For trainclassifier we used Wikipediainput to generate the model.
> > > > > >
> > > > > > To test the classifier we again used wikipediamodel as input and the
> > > > > > Wikipedia input as the test documents directory.
> > > > > >
> > > > > > I didn't understand why we are doing so.
> > > > >
> > > > >
> > > >
> > > > We are testing the classifier against the development set we used.
> > > >
> > > >
> > > >
> > > > > > 3)      Last thing I want to know: when we run testclassifier from the
> > > > > > command line we can see the output.
> > > > > >
> > > > > > How can we make use of this output?
> > > > >
> > > >
> > > >
> > > > > Are you looking for Mahout API usage for classification?
> > > >
> > > > --
> > > > **********************************
> > > > JAGANADH G
> > > > http://jaganadhg.freeflux.net/blog
> > > >
> > > >
> > >
> > >
> >
>
