>
>
> But when I pass the other parameters like -type bayes -ng 3 -source hdfs
>
The train options and test options have to match. You cannot train in bayes
mode and test in cbayes mode.
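
As a rough illustration of that rule, a small Python helper could compare the options given to the two commands before they are run. This is only a sketch and not part of Mahout: the function names are hypothetical, the option names -type, -ng and -source are the ones quoted in this thread, and the choice to compare only -type and -ng is an assumption.

```python
import shlex

# Options assumed to require identical values for training and testing.
# Only -type, -ng and -source are mentioned in this thread; which of them
# must match beyond -type is an assumption of this sketch.
MUST_MATCH = ("-type", "-ng")


def parse_options(command_line):
    """Collect '-flag value' pairs from a command line into a dict."""
    tokens = shlex.split(command_line)
    options = {}
    for i, token in enumerate(tokens[:-1]):
        # A flag followed by a non-flag token is treated as 'flag value'.
        if token.startswith("-") and not tokens[i + 1].startswith("-"):
            options[token] = tokens[i + 1]
    return options


def mismatched_options(train_cmd, test_cmd, keys=MUST_MATCH):
    """Return {flag: (train_value, test_value)} for every flag that differs."""
    train = parse_options(train_cmd)
    test = parse_options(test_cmd)
    return {
        key: (train.get(key), test.get(key))
        for key in keys
        if train.get(key) != test.get(key)
    }


if __name__ == "__main__":
    train_cmd = "trainclassifier -type bayes -ng 3 -source hdfs"
    test_cmd = "testclassifier -type cbayes -ng 3 -source hdfs"
    for flag, (train_value, test_value) in mismatched_options(train_cmd, test_cmd).items():
        print(f"{flag}: trained with {train_value!r}, tested with {test_value!r}")
```

An empty result means the compared options agree; any entry points at an option that was set differently for training and testing.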


> I am not getting the expected results.
> Can anyone please explain the reason behind it?
>
> Thanks
> Regards,
> Divya
>
>
> -----Original Message-----
> From: Divya [mailto:[email protected]]
> Sent: Tuesday, November 23, 2010 1:40 PM
> To: '[email protected]'
> Subject: RE: classification example doubts
>
> I am following the same steps,
> but with no success...
>
> -----Original Message-----
> From: Sreejith S [mailto:[email protected]]
> Sent: Friday, November 19, 2010 4:00 PM
> To: [email protected]
> Subject: Re: classification example doubts
>
> Step 1: You can provide your own sample data set using the prepare20news
> example; just provide your input dir. This performs some normalization on
> each file, and it is required.
>
> Step 2: Train the classifier with the normalized list of files.
> You get a model dir, which contains the trained data set, in HDFS.
>
> Step 3: Test the classifier.
> Using the trained model and sample input, you can test the classifier.
>
> Regards
> Sreejith
>
>
> On Fri, Nov 19, 2010 at 1:15 PM, Divya <[email protected]> wrote:
>
> > For my first question, you say we can put our own input documents in the
> > directory. Should those documents also be in a format similar to
> > bayes-train-input?
> > If yes, then I generated my input data using PrepareTwentyNewsgroups
> > and used that as my input for testclassifier,
> > but didn't get the expected results.
> > As far as I observed, it didn't read the files in my input directory.
> > I tried replacing one of the files in the input directory with one of the
> > files from the train-input directory.
> > Still the same result.
> > Why is it not reading my files?
> >
> > Results below :
> >
> > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.sys.mac.hardware -121323.6282757108 547567.2698760114 -0.22156844455510052
> > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.space -189203.04544769705 547567.2698760114 -0.3455338838834164
> > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.motorcycles -138625.2628242977 547567.2698760114 -0.25316572127418674
> > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.autos -136935.18434679657 547567.2698760114 -0.25007919917821886
> > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.graphics -161979.38306986375 547567.2698760114 -0.29581640828631267
> > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: talk.politics.misc -159579.70032298338 547567.2698760114 -0.29143396455949216
> > 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.med -183835.5334355675 547567.2698760114 -0.3357314133790253
> > 10/11/19 10:45:12 INFO bayes.TestClassifier:
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :          0             ?%
> > Incorrectly Classified Instances        :          0             ?%
> > Total Classified Instances              :          0
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   a = rec.sport.baseball
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   b = sci.crypt
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   c = rec.sport.hockey
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   d = talk.politics.guns
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   e = soc.religion.christian
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   f = sci.electronics
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   g = comp.os.ms-windows.misc
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   h = misc.forsale
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   i = talk.religion.misc
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   j = alt.atheism
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   k = comp.windows.x
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   l = talk.politics.mideast
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   m = comp.sys.ibm.pc.hardware
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   n = comp.sys.mac.hardware
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   o = sci.space
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   p = rec.motorcycles
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   q = rec.autos
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   r = comp.graphics
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   s = talk.politics.misc
> > 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  0   t = sci.med
> > Default Category: unknown: 20
> >
> >
> > 10/11/19 10:45:12 INFO driver.MahoutDriver: Program took 5485 ms
> >
> > Am I missing anything?
> >
> >
> > Coming to my second question: that means we are testing the classifier
> > against our own inputs.
> > I still don't understand.
> > What I understood about classification is that we have a set of documents
> > which act as a model for classifying new documents in the system.
> > Am I right?
> > Doesn't Mahout work the same way?
> >
> > Third question: yes, I am looking for Mahout's API for classification.
> >
> >
> > @ Jaganadh - Thanks for clearing my doubts
> >
> > Regards,
> > Divya
> >
> >
> > -----Original Message-----
> > From: JAGANADH G [mailto:[email protected]]
> > Sent: Friday, November 19, 2010 3:09 PM
> > To: [email protected]
> > Subject: Re: classification example doubts
> >
> > >
> > > 1)      I want to  know what should go in "bayes-test-input".
> > >
> > >
> > After preparing the 20news-group data for training you can separate some
> > documents for testing your classifier.
> > These documents should go to "bayes-test-input".
> >
> > Or you can even put a new set of documents in the directory.
> >
> >
> > > 2)      If we take Wikipedia example
> > > https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
> > >
> > >
> > >
> > > To trainclassifier we have used wikipediainput to generate the model.
> > >
> > > To test the classifier, again we used wikipediamodel as input and
> > > wikipediainput as the test documents directory.
> > >
> > > I didn't understand why we are doing so.
> > >
> > >
> >
> > We are testing the classifier against the development set we used.
> >
> >
> >
> > > 3)      Last thing I want to know: when we run testclassifier from the
> > > command line we can see the output.
> > >
> > > How can we make use of this output?
> > >
> >
> >
> > Are you looking for Mahout API usage for classification?
> >
> > --
> > **********************************
> > JAGANADH G
> > http://jaganadhg.freeflux.net/blog
> >
> >
>
>
