For my first question, you say we can put our own input documents in the
directory. Should those documents also be in a format similar to
bayes-train-input?
If yes: I generated my input data using PrepareTwentyNewsgroups
and used that as my input for testclassifier,
but didn't get the expected results.
As far as I can tell, it didn't read the files in my input directory.
I tried replacing one of the files in my input directory with one of the files
from the train-input directory.
Still the same result.
Why is it not reading my files?
Results below:
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.sys.mac.hardware -121323.6282757108 547567.2698760114 -0.2215684445551005
2
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.space -189203.04544769705 547567.2698760114 -0.3455338838834164
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.motorcycles -138625.2628242977 547567.2698760114 -0.25316572127418674
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.autos -136935.18434679657 547567.2698760114 -0.25007919917821886
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.graphics -161979.38306986375 547567.2698760114 -0.29581640828631267
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: talk.politics.misc -159579.70032298338 547567.2698760114 -0.29143396455949216
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.med -183835.5334355675 547567.2698760114 -0.3357314133790253
10/11/19 10:45:12 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 0 ?%
Incorrectly Classified Instances : 0 ?%
Total Classified Instances : 0
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 a = rec.sport.baseball
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 b = sci.crypt
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 c = rec.sport.hockey
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 d = talk.politics.guns
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 e = soc.religion.christian
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 f = sci.electronics
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 g = comp.os.ms-windows.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 h = misc.forsale
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 i = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 j = alt.atheism
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 k = comp.windows.x
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 l = talk.politics.mideast
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 m = comp.sys.ibm.pc.hardware
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 n = comp.sys.mac.hardware
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 o = sci.space
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 p = rec.motorcycles
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 q = rec.autos
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 r = comp.graphics
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 s = talk.politics.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 t = sci.med
Default Category: unknown: 20
10/11/19 10:45:12 INFO driver.MahoutDriver: Program took 5485 ms
Am I missing anything?
Coming to my second question: does that mean we are testing the classifier
against the same inputs we trained it on?
I still don't understand.
My understanding of classification is that we have a set of documents which
acts as the model for classifying new documents in the system.
Am I right?
Doesn't Mahout work the same way?
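To make that understanding concrete, here is a minimal sketch in plain Python.
It is not Mahout's API; the names train and classify and the tiny documents
are made up for illustration. It builds a naive Bayes model from labelled
documents and then applies it to a document the model has not seen:

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Build a tiny multinomial naive Bayes model from (label, text) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for label, text in labelled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for an unseen text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training documents; real input would be far larger.
training = [
    ("rec.autos", "engine car wheels brakes"),
    ("rec.autos", "car dealer engine oil"),
    ("sci.space", "orbit rocket launch moon"),
    ("sci.space", "shuttle orbit launch nasa"),
]
model = train(training)
print(classify(model, "rocket launch to the moon"))
```

The point is only the two phases: the labelled documents are used once to
build the model, and new documents are then scored against that model.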
On the third question: yes, I am looking for Mahout's API for classification.
@Jaganadh - Thanks for clearing my doubts.
Regards,
Divya
-----Original Message-----
From: JAGANADH G [mailto:[email protected]]
Sent: Friday, November 19, 2010 3:09 PM
To: [email protected]
Subject: Re: classification example doubts
>
> 1) I want to know what should go in "bayes-test-input".
>
>
After preparing the 20 Newsgroups data for training, you can set aside some
documents for testing your classifier.
These documents should go to "bayes-test-input".
Or you can even put a new set of documents in the directory.
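As a rough illustration of setting documents aside, the sketch below holds out
a fraction of a labelled collection for testing. It is plain Python, not part
of Mahout; the function name split_train_test, the 20% fraction, and the
placeholder documents are assumptions made for the example:

```python
import random


def split_train_test(docs, test_fraction=0.2, seed=42):
    """Shuffle docs and hold out a fraction of them for testing."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    # First n_test shuffled documents become the test set, the rest train.
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder (label, text) pairs standing in for prepared documents.
docs = [("sci.space", f"document {i}") for i in range(10)]
train_docs, test_docs = split_train_test(docs)
print(len(train_docs), len(test_docs))
```

The held-out part is what would go to "bayes-test-input", and only the
remaining part would be used for training.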
> 2) If we take Wikipedia example
> https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
>
>
>
> To train the classifier we used wikipediainput to generate the model.
>
> To test the classifier we again used wikipediamodel as the model and
> wikipediainput as the test documents directory.
>
> I didn't understand why we are doing so.
>
>
We are testing the classifier against the development set we used.
> 3) The last thing I want to know: when we run testclassifier from the
> command line, we can see the output.
>
> How can we make use of this output?
>
Are you looking for Mahout API usage for classification?
--
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog