@robin and @ted

I tested it in a different way.
I wrote a program to convert input text to the Mahout training format. The
program removes all punctuation and junk characters from the text and
strips out any numbers, such as years and dates. It then converts the text
to lowercase. Finally, the text is written in the Mahout training format
(label, a tab, the text, then a newline: label\ttext\n).
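For reference, a minimal sketch of that kind of preprocessing is below. It is
not the actual program; the class and method names are hypothetical, and the
regexes (Unicode digits removed, then anything that is not a letter or
whitespace treated as junk) are my assumptions about what counts as "junk".

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the preprocessing described above (not the real
 * program). Turns one labelled document into a single "label<TAB>text" line.
 */
public class TrainingLineFormatter {

    /** Drops numbers, punctuation and other junk, then lowercases. */
    static String clean(String raw) {
        String text = raw.replaceAll("\\p{N}+", " ");     // numbers such as years/dates
        text = text.replaceAll("[^\\p{L}\\s]+", " ");     // punctuation and junk characters
        text = text.toLowerCase(Locale.ROOT);             // lowercase
        // Collapse all whitespace (including tabs/newlines) so the document
        // stays on one line and cannot break the label\ttext format.
        return text.trim().replaceAll("\\s+", " ");
    }

    /** Produces one training line: label\ttext\n. */
    static String toTrainingLine(String label, String raw) {
        return label + "\t" + clean(raw) + "\n";
    }

    public static void main(String[] args) {
        // Prints "sports<TAB>the world cup held on"
        System.out.print(toTrainingLine("sports", "The 2009 World Cup, held on 12/06!"));
    }
}
```

Collapsing whitespace matters here: a stray tab or newline inside a document
would otherwise split it into malformed training lines.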

After training with CBayesClassifier, I tested it.
The results are:
1) with ng=1, -a=1.0
Correctly classified instances = 52.5%
Incorrectly classified instances = 47.5%
2) with ng=2, -a=1.0
Correctly classified instances = 74.5%
Incorrectly classified instances = 25.5%

Now I have a question.
1) The output of PrepareTwentyNewsgroups is text from which all the
stop words have been removed, so the text is just a simple collection of
words. When we apply generateNGramsWithoutLabel() to it, will it still
generate the n-grams correctly (i.e., will the n-grams be accurate)?
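To make the concern concrete, here is a toy illustration. It is not Mahout's
generateNGramsWithoutLabel(); the helper names and the two-word stop list are
hypothetical. It just builds word bigrams before and after stop-word removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical toy: word n-grams before and after stop-word removal. */
public class NGramAfterStopwords {

    /** Contiguous word n-grams over the token list. */
    static List<String> ngrams(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    /** Drops every token found in the stop list. */
    static List<String> removeStopwords(List<String> tokens, Set<String> stop) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stop.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "bank", "of", "the", "river", "flooded");
        Set<String> stop = Set.of("the", "of");   // hypothetical stop list
        // [the bank, bank of, of the, the river, river flooded]
        System.out.println(ngrams(tokens, 2));
        // [bank river, river flooded]: "bank river" was never adjacent in the original
        System.out.println(ngrams(removeStopwords(tokens, stop), 2));
    }
}
```

So after stop-word removal, some bigrams join words that were not adjacent
in the original text, which is what I mean by "accuracy" of the n-grams.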
-- 
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog