@robin and @ted, I tested it in a different way. I wrote a program to convert input text to the Mahout training format. The program removes all punctuation and junk characters from the text, removes any numbers such as years and dates, and then converts the text to lowercase. Finally, it writes the text in the Mahout training format (label "\t" text "\n").
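For illustration, here is a minimal Python sketch of the kind of preprocessing described above. It is not the actual program: the function names (clean_text, to_training_line) and the regular expressions are assumptions, and it treats only ASCII letters as valid text, so accented or non-Latin characters would be dropped too.

```python
import re


def clean_text(text: str) -> str:
    """Strip punctuation, digits and other junk, then lowercase.

    Assumption: only ASCII letters count as text; everything else
    (punctuation, numbers such as years, control characters) is removed.
    """
    # Replace every character that is not an ASCII letter or whitespace with a space.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Collapse runs of whitespace (including tabs and newlines, which would
    # otherwise break the one-document-per-line training format).
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def to_training_line(label: str, text: str) -> str:
    """Build one training line: label, a tab, the cleaned text, a newline."""
    return f"{label}\t{clean_text(text)}\n"


if __name__ == "__main__":
    # Hypothetical label and document, for demonstration only.
    print(to_training_line("sci.space", "NASA launched 2 probes."), end="")
```

Each input document becomes exactly one line, so a whole corpus can be written by calling to_training_line once per document and appending the result to the training file.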
After training with the CBayes classifier I tested it. The results are:
1) with ng=1 -a=1.0: correctly classified instances = 52.5%, incorrect = 47.5%
2) with ng=2 -a=1.0: correctly classified instances = 74.5%, incorrect = 25.5%

Now I have a question.
1) The output of PrepareTwentyNewsgroups is text from which all the stop words have been removed, so the text is just a simple collection of words. When we apply generateNGramsWithoutLabel() to such text, will it generate n-grams correctly (that is, how accurate will the n-grams be)?

--
JAGANADH G
http://jaganadhg.freeflux.net/blog
