If this is testing on held-out data, then this is a pretty respectable result for an untuned system.
Are these results on held-out data?

On Wed, Oct 20, 2010 at 6:35 AM, JAGANADH G wrote:
> @robin and @ted
>
> I tested it in a different way.
> I wrote a program to convert input text to Mahout training format. The
> program removes all punctuation and junk characters from the text and
> removes any numbers, such as years or dates. Then it converts the text
> to lowercase. Finally, the text is written out in Mahout training
> format (label"\t"text"\n").
>
> After training with the CBayes classifier I tested it.
> The results are:
> 1) with ng=1 -a=1.0
>    Correctly classified instances = 52.5%
>    Incorrect = 47.5%
> 2) with ng=2 -a=1.0
>    Correctly classified instances = 74.5%
>    Incorrect = 25.5%
>
> Now I have a question.
> 1) The output of preparetwentynesgroup is text from which all the
> stop words have been removed, so the text is just a simple collection
> of words. When we apply generateNGramsWithoutLabel(), will it still
> generate n-grams correctly (i.e., are the resulting n-grams accurate)?
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
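The preprocessing steps described in the quoted message (strip numbers, strip punctuation and junk characters, lowercase, emit one `label<TAB>text<NEWLINE>` line per document) can be sketched roughly as follows. This is not the original poster's program; the function name `to_mahout_line` is hypothetical, and keeping only ASCII letters is an assumption that would need adapting for other scripts.

```python
import re


def to_mahout_line(label: str, text: str) -> str:
    """Format one document as a 'label<TAB>text<NEWLINE>' training line.

    Hypothetical helper; assumes only ASCII letters should survive.
    """
    # Drop digit runs (years, dates, other numbers).
    text = re.sub(r"[0-9]+", " ", text)
    # Replace every run of non-letter characters (punctuation, junk) with a space.
    text = re.sub(r"[^A-Za-z]+", " ", text)
    # Lowercase and collapse repeated whitespace into single spaces.
    text = " ".join(text.lower().split())
    # A tab inside the label would break the label/text split, so forbid it.
    if "\t" in label:
        raise ValueError("label must not contain a tab")
    return f"{label}\t{text}\n"


if __name__ == "__main__":
    print(repr(to_mahout_line("sports", "On 20/10/2010, the Team WON 3-1!")))
```

For example, `to_mahout_line("sports", "On 20/10/2010, the Team WON 3-1!")` returns `"sports\ton the team won\n"`.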
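The n-gram question comes down to what happens when stop words are removed *before* n-grams are generated. The sketch below does not reproduce Mahout's `generateNGramsWithoutLabel()`. It is a generic illustration, with a hypothetical, deliberately tiny stop-word list, showing that contiguous n-grams taken from filtered text can join words that were never adjacent in the original sentence. Whether Mahout's preparation step actually behaves this way is an assumption that should be checked against its source.

```python
def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous n-grams of `tokens` as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "in", "a"}

sentence = "the price of oil rose in the morning"
tokens = sentence.split()
filtered = [t for t in tokens if t not in STOP_WORDS]

original_bigrams = ngrams(tokens, 2)
filtered_bigrams = ngrams(filtered, 2)

# Bigrams produced from the filtered text that never occur in the original.
spurious = [b for b in filtered_bigrams if b not in original_bigrams]

if __name__ == "__main__":
    print(filtered_bigrams)  # ['price oil', 'oil rose', 'rose morning']
    print(spurious)          # ['price oil', 'rose morning']
```

Such n-grams are still usable as classifier features, but they no longer correspond to phrases that appear in the source text.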
