Re: Query regarding use of classifier in Mahout

Ted Dunning Wed, 20 Oct 2010 15:51:47 -0700

If this is testing on held-out data, then this is a pretty respectable
result for an untuned system.


Are these results on held-out data?
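To make the held-out question concrete, here is a minimal sketch of splitting a labeled file in the `label\ttext` format into disjoint training and test sets, so the classifier is never scored on documents it saw during training. The class name `HeldOutSplit`, the 80/20 ratio and the fixed seed are illustrative assumptions. This is not a Mahout API, and Mahout's own tooling may offer a different way to do it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: shuffle labeled lines and cut them into train/test partitions. */
public final class HeldOutSplit {

    /** Holder for the two disjoint partitions. */
    public static final class Split {
        public final List<String> train;
        public final List<String> test;

        Split(List<String> train, List<String> test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Shuffles a copy of the input with a fixed seed (reproducible) and cuts it at trainFraction. */
    public static Split split(List<String> lines, double trainFraction, long seed) {
        if (trainFraction <= 0.0 || trainFraction >= 1.0) {
            throw new IllegalArgumentException("trainFraction must be in (0, 1)");
        }
        List<String> shuffled = new ArrayList<>(lines);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        return new Split(
                new ArrayList<>(shuffled.subList(0, cut)),
                new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add((i % 2 == 0 ? "sports" : "politics") + "\tdocument " + i);
        }
        Split s = split(data, 0.8, 42L);
        System.out.println("train=" + s.train.size() + " test=" + s.test.size());
    }
}
```

A stratified split (shuffling and cutting each label separately) would keep class proportions the same in both sets; the plain shuffle here is the simplest version.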

On Wed, Oct 20, 2010 at 6:35 AM, JAGANADH G <[email protected]> wrote:

> @robin and @ted
>
> I tested it in a different way.
> I wrote a program to convert input text to the Mahout training format. The
> program removes all punctuation and junk characters from the text, and
> removes any numbers, such as years and dates. It then converts the text
> to lowercase. Finally, the text is written out in the Mahout training
> format (label + "\t" + text + "\n").
>
> After training with the CBayes classifier, I tested it.
> The results are:
> 1) with -ng 1 -a 1.0
> Correctly classified instances = 52.5%
> Incorrect = 47.5%
> 2) with -ng 2 -a 1.0
> Correctly classified instances = 74.5%
> Incorrect = 25.5%
>
> Now I have a question.
> 1) The output of prepare20newsgroups is text from which all the
> stop words have been removed, and the text is just a plain sequence of
> words. So when we apply generateNGramsWithoutLabel(), will it generate
> n-grams correctly? That is, are the n-grams still accurate once the words
> that originally separated them have been dropped?
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
>
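
The preprocessing steps described in the quoted message can be sketched as follows. This is a hypothetical reconstruction, not the poster's actual program: the class `ToMahoutFormat` is made up for illustration, and treating "junk" as any character that is neither a letter nor whitespace is an assumption.

```java
import java.util.Locale;

/** Hypothetical converter from raw text to one "label\ttext\n" training line. */
public final class ToMahoutFormat {

    /**
     * Assumed normalization: replace every run of characters that are neither
     * letters nor whitespace (punctuation, digits, symbols) with a space,
     * lowercase, then collapse all whitespace (including tabs and newlines)
     * into single spaces so the text cannot contain a stray tab.
     */
    public static String normalize(String raw) {
        String lettersOnly = raw.replaceAll("[^\\p{L}\\s]+", " ");
        return lettersOnly.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }

    /** Formats one training line: label, a tab, the normalized text, a newline. */
    public static String toTrainingLine(String label, String raw) {
        return label + "\t" + normalize(raw) + "\n";
    }

    public static void main(String[] args) {
        System.out.print(toTrainingLine("sports", "On 20 Oct 2010, the Team won 3-1!"));
    }
}
```

Because whitespace inside the text is collapsed to single spaces, the only tab in each output line is the label separator. Only digits are dropped, so month names such as "oct" survive; `\p{L}` also keeps non-ASCII letters.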

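The n-gram question can be illustrated with a small sketch. It assumes, without checking the Mahout source, that n-grams are built from contiguous whitespace-separated tokens. `NGramSketch` and its methods are hypothetical and are not Mahout's `generateNGramsWithoutLabel()`. The point it shows is that removing stop words before building bigrams produces pairs that were never adjacent in the original text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of contiguous n-grams with and without stop-word removal. */
public final class NGramSketch {

    /** All contiguous n-grams of sizes 1..maxN over whitespace-separated tokens. */
    public static List<String> ngrams(String text, int maxN) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out; // no tokens, no n-grams
        }
        String[] tokens = trimmed.split("\\s+");
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.length; i++) {
                out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return out;
    }

    /** Drops tokens found in the stop list and re-joins the rest with single spaces. */
    public static String removeStopWords(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!stopWords.contains(t)) {
                kept.add(t);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("of", "the", "in");
        String original = "bank of the river";
        System.out.println(ngrams(original, 2));
        System.out.println(ngrams(removeStopWords(original, stop), 2));
    }
}
```

For "bank of the river", the original text yields the bigrams "bank of", "of the" and "the river", while the filtered text yields only "bank river", a pair that never occurred side by side. Whether such bigrams help or hurt classification accuracy is an empirical question for the data at hand.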