Hi Ted,

Thanks! I will experiment with the percentile rank idea and post the results
here if I find anything interesting.

~sumedh

On Thu, Dec 16, 2010 at 4:46 PM, Ted Dunning <[email protected]> wrote:

> Once you learn the model, the scores should be roughly comparable for
> documents of the same length.  If you have all short docs like your
> examples here, you can probably use the percentile rank of the score for
> a particular category and document length as your measure of quality.
> Conditioning on document length may not be necessary, but you should
> experiment with that.  The rationale is that long documents really are
> less ambiguous, so normalizing that away may be unnecessary.
>
> On Thu, Dec 16, 2010 at 12:36 PM, Sumedh Mungee <[email protected]> wrote:
>
> > Hi,
> >
> > I read that the score reported by the cbayes classifier is not a
> > probability and is only useful for relative ranking, but is there a way
> > to compare or normalize scores across classifications?
> >
> > Basically I'm looking for a way to weed out the low-probability matches.
> >
> > For instance, if I get the following classifications:
> > "apple, red" --> Fruit, Score == 10.39
> > "apple, white" --> Laptop, Score == 12.33
> > "red" --> Fruit, Score == 3.444
> >
> > I want to be able to weed out the last "red" --> Fruit classification,
> > because the score is "too low".
> >
> > Hope my question makes sense.
> >
> > (First post here. Wonderful work by the Mahout team!)
> >
> > Thanks!
> >
> > ~sumedh
> > (Mahout 0.4; 4.5 million documents; 200+ labels)
> >
>
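
The percentile-rank idea from this thread could be sketched roughly as below.
This is a minimal Python illustration, not Mahout code: the function names
(build_reference, percentile_rank), the held-out scores, and the cutoff of 30
are all hypothetical, and it assumes, as the original question does, that a
higher score means a stronger match. Scores are grouped by (category, document
length), and a new score is ranked only against scores from the same group.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_reference(scored_docs):
    """Group held-out scores by (category, doc_length) and sort each group.

    scored_docs: iterable of (category, doc_length, score) tuples.
    """
    ref = defaultdict(list)
    for category, length, score in scored_docs:
        ref[(category, length)].append(score)
    for scores in ref.values():
        scores.sort()
    return ref


def percentile_rank(ref, category, length, score):
    """Percentile rank (0-100) of score within its (category, length) group.

    Ties count as half below. Returns None if there is no reference data
    for that group.
    """
    scores = ref.get((category, length))
    if not scores:
        return None
    below = bisect_left(scores, score)
    equal = bisect_right(scores, score) - below
    return 100.0 * (below + 0.5 * equal) / len(scores)


# Hypothetical held-out scores, invented purely for illustration.
held_out = [
    ("Fruit", 1, 3.0), ("Fruit", 1, 5.0), ("Fruit", 1, 7.0), ("Fruit", 1, 9.0),
    ("Fruit", 2, 8.0), ("Fruit", 2, 10.0), ("Fruit", 2, 12.0), ("Fruit", 2, 14.0),
]
ref = build_reference(held_out)

CUTOFF = 30.0  # hypothetical threshold; would need tuning on real data

for text, category, length, score in [
    ("apple, red", "Fruit", 2, 10.39),
    ("red", "Fruit", 1, 3.444),
]:
    rank = percentile_rank(ref, category, length, score)
    keep = rank is not None and rank >= CUTOFF
    print(f"{text!r} -> {category}: rank={rank}, keep={keep}")
```

To test whether conditioning on document length matters, the same functions
can be reused with a constant passed in place of the length, so that all
scores for a category fall into a single group.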
