[EMAIL PROTECTED] (Justin Mason) writes:

> Bear in mind, the TCR figure that's output to the user in
> "fp-fn-statistics" output is mostly useful to compare against
> published algorithms, since it's the de facto standard measure of
> effectiveness in the academic lit on spam-filtering.

Erm, but everyone uses different lambdas and different corpora, so I'm
not sure how often that kind of comparison is actually possible.
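
For reference, this is roughly what that figure computes, as I read
Ion's papers (a minimal Python sketch; the function name is mine):

    def tcr(n_spam, n_fp, n_fn, lam):
        """Total Cost Ratio: the cost of using no filter at all
        (every spam gets through) divided by the lambda-weighted
        cost of the filter's errors.  Higher is better; TCR < 1
        means the filter is worse than nothing."""
        return float(n_spam) / (lam * n_fp + n_fn)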
 
> But we shouldn't use it ourselves internally as an effectiveness metric,
> because I don't think it's trustworthy (see below).
> 
> To remind us what they represent in Ion's papers:
> 
>     lambda=1: filing into a "spam" folder
>     lambda=9: bouncing back to sender saying "your mail was spam"
>     lambda=100: silent disposal
> 
> We should really be using a lambda of 1, given that; but since
> SpamAssassin is also used in other systems (e.g. with a system-wide
> quarantine, unavailable to the end user), and because it was getting
> crazily good effectiveness figures (like TCR > 100) at l=1, I picked
> a compromise of l=5.

I think the example mapping of policy to lambda number is wrong.
Clearly, 1 FP is not the same amount of pain as 1 FN when filing
probable spam into a "spam" folder.  The effective lambda may be
especially low if only 75% of spam is being caught with a high FP
rate, since you have to check your spam folder every day anyway, but
it's much higher once you get to SA-level accuracy and rarely look in
that folder.
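
To make that concrete, here's a quick illustration (every count below
is invented purely for this example):

    # Hypothetical corpus: 1000 spam, 4000 ham.
    # Filter A: 75% catch rate (250 FN) with a sloppy 20 FP.
    # Filter B: SA-level accuracy, say 30 FN and 2 FP.
    for lam in (1, 5, 9, 100):
        tcr_a = 1000.0 / (lam * 20 + 250)
        tcr_b = 1000.0 / (lam * 2 + 30)
        print("lambda=%3d  A=%6.2f  B=%6.2f" % (lam, tcr_a, tcr_b))

Filter A looks clearly worthwhile at l=1 (TCR ~3.7) and worse than no
filter at l=100 (TCR ~0.4), with the same error counts.  That's why I
don't trust cross-paper TCR comparisons.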

Maybe TCR shouldn't be considered at all when we're doing score
optimizer work.  Maybe a better metric is needed.
 
> IMO a better metric would be to pick a desired FP rate, and then use
> FN as a single-figure metric given that FP rate.   Or vice versa.
> Basically lock down a desired FP or FN rate and allow the perceptron
> to find its "best" rate for the other figure.

I agree with that.  The perceptron is not quite there, though.
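
As a sketch of what that metric could look like, assuming we have
per-message scores from a mass-check run (the function name and
arguments here are hypothetical, not existing code):

    def fn_rate_at_fp_rate(ham_scores, spam_scores, max_fp_rate):
        """Lowest threshold whose FP rate stays within max_fp_rate,
        plus the FN rate at that threshold.  Messages scoring >=
        threshold are treated as spam."""
        best = float("inf")   # mark nothing as spam: 0 FPs
        for t in sorted(set(ham_scores) | set(spam_scores), reverse=True):
            fp = sum(1 for s in ham_scores if s >= t)
            if float(fp) / len(ham_scores) > max_fp_rate:
                break         # lowering t further only adds FPs
            best = t
        fn = sum(1 for s in spam_scores if s < best)
        return best, float(fn) / len(spam_scores)

The perceptron could then minimize the FN rate subject to that FP cap
instead of juggling a lambda-weighted blend of the two.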

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
