https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5376
Henrik Krohns changed:
What|Removed |Added
Status|NEW |RESOLVED
CC|
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-22 04:42 ---
I think we should go ahead with using FP% and FN% -- a double-figure metric --
instead of any single-figure metric. I don't think any of the single-figure
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-14 05:32 ---
lam() avoids the problem noted in comment 13, but has another problem; it
doesn't have any concept of an FP being worse than an FN. This means that e.g.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-14 06:03 ---
here's a version of lam() with a lambda calculation...
#!/usr/bin/perl
# http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376#c16
my ($lambda, $fppc,
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-11 01:30 ---
Re: divide by 0
That's a computational issue, not a theoretical one. If there are no FPs or FNs,
then it makes sense for TCR (or F) to be infinite.
You just
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-11 04:15 ---
(In reply to comment #19)
Re: divide by 0
That's a computational issue, not a theoretical one. If there are no FPs or FNs,
then it makes sense for TCR
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-07 23:47 ---
if FP and FN are 0 -- ie there were no misclassifications [...]
-- it yields a division by zero
That's an idea! Implement it and spammers will start
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-07 10:46 ---
(In reply to comment #10)
I propose the following new
measurement. Let's call it the Findlay measurement:
F(lambda) = 1 / (FN% + FP% * lambda)
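For illustration only, the F(lambda) formula above can be sketched in Python. The function name findlay_f, its argument names, and the default lambda of 50 are hypothetical choices made for this sketch. It assumes the FN and FP rates are passed as proportions in [0, 1] rather than percentages, and it returns infinity when there are no misclassifications, which is one possible convention and not something the formula itself specifies.

```python
import math


def findlay_f(fn_rate: float, fp_rate: float, lam: float = 50.0) -> float:
    """Sketch of F(lambda) = 1 / (FN + FP * lambda).

    fn_rate and fp_rate are assumed to be proportions in [0, 1].
    lam is the weight of one FP relative to one FN (50 is only an example).
    """
    cost = fn_rate + fp_rate * lam
    if cost == 0.0:
        # No FPs and no FNs: avoid dividing by zero and report an
        # infinite score instead (an assumption of this sketch).
        return math.inf
    return 1.0 / cost


if __name__ == "__main__":
    # 2% false negatives, 0.1% false positives, lambda = 50
    print(findlay_f(0.02, 0.001))
    # A perfect run with no misclassifications
    print(findlay_f(0.0, 0.0))
```

A larger lambda penalises false positives more heavily, so the same filter scores lower as lambda grows.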
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-07 11:07 ---
That's exactly right: one FP% is worth 50 FN% if you set lambda = 50. If you
don't like that tradeoff, pick a different value for lambda. :-)
I agree it
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-07 11:25 ---
I should point out also that where I wrote FN% and FP% I really meant the
proportion of FN/FPs and not percentage, so you need to divide the percentage by
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-07 12:51 ---
Btw, the CEAS 2007 contest used the following metrics:
Filters will be evaluated using the lam() metric. Lam() calculates the
average of a filter's
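The quoted description is cut off here, so the following is not taken from it. As a rough sketch, assuming lam() refers to the logistic average misclassification used in spam-filter evaluations (the FP and FN rates averaged in logit space), it could look like this in Python. The helper names and the EPSILON clamp that keeps the logarithm finite for rates of exactly 0 or 1 are assumptions of this sketch.

```python
import math

# Assumption of this sketch: clamp rates away from 0 and 1 so that
# logit() stays finite. The value is arbitrary.
EPSILON = 1e-6


def logit(p: float) -> float:
    """Log-odds of a proportion p, clamped to [EPSILON, 1 - EPSILON]."""
    p = min(max(p, EPSILON), 1.0 - EPSILON)
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def lam(fp_rate: float, fn_rate: float) -> float:
    """Average the FP and FN rates (as proportions) in logit space."""
    return inv_logit((logit(fp_rate) + logit(fn_rate)) / 2.0)


if __name__ == "__main__":
    print(lam(0.001, 0.02))
```

Because the two rates enter the average symmetrically, swapping them gives the same result; this form has no way to weight an FP more heavily than an FN.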
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-07 13:51 ---
lam sounds promising -- let's look into that.
btw there's another issue with the F(lambda) idea -- if FP and FN
are 0 -- ie there were no
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-08-01 04:59 ---
here's another machine-learning Challenge --
http://challenge.spock.com/pages/learn_more
$50k prize on this one. I doubt we could match that ;)
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-07-06 04:58 ---
(In reply to comment #8)
Re: Bayes rules.
No, they should not be immutable. If you want, we can require them to be sane
for some definition of sane.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-07-06 13:43 ---
Re: Bayes immutability
All I'm really trying to say is that during scoring runs, we should be changing
the BAYES_ scores. We can manually make them sane if
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-07-04 00:25 ---
Unfortunately it's been a while since I've looked at this stuff. (Actually, it's
been like 3 months... which is hardly a while, but it's been a busy 3
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-07-04 20:18 ---
Re: Bayes rules.
No, they should not be immutable. If you want, we can require them to be sane
for some definition of sane. There's no compelling reason for
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-07-03 07:40 ---
ok, I wrote up a spec of what we currently require:
http://wiki.apache.org/spamassassin/SpamAssassinChallenge
does this look like a reasonable idea?
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-07-03 08:27 ---
I think our current generate-scores-once-a-millennium method is not really
functional for several reasons, but mainly that it takes too long between
updates
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-07-03 08:23 ---
Should there be something in there about code that can be used without any sort
of patent issues and available under the Apache License?
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
--- Additional Comments From [EMAIL PROTECTED] 2007-07-03 08:43 ---
(In reply to comment #3)
I think our current generate-scores-once-a-millennium method is not really
functional for several reasons, but mainly that it takes
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
[EMAIL PROTECTED] changed:
What                    |Removed |Added
OtherBugsDependingOnThis|        |4560