RE: Scoring Hoaxes

Bram Mertens 23 May 2004 08:38:30 -0000

On Thu, 2004-05-20 at 20:27, E. Falk wrote:
> I don't have any anti-hoax rules to share, but I would caution against 
> scoring these as spam, especially if you're using any sort of Bayesian 
> auto-learning or auto-whitelisting.
> 
> Hoaxes are almost always passed along by people who are actually known 
> to the recipient and whom the recipient would like to receive mail from. 
> You don't want the system to recognize future e-mail from the sender as 
> spam in this case.
[...]


This thread has provided some very interesting reading material, but the
above has me confused.  Just when I thought I was beginning to
understand bayes...

If we add rules that mark hoaxes as spam, bayes will look for tokens in
those messages to score future messages, which is a "good thing", right?

But Evan warns that bayes will then also mark ham from the people that
sent these hoaxes as spam, which is of course a "bad thing".

BUT AFAIK bayes doesn't care about the sender; it only looks at the
message body.  Or am I wrong here?

AWL looks at the sender, so AWL might score ham from these senders
higher than it should, BUT as AWL calculates scores as an average this
shouldn't be much of a problem unless you receive more hoaxes and/or
spam than ham from this person.  In that case I don't think it's a "bad
thing" anymore for SA to score messages from this person as spam, but
that's just me.
And again you could always whitelist_from.
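
To make the averaging argument concrete, here is a rough sketch of how
I understand the AWL adjustment: the new score is pulled part of the way
toward the sender's historical mean.  The 0.5 factor matches what I
believe is the default auto_whitelist_factor; the function name and the
scores in the history are made up for illustration, and this is not
SA's actual code.

```python
def awl_adjust(score, history, factor=0.5):
    """Pull a message score toward the sender's mean past score.

    score   -- score of the new message before AWL is applied
    history -- earlier scores seen from this sender (hypothetical data)
    factor  -- 0 keeps the raw score, 1 uses the historical mean only
    """
    if not history:
        # No history for this sender, so there is nothing to average.
        return score
    mean = sum(history) / len(history)
    return score + (mean - score) * factor


# Assumed example: three earlier ham messages and one hoax that a
# hoax rule pushed up to 8.0.
history = [0.5, 0.2, 0.3, 8.0]

# A new, harmless message from the same sender.
adjusted = awl_adjust(0.4, history)
print(adjusted)
```

With one hoax among mostly ham the mean stays low, so the new message
is nudged up but remains well under the usual 5.0 threshold.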

Even if bayes does look at the sender, the only harm adding this ruleset
would do is that some tokens that used to be good indicators of spam
will become less effective for a small subset of messages.  Doesn't
seem like a big problem to me.

About FP's, I usually cut the hoax from my replies when I try to
convince my friends (again) that sending hoaxes, virus warnings or
jokes[1] is "A REALLY BAD THING", so all we'd have to be careful about
is using too few tokens to identify a message as a hoax.  But as the
SARE ninjas seem to have taken up this battle I'm quite confident the
rules that will make it into this ruleset will be near-perfect. (thanks
ninjas!)

So could someone explain the dangers of adding a hoax-ruleset to me?  It
seems I'm missing something.

TIA

Bram
-- 
# Mertens Bram "M8ram"   <[EMAIL PROTECTED]>   Linux User #349737 #
# SuSE Linux 8.2 (i586)     kernel 2.4.20-4GB      i686     256MB RAM #
# 10:18am  up 62 days 13:57,  7 users,  load average: 0.00, 0.01, 0.00 #
