Hello Bram,

I'll try to answer, at least as far as why my organization wouldn't allow a hoax ruleset to be installed on the MTA...

-----Original Message-----
From: Bram Mertens
Sent: Sunday, May 23, 2004 3:38 AM
To: spamassassin
Subject: RE: Scoring Hoaxes

> If we add rules that mark hoaxes as spam bayes will look for tokens in
> those messages to score future messages which is a "good thing", right?

Yes, that would be a "good thing" for blocking future hoaxes.

> But Evan warns that bayes will then also mark ham from the people that
> sent these hoaxes as spam which is of course a "bad thing".

Right...

> BUT AFAIK bayes doesn't care about the sender, it only looks at the
> message body or am I wrong here?

That could be the case - I need to do some more reading on SA's Bayes routines to come up with an answer on that one, but it doesn't make much difference, because...

>--
># Mertens Bram "M8ram"   <[EMAIL PROTECTED]>   Linux User #349737 #
># SuSE Linux 8.2 (i586)     kernel 2.4.20-4GB      i686     256MB RAM #
># 10:18am  up 62 days 13:57,  7 users,  load average: 0.00, 0.01, 0.00 #

Signatures - most corporate employees have sigs (often fairly distinct ones). That's a Bayes problem right there: hoaxes come in limited variety and are generally much easier to identify than spam, so they are going to get pretty high scores, and the tokens in the sender's sig get learned as spam along with the hoax text. The same sig then shows up on that person's legitimate mail.
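To make the signature problem concrete, here is a toy sketch. It is not SA's Bayes code, just a naive per-token frequency estimate, and the sig token, messages and function names are all made up for illustration:

```python
from collections import Counter

def learn(messages):
    """Count, per token, how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts

def spam_prob(token, spam_counts, n_spam, ham_counts, n_ham):
    """Naive per-token spam probability from message frequencies."""
    s = spam_counts[token] / n_spam
    h = ham_counts[token] / n_ham
    return s / (s + h) if s + h else 0.5  # unseen token: neutral

SIG = "acme-ext-4471"  # hypothetical token from one employee's signature

# Hoaxes forwarded by that employee, learned as spam (sig included).
hoaxes = [
    f"forward this warning to everyone you know {SIG}",
    f"bill gates will pay you to forward this {SIG}",
]
# Ham from other people, which never carries that sig.
ham = [
    "quarterly report attached please review",
    "lunch meeting moved to noon",
]

spam_counts, ham_counts = learn(hoaxes), learn(ham)
sig_prob = spam_prob(SIG, spam_counts, len(hoaxes), ham_counts, len(ham))
report_prob = spam_prob("report", spam_counts, len(hoaxes), ham_counts, len(ham))
print(sig_prob, report_prob)
```

In this toy data the sig token has only ever been seen in "spam", so it comes out as strongly spammy as any hoax phrase, and it will be present in every legitimate message that employee sends.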

> AWL looks at the sender, so AWL might score ham from these senders higher
> than it should BUT as AWL calculates scores as an average this shouldn't
> be much of a problem unless you receive more hoaxes and/or spam than ham
> from this person.  In this case I don't think it's a "bad thing" anymore
> for SA to score messages from this person as spam, but that's just me.
> And again you could always whitelist_from.

As above, hoaxes will probably score fairly high, and so AWL will compound that problem. On a personal e-mail system, this is an acceptable level of risk, since volume is probably low and FPs can be examined. On a corporate e-mail system, you have to weigh risk against benefit: when a lost e-mail could cost a lot of money, you've got to be a little careful.
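A rough sketch of the averaging effect, assuming AWL pulls each message's score toward the sender's historical mean by a fixed factor (0.5 here, which I believe is the auto_whitelist_factor default, but check your own config). The scores, the 5.0 threshold and the function name are illustrative only:

```python
def awl_adjust(score, history, factor=0.5):
    """Pull `score` toward the sender's historical mean by `factor`."""
    if not history:
        return score  # no history for this sender: leave the score alone
    mean = sum(history) / len(history)
    return score + (mean - score) * factor

THRESHOLD = 5.0   # assumed spam threshold
ham_score = 2.0   # a legitimate message, before AWL

# Sender who mostly sends ham, with one hoax caught by a hoax ruleset.
mostly_ham = [0.5, 0.3, 0.2, 12.0]
# Sender whose history is dominated by high-scoring hoaxes.
mostly_hoax = [0.5, 12.0, 11.0, 13.0]

ok = awl_adjust(ham_score, mostly_ham)
bad = awl_adjust(ham_score, mostly_hoax)
print(ok, ok >= THRESHOLD)
print(bad, bad >= THRESHOLD)
```

With these numbers the first sender's ham stays well under the threshold, while the second sender's ham is pushed over it purely because of their hoax history, which is the case Bram describes as "more hoaxes and/or spam than ham".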

> About FP's, I usually cut the hoax from my replies when I try to
> convince my friends (again) that sending hoaxes, virus warnings or
> jokes[1] is "A REALLY BAD THING", so all we'd have to be careful about
> is using too few tokens to identify a message as a hoax.  But as the
> SARE ninjas seem to have taken up this battle I'm quite confident the
> rules that will make it into this ruleset will be near-perfect. (thanks
> ninjas!)

Hopefully, yes! A lot of the hoax-related messages on this list have tripped the rulesets of people already running anti-hoax filters. Their auto-responders have typically identified the message as a hoax rather than as spam, which means they are handling hoaxes separately from their spam - an ideal solution (and if someone is administering such a system, I'd love to know how they've set it up)!

> So could someone explain the dangers of adding a hoax-ruleset to me?  It
> seems I'm missing something.

Seems you've mostly got a handle on it - my objections were partly just "on principle". It comes down to the "what is spam?" argument... is spam unsolicited commercial e-mail? Unsolicited bulk e-mail? Mail I don't want?

According to our company policy, we define spam as unsolicited commercial e-mail. That gets filtered to the best of our abilities. Anything else represents such a small subset of "unwanted" e-mail, with so many pitfalls like those above (Bayes, AWL, etc.), that filtering it is not practical.

Hope this helps - this is by no means authoritative, just one company's policy. :)

Evan II


