Re: A New Approach: Find the Ham

Justin Mason Sun, 11 Feb 2007 03:32:19 -0800

Long-time SpamAssassin users with a good memory might recall that back in
SpamAssassin 2.4x we included quite a few ham-targeting rules, such as
"was this sent using User-Agent: Mozilla?", "is this formatted like a
reply to a previous message?", "does it include headers from a mailing
list?" and "is it formatted like a PGP-signed message?"


Pretty soon, spammers simply adopted _all_ of those attributes,
sending spam that contained "User-Agent: mozilla" and In-Reply-To
headers and was formatted like PGP-signed reply messages ;)

If you give spammers a way to get negative points easily, they'll attack
it.  It's simply unsafe to assume they won't.  A published ruleset that
does this based on forgeable attributes will be quickly attacked (again).
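
To make the attack concrete, here is a minimal Python sketch of additive
scoring with forgeable ham rules.  The rule names, scores and the 5.0
threshold are illustrative assumptions, not the actual 2.4x ruleset, and
the matching is far cruder than what SpamAssassin really does.

```python
# Illustrative sketch only: rule names, scores and the threshold are
# made up for this example, not taken from any SpamAssassin release.

SPAM_THRESHOLD = 5.0

# Each rule: (name, score, predicate over a dict of lowercase header -> value)
RULES = [
    ("SPAMMY_SUBJECT", 6.0,
     lambda h: "cheap pills" in h.get("subject", "").lower()),
    # Forgeable "ham" rules: the sender controls every header tested here.
    ("HAM_USER_AGENT_MOZILLA", -2.5,
     lambda h: "mozilla" in h.get("user-agent", "").lower()),
    ("HAM_IN_REPLY_TO", -2.0, lambda h: "in-reply-to" in h),
    ("HAM_MAILING_LIST", -1.5, lambda h: "list-id" in h),
]


def score(headers):
    """Sum the scores of every rule whose predicate matches."""
    return sum(points for _name, points, test in RULES if test(headers))


def is_spam(headers):
    """A message is flagged once its total reaches the threshold."""
    return score(headers) >= SPAM_THRESHOLD


# The same spam, before and after the spammer copies the ham attributes.
plain_spam = {"subject": "Cheap pills here"}
forged_spam = {
    **plain_spam,
    "user-agent": "Mozilla/5.0",
    "in-reply-to": "<fake@example.com>",
    "list-id": "<fake.example.com>",
}

print(is_spam(plain_spam), is_spam(forged_spam))
```

The payload is unchanged between the two messages; only sender-controlled
headers differ, and in this sketch that is enough to pull the total back
under the threshold.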

Having said that, rules that are *unforgeable* are entirely safe to use,
and we include those -- namely whitelist_from_rcvd/spf/dk/dkim, and the
locally-trained Bayes tests (which spammers have a much harder time
guessing).

Also, writing your own local ham-spotting rules is generally safe, as long
as you don't publish them where spammers can find out about them.

--j.

Nigel Frankcom writes:
>On Sat, 10 Feb 2007 15:14:56 -0500, Miles Fidelman
><[EMAIL PROTECTED]> wrote:
>
>>Dan wrote:
>>> I've developed a new approach to scoring that I want to 1) share with
>>> everyone and 2) make into a working system thats as accurate as what
>>> I've already built, but easier to use.  First, the theory:
>>>
>>> NEW ASSUMPTION
>>> All messages are spam unless x,y,z score says they're ham.
>>>
>>> NEW APPROACH
>>> Block everything, then create rules to not catch what you do want.
>>> ie, build tests that target the spam (keeping all the tests you've
>>> already built), then score the thousands of ways ham triggers on those
>>> tests.
>>It strikes me that the hardest part of this approach is filtering out
>>too much ham.  At least for me, it's more important to make sure that
>>people reach me, than to filter out all spam.  If we take the approach
>>that everything is to be filtered out, except x,y,z - then the risk of
>>filtering out too much seems pretty high.
>
>These are my local stats... I'd far rather those numbers were the
>other way round.
>
>Even if Dan is wrong, at least he's thinking.
>
>http://www.blue-canoe.com/stats/index.php?D1=11
>
>What do Theo, Matt & Co have to say? They've been doing this a lot
>longer than us.
>
>Kind regards
>
>
