Long-time SpamAssassin users with a good memory might recall that back in SpamAssassin 2.4x we included quite a few ham-targeting rules, such as "was this sent using User-Agent: Mozilla?", "is this formatted like a reply to a previous message?", "does it include headers from a mailing list?" and "is it formatted like a PGP-signed message?"
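For illustration only, here is a rough Python sketch of what header-based ham rules of that kind boil down to. The rule names, patterns and scores are invented for this example and are not the actual 2.4x rules; `ham_points` is a hypothetical helper, not part of SpamAssassin.

```python
import re
from email import message_from_string

# Hypothetical ham-targeting rules: (name, header, pattern, score).
# Names, patterns and scores are made up for illustration; they are
# NOT the real SpamAssassin 2.4x rules.
HAM_RULES = [
    ("HYPO_USER_AGENT_MOZILLA", "User-Agent", re.compile(r"mozilla", re.I), -1.0),
    ("HYPO_IN_REPLY_TO", "In-Reply-To", re.compile(r"<[^>]+>"), -0.5),
    ("HYPO_MAILING_LIST", "List-Id", re.compile(r"\S"), -0.5),
]


def ham_points(raw_message: str) -> float:
    """Sum the (negative) scores of every ham rule whose header matches."""
    msg = message_from_string(raw_message)
    total = 0.0
    for _name, header, pattern, score in HAM_RULES:
        value = msg.get(header)
        if value is not None and pattern.search(value):
            total += score
    return total


# A message whose sender simply typed in all the "ham-looking" headers.
forged = (
    "From: someone@example.com\n"
    "User-Agent: Mozilla/5.0\n"
    "In-Reply-To: <made-up-id@example.com>\n"
    "List-Id: <fake-list.example.com>\n"
    "Subject: Re: hello\n"
    "\n"
    "Buy now!\n"
)

# The same body without any of those headers.
plain = "From: someone@example.com\nSubject: hello\n\nBuy now!\n"

print(ham_points(forged))  # -2.0
print(ham_points(plain))   # 0.0
```

Nothing in these checks verifies who wrote the headers: every rule fires on text that the sender controls completely.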

Pretty soon, spammers simply adopted _all_ of those attributes, sending spam containing "User-Agent: Mozilla" and In-Reply-To headers, formatted like PGP-signed reply messages ;)

If you give spammers a way to get negative points easily, they'll attack it. It's simply unsafe to assume they won't. A published ruleset that does this based on forgeable attributes will be quickly attacked (again).

Having said that, rules that are *unforgeable* are entirely safe to use, and we include those -- namely whitelist_from_rcvd/spf/dk/dkim, and the locally-trained Bayes tests (which spammers have a much harder time guessing). Also, writing your own local ham-spotting rules is generally safe, as long as you don't publish them where spammers can find out about them.

--j.

Nigel Frankcom writes:
>On Sat, 10 Feb 2007 15:14:56 -0500, Miles Fidelman
><[EMAIL PROTECTED]> wrote:
>
>>Dan wrote:
>>> I've developed a new approach to scoring that I want to 1) share with
>>> everyone and 2) make into a working system that's as accurate as what
>>> I've already built, but easier to use. First, the theory:
>>>
>>> NEW ASSUMPTION
>>> All messages are spam unless x,y,z score says they're ham.
>>>
>>> NEW APPROACH
>>> Block everything, then create rules to not catch what you do want.
>>> ie, build tests that target the spam (keeping all the tests you've
>>> already built), then score the thousands of ways ham triggers on those
>>> tests.
>>It strikes me that the hardest part of this approach is filtering out
>>too much ham. At least for me, it's more important to make sure that
>>people reach me, than to filter out all spam. If we take the approach
>>that everything is to be filtered out, except x,y,z - then the risk of
>>filtering out too much seems pretty high.
>
>These are my local stats... I'd far rather those numbers were the
>other way round.
>
>Even if Dan is wrong, at least he's thinking.
>
>http://www.blue-canoe.com/stats/index.php?D1=11
>
>What do Theo, Matt & Co have to say? They've been doing this a lot
>longer than us.
>
>Kind regards