Re: Spamassassin Learn

Matt Kettler Tue, 07 Feb 2006 19:06:29 -0800

Jim C. Nasby wrote:
> On Tue, Feb 07, 2006 at 07:59:37PM -0500, Matt Kettler wrote:
>   
>> Jim,
>>
>> Bayes is NOT used when calculating autolearning score, that would
>> promote self-feedback. As I said before, the autolearner's concept of
>> score is VERY different from the final message score. Score
>> contributions from bayes, white/blacklists, and the AWL are all ignored
>> by the autolearner. It also looks up the individual rule scores from set
>> 0 or 1 instead of 2 or 3. This is a MASSIVE difference.
>>
>>
>> However, the default autolearn threshold is 0.1. That's a POSITIVE
>> threshold. To the autolearner that message scored 0 points. 0 is less
>> than 0.1, so it learned as HAM.
>>
>> I'd suggest re-adjusting your threshold, as a default SpamAssassin config
>> will only VERY rarely generate a negative score for the autolearner. The
>> only rules that can do it are BondedSender, Habeas COI/SOI and hashcash.
>> Hashcash is so rare it may as well not exist at present. BondedSender
>> and Habeas are only used by large legitimate mailers, so none of your
>> person-to-person mail will ever get autolearned in your current setup
>> unless you know someone who uses hashcash.
>>     
>
> Ahh, got it. Makes much more sense. :)
>
> So I guess either 0 or -0.1 makes the most sense?
>   
0 makes the most sense, unless you add on negative-scoring rules.  With
a default SA there's really no difference in autolearning threshold
between -1.3 and -0.1, and very little difference between -0.001 and -100.0.
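
To make the ham side of that decision concrete, here is a minimal Python sketch. It is a simplification, not SpamAssassin's own code: the helper names, filtering Bayes/AWL/white- and blacklist hits out by rule name, and the strict "below the threshold" comparison are all assumptions. The 0.1 default corresponds to the bayes_auto_learn_threshold_nonspam setting; the BAYES_00 and AWL scores in the example are illustrative values only.

```python
def learning_score(hits: dict[str, float]) -> float:
    """Sum the rule scores the autolearner counts.

    Assumes the scores passed in were already taken from set 0 or 1.
    """
    total = 0.0
    for name, score in hits.items():
        # Simplification (assumption): skip Bayes and AWL contributions by name.
        if name.startswith("BAYES_") or name == "AWL":
            continue
        # Simplification (assumption): skip any white/blacklist rule by name.
        if "WHITELIST" in name or "BLACKLIST" in name:
            continue
        total += score
    return total


def autolearn_as_ham(hits: dict[str, float], threshold: float = 0.1) -> bool:
    """Assumed rule: learn as ham when the learning score is strictly below the threshold."""
    return learning_score(hits) < threshold


# A person-to-person message that only hit Bayes and the AWL (illustrative scores).
msg = {"BAYES_00": -2.6, "AWL": -1.2}
print(learning_score(msg))                  # 0.0
print(autolearn_as_ham(msg))                # True: 0 < 0.1, so it is learned as ham
print(autolearn_as_ham(msg, threshold=0))   # False: 0 is not below 0
```

With the threshold at 0, a message whose learning score is exactly 0 is simply left unlearned instead of being fed to Bayes as ham.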


Ignoring hashcash due to its rarity, and omitting Bayes, the AWL, and all
whitelists because they can't count toward the learning score:

There are 0 rules in SA that can get you a learning score at or below -8.001.
There are only 3 rules in SA that can get you a learning score at or below -2.3.
There are only 7 rules in SA that can get you a learning score at or below -0.1.
There are only 12 rules in SA that can get you a learning score at or below -0.001.

The differences between the 4 cases are more or less moot. You won't
learn much ham at all.
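
The counts above can be checked with a short Python sketch. It uses the set 1 column (network tests on, Bayes off) of the stock SA 3.1.0 score list at the end of this message, with single-valued scores applying to every set, and it considers each rule on its own, as the list above does. The dictionary and function names are made up for this illustration.

```python
# Learning-set scores: the set 1 column of the stock SA 3.1.0 score list
# quoted in this message (single-valued scores apply to all sets).
LEARNING_SCORES = {
    "HABEAS_ACCREDITED_COI": -8.0,
    "RCVD_IN_BSP_TRUSTED": -4.3,
    "HABEAS_ACCREDITED_SOI": -4.3,
    "ALL_TRUSTED": -1.440,
    "RCVD_IN_IADB_VOUCHED": -1.825,
    "HABEAS_CHECKED": -0.2,
    "RCVD_IN_BSP_OTHER": -0.1,
    "NO_RELAYS": -0.001,
    "NO_RECEIVED": -0.001,
    "DK_VERIFIED": -0.001,
    "SPF_PASS": -0.001,
    "SPF_HELO_PASS": -0.001,
}


def rules_at_or_below(threshold: float) -> int:
    """Count rules whose score alone gives a learning score <= threshold."""
    return sum(1 for score in LEARNING_SCORES.values() if score <= threshold)


for t in (-8.001, -2.3, -0.1, -0.001):
    print(t, rules_at_or_below(t))
# -8.001 0
# -2.3 3
# -0.1 7
# -0.001 12
```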

Even if you consider hashcash, that's only another 5 rules, and they only
apply when senders know what hashcash even is.

I run my boxes with -0.01 as a threshold, but I've added about 30
simple body-text rules looking for "industry terminology" for my
company's business and assigned -0.02 scores to them. This way I
autolearn any business-related mail without any real chance of a spammer
abusing them to whitelist himself. Even if a spam hit every single one of
my rules, it would only knock 0.6 points off the spam score.
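
A rough Python sketch of that kind of setup follows. It renders SpamAssassin body and score lines for a couple of hypothetical "industry terminology" rules (the LOCAL_BIZ_* names and patterns are placeholders, not the rules described above) and checks the arithmetic: 30 rules at -0.02 can remove at most 0.6 points, while a single hit already puts an otherwise zero learning score below a -0.01 threshold.

```python
# Hypothetical placeholder rules; substitute your own business vocabulary.
BUSINESS_RULES = {
    "LOCAL_BIZ_PO": r"\bpurchase order\b",
    "LOCAL_BIZ_BOM": r"\bbill of materials\b",
}
RULE_SCORE = -0.02
THRESHOLD = -0.01  # nonspam autolearn threshold from the setup described above


def to_local_cf(rules: dict[str, str], score: float) -> str:
    """Render a SpamAssassin 'body' line and a 'score' line for each rule."""
    lines = []
    for name, pattern in rules.items():
        lines.append(f"body {name} /{pattern}/i")
        lines.append(f"score {name} {score}")
    return "\n".join(lines)


def max_discount(n_rules: int, score: float) -> float:
    """Total a message gains by hitting every rule once (rounded for display)."""
    return round(n_rules * score, 4)


print(to_local_cf(BUSINESS_RULES, RULE_SCORE))
print(max_discount(30, RULE_SCORE))   # -0.6
print(RULE_SCORE < THRESHOLD)         # True: one hit suffices if nothing else scored
```

Keeping each score tiny is the point of the design: the rules only need to tip a zero learning score below the threshold, not to outweigh real spam indicators.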

For reference, these are the only rules in a stock SA 3.1.0 that can
give you a negative learning score:

score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
score RCVD_IN_BSP_TRUSTED 0 -4.3 0 -4.3
score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
score ALL_TRUSTED -1.360 -1.440 -1.665 -1.800
score RCVD_IN_IADB_VOUCHED 0 -1.825 0 -2.200
score HABEAS_CHECKED 0 -0.2 0 -0.2
score RCVD_IN_BSP_OTHER 0 -0.1 0 -0.1
score NO_RELAYS -0.001
score NO_RECEIVED -0.001
score DK_VERIFIED -0.001
score SPF_PASS -0.001
score SPF_HELO_PASS -0.001

score HASHCASH_20 -0.500
score HASHCASH_21 -0.700
score HASHCASH_22 -1.000
score HASHCASH_23 -2.000
score HASHCASH_24 -3.000
score HASHCASH_25 -4.000
score HASHCASH_HIGH -5.000
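
Each score line above lists either one value, used for all four score sets, or four values for sets 0 through 3 (set 0: no Bayes, no network tests; set 1: network tests; set 2: Bayes; set 3: Bayes plus network tests). The Python sketch below picks out the value the autolearner would see, assuming set 1 when network tests are enabled and set 0 otherwise; the function name is hypothetical and the parser only handles the simple one-or-four-value form.

```python
def learning_set_score(line: str, network_tests: bool = True) -> tuple[str, float]:
    """Parse a 'score NAME ...' line and return (name, score used for learning).

    One value applies to all four score sets; four values are sets 0-3.
    Assumption: the autolearner uses set 1 with network tests, set 0 without.
    """
    parts = line.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    name, values = parts[1], [float(v) for v in parts[2:]]
    if len(values) == 1:
        return name, values[0]
    if len(values) != 4:
        raise ValueError(f"expected 1 or 4 scores: {line!r}")
    return name, values[1 if network_tests else 0]


print(learning_set_score("score ALL_TRUSTED -1.360 -1.440 -1.665 -1.800"))
# ('ALL_TRUSTED', -1.44)
print(learning_set_score("score RCVD_IN_BSP_TRUSTED 0 -4.3 0 -4.3", network_tests=False))
# ('RCVD_IN_BSP_TRUSTED', 0.0)
```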
