Jim C. Nasby wrote:
> On Tue, Feb 07, 2006 at 07:59:37PM -0500, Matt Kettler wrote:
>
>> Jim,
>>
>> Bayes is NOT used when calculating autolearning score, that would
>> promote self feedback. As I said before, the autolearner's concept of
>> score is VERY different from the final message score. Score
>> contributions from bayes, white/blacklists, and the AWL are all ignored
>> by the autolearner. It also looks up the individual rule scores from set
>> 0 or 1 instead of 2 or 3. This is a MASSIVE difference.
>>
>>
>> However, the default autolearn threshold is 0.1. That's a POSITIVE
>> threshold. To the autolearner that message scored 0 points. 0 is less
>> than 0.1, so it learned as HAM.
>>
>> I'd suggest re-adjusting your threshold, as a default SpamAssassin config
>> will only VERY rarely generate a negative score to the autolearner. The
>> only rules that can do it are bondedsender, habeas COI/SOI and hashcash.
>> Hashcash is so rare it may as well not exist at present. BondedSender
>> and Habeas are only used by large legitimate mailers, so none of your
>> person-to-person mail will ever get autolearned in your current setup
>> unless you know someone who uses hashcash.
>>
>
> Ahh, got it. Makes much more sense. :)
>
> So I guess either 0 or -0.1 makes the most sense?
>

0 makes the most sense, unless you add on negative-scoring rules. With a default SA there's really no difference in autolearning threshold between -1.3 and -0.1, and very little difference between -0.001 and -100.0.
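
For anyone following along, a minimal sketch of that change in local.cf might look like the fragment below. The option name is the one documented for the AutoLearnThreshold plugin, not something quoted in this thread, and the file location is an assumption about your install.

```
# local.cf (path varies by install; assumed here)
# Ham autolearn threshold: a message is learned as ham only when its
# learning score is below this value. Stock default is 0.1.
bayes_auto_learn_threshold_nonspam 0
```

Run `spamassassin --lint` afterwards to confirm the file still parses.
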
Ignoring hashcash due to its rarity, and omitting bayes, the AWL, and all whitelists because they can't count:

There are 0 rules in SA that can get you a learning score at or below -8.001.
There are only 3 rules in SA that can get you a learning score at or below -2.3.
There are only 7 rules in SA that can get you a learning score at or below -0.1.
There are only 12 rules in SA that can get you a learning score at or below -0.001.

The difference between the 4 cases is more or less moot. You won't learn much ham at all. Even if you consider hashcash, that's only another 5 rules, and it only applies when senders realize what hashcash even is.

I run my boxes with -0.01 as a threshold, but I've added on about 30 simple body-text rules looking for "industry terminology" for my company's business and assigning -0.02 scores to them. This way I autolearn any business-related mail without any real chance of a spammer abusing them to whitelist himself. Even if a spam hit every single one of my rules, it would only knock 0.6 points off the spam score.

For reference, these are the only rules in a stock SA 3.1.0 that can give you a negative learning score:

score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
score RCVD_IN_BSP_TRUSTED 0 -4.3 0 -4.3
score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
score ALL_TRUSTED -1.360 -1.440 -1.665 -1.800
score RCVD_IN_IADB_VOUCHED 0 -1.825 0 -2.200
score HABEAS_CHECKED 0 -0.2 0 -0.2
score RCVD_IN_BSP_OTHER 0 -0.1 0 -0.1
score NO_RELAYS -0.001
score NO_RECEIVED -0.001
score DK_VERIFIED -0.001
score SPF_PASS -0.001
score SPF_HELO_PASS -0.001
score HASHCASH_20 -0.500
score HASHCASH_21 -0.700
score HASHCASH_22 -1.000
score HASHCASH_23 -2.000
score HASHCASH_24 -3.000
score HASHCASH_25 -4.000
score HASHCASH_HIGH -5.000
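
The custom-rule setup described above (a -0.01 threshold plus many tiny negative body rules) could be sketched roughly as follows. The rule names and patterns are hypothetical placeholders, not the actual rules in use; only the `body`, `describe`, `score`, and `bayes_auto_learn_threshold_nonspam` directives are standard SA configuration.

```
# Ham autolearn threshold just below zero, so a single custom rule hit
# (-0.02) is enough to push the learning score under it.
bayes_auto_learn_threshold_nonspam -0.01

# Hypothetical "industry terminology" rules. Substitute terms that
# appear in your own legitimate business mail.
body     LOCAL_TERM_EXAMPLE_1   /\bexample industry phrase\b/i
describe LOCAL_TERM_EXAMPLE_1   Body mentions a business-specific term (placeholder)
score    LOCAL_TERM_EXAMPLE_1   -0.02

body     LOCAL_TERM_EXAMPLE_2   /\banother trade term\b/i
describe LOCAL_TERM_EXAMPLE_2   Body mentions a business-specific term (placeholder)
score    LOCAL_TERM_EXAMPLE_2   -0.02
```

A single-value `score` line applies to all four score sets, so the -0.02 counts toward the learning score as well as the final score. With about 30 such rules the worst case is 30 x 0.02 = 0.6 points off a spam's score, which is the abuse ceiling mentioned above.
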