Re: Beginner's question on rules

darxus Sun, 24 Jul 2011 09:25:36 -0700

On 07/24, Stephan wrote:
> I have been setting up a home mail server recently and it seems that I
> cannot get all spam trapped correctly. An example is below:


"All spam"?  You may have unrealistic expectations.  That said, I certainly
encourage you to try to do better than anybody else has managed.
Seriously, that's the only way we get better at this.

For example, even in the ideal case, where the email you get exactly matches
the email spamassassin was trained on, the STATISTICS-set3.txt.gz file
included with spamassassin (network and Bayes tests enabled) says:

# False positives:         8  0.04%
# False negatives:       691  1.57%

1.57% of spam missed.  
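As a quick illustration, the two percentages above can be turned into "1 in N" terms with a couple of lines of Python. Only the quoted numbers are used; the helper name `one_in_n` is just made up for this sketch.

```python
# Convert the percentages quoted from STATISTICS-set3.txt.gz into "1 in N" form.
def one_in_n(percent: float) -> float:
    """Return N such that a rate of `percent` percent is about 1 in N."""
    return 100.0 / percent


quoted_rates = {"False positives": 0.04, "False negatives": 1.57}

for label, pct in quoted_rates.items():
    print(f"{label}: {pct}% is about 1 in {one_in_n(pct):,.0f}")
```

That works out to roughly 1 in 2,500 for false positives and about 1 in 64 for missed spam.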

> http://pastebin.com/EBER8iuP

> So my question is, basically, what should I do to increase the accuracy of
> this detection? Should I change my thresholds? Manually create a
> blacklist? Add some custom rulesets (I recently added Khopesh's one)?

It would help if you told us exactly which tests you're hitting and what
each one scores; "spamassassin -t" will show you.
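If you want to tabulate the per-test scores across several messages, here is a rough Python sketch that pulls rule names and points out of the report's rule table. It assumes the layout shown in the invented sample (a "pts rule name" header, a dashed separator, then one "points RULE_NAME description" line per hit, ending at a blank line); check that against your own spamassassin -t output. The function name is hypothetical and the sample scores are made up.

```python
import re

# Assumed shape of one rule line in the report table: " 3.5 BAYES_99   BODY: ..."
RULE_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Za-z0-9_]+)\s")


def parse_rule_hits(report: str) -> dict[str, float]:
    """Return {rule_name: points} from the rule table of a test-mode report."""
    hits: dict[str, float] = {}
    in_table = False
    for line in report.splitlines():
        if not in_table:
            # Start parsing only after the table header, so message text is ignored.
            if "pts rule name" in line:
                in_table = True
            continue
        if line.startswith("----"):
            continue  # dashed separator under the header
        if not line.strip():
            break  # a blank line ends the table
        match = RULE_LINE.match(line)
        if match:  # continuation lines such as "[score: 1.0000]" do not match
            hits[match.group(2)] = float(match.group(1))
    return hits


# Invented sample report; the point values are illustrative only.
sample = """Content analysis details:   (5.3 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.0000]
 1.8 HTML_MESSAGE           BODY: HTML included in message
-0.0 SPF_PASS               SPF: sender matches SPF record
"""

for rule, points in parse_rule_hits(sample).items():
    print(f"{points:5.1f} {rule}")
```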

Do not lower your threshold below 5.  All scores are generated assuming a
threshold of 5, with a target false positive rate of 1 in 2,500.  Lowering
your threshold will increase your false positives.
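To see why, here is a toy Python sketch with invented total scores for eight messages known to be non-spam. It counts how many would be flagged at a threshold of 5 versus a lowered threshold of 4, assuming a message is tagged as spam once its score reaches the threshold.

```python
# Invented total scores for eight messages known to be non-spam (ham).
ham_scores = [-1.9, 0.3, 1.2, 2.8, 3.6, 4.1, 4.4, 5.2]


def false_positives(scores, threshold):
    """Count ham messages whose total score reaches the spam threshold."""
    # Assumption: a message is tagged as spam when score >= threshold.
    return sum(1 for score in scores if score >= threshold)


for threshold in (5.0, 4.0):
    count = false_positives(ham_scores, threshold)
    print(f"threshold {threshold}: {count} false positive(s)")
```

Lowering the threshold can only keep the count the same or raise it: here it goes from 1 to 3, because every ham that already crossed 5 still crosses 4.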

Sought is the only other rule set I'd recommend:
http://wiki.apache.org/spamassassin/SoughtRules

Do you have Pyzor and Razor installed?

You could increase the score of BAYES_99 if you trust it.  First, check
the scores on all your non-spam that hits BAYES_99 and see how many of them
would become flagged as spam if you increased that score.  I wouldn't
recommend raising it without disabling Bayes auto-learning ("bayes_auto_learn 0"),
because auto-learning can go wrong (learning spam as non-spam and vice versa).
And keep in mind, if you only have, say, 100 non-spams to base your score
change on, you risk increasing your false positives from ~1 in 2,500 to ~1
in 101 or worse.
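One way to do that check, sketched in Python under some assumptions: you have already collected the total score and whether BAYES_99 fired for each message you know is non-spam (the list below is invented), and a message counts as spam once its score reaches 5. The hypothetical `newly_flagged` function counts messages that are below the threshold now but would reach it with the extra points.

```python
# Invented sample: (total score, hit BAYES_99?) for messages known to be non-spam.
ham = [(4.1, True), (2.0, True), (3.7, False), (4.6, True), (0.5, False)]


def newly_flagged(messages, extra_points, threshold=5.0):
    """Count ham that is under the threshold now but would reach it
    if BAYES_99 scored `extra_points` more."""
    return sum(
        1
        for total, hit_bayes_99 in messages
        if hit_bayes_99 and total < threshold <= total + extra_points
    )


for extra in (0.5, 1.0):
    count = newly_flagged(ham, extra)
    print(f"+{extra} on BAYES_99: {count} of {len(ham)} ham newly flagged")
```

Keep the sample-size caveat in mind: out of 100 non-spams, a single message crossing the line is already an observed rate of 1 in 100, and zero crossings in 100 messages can't tell you whether you're at 1 in 2,500 or something far worse.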

If this is a repeated problem, it might be useful to try coming up with
your own custom rule or two.  And if they help, please share them with this
mailing list.  http://wiki.apache.org/spamassassin/WritingRules
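A SpamAssassin body rule is essentially a regular expression plus a score (the WritingRules page has the real syntax). As a rough mental model only, not SpamAssassin's engine, this Python sketch adds each rule's points when its pattern matches the message text. The rule names and patterns are hypothetical, and it ignores details such as how SpamAssassin decodes and renders the body before matching.

```python
import re

# Hypothetical local rules: name -> (compiled pattern, points). Not real rules.
RULES = {
    "LOCAL_CHEAP_MEDS": (re.compile(r"\bcheap\s+meds\b", re.IGNORECASE), 2.5),
    "LOCAL_WIRE_FEE": (re.compile(r"\bwire\s+transfer\s+fee\b", re.IGNORECASE), 1.5),
}


def score_body(body, rules=RULES):
    """Return (total points, names of rules whose pattern matched the body)."""
    hits = [name for name, (pattern, _) in rules.items() if pattern.search(body)]
    total = sum(rules[name][1] for name in hits)
    return total, hits


print(score_body("Get CHEAP meds now, no wire transfer fee!"))
print(score_body("Lunch on Friday?"))
```

Testing a candidate pattern against a pile of your own non-spam this way, before giving it a real score, is a cheap sanity check.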

Another possibility is to participate in the nightly mass checks -
submitting your rule hit stats (not emails) to the process which calculates
spamassassin scores:  http://wiki.apache.org/spamassassin/NightlyMassCheck
We always need more of that to increase everybody's accuracy, and of course
it'll increase your accuracy more than it does for those who don't participate.

I've started a combined IP whitelist/blacklist, which you're welcome to
contribute to:  http://www.chaosreigns.com/iprep/
I'm kind of excited about it, but it needs more contributors to really be
useful for non-contributors.

-- 
"Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'." - The Color of Magic
http://www.ChaosReigns.com
