On 06/07/11 09:17, Lars Jørgensen wrote:
I think many people run with tag at 5.0 and discard at 10.0
I should have mentioned that we are running amavisd-new. I thought that was the
de facto way of integrating SpamAssassin into a mail gateway, but reading this
list suggests that most people probably don't do that. It makes me wonder
whether I am doing the wrong thing.
Amavisd-new has its own threshold settings, and these are the ones I put
in as of today (after reading other people's tips here; thank you, everybody):
$sa_tag_level_deflt = -10; # add spam info headers if at, or above that level
$sa_tag2_level_deflt = 5.2; # add 'spam detected' headers at that level
$sa_kill_level_deflt = 6.2; # triggers spam evasive actions (e.g. blocks mail)
$sa_dsn_cutoff_level = 7.4; # spam level beyond which a DSN is not sent
Do the above scores make sense?
Yes, that makes perfect sense to other amavisd-new users. I currently tag at
5.0 (the default SA threshold) and quarantine at 6.0. I also set the DSN
cut-off level to the same value as quarantine, as I don't want to send DSNs.
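For comparison, my settings would look roughly like this in amavisd.conf. This is only a sketch: it reuses the variable names from your snippet, and it assumes that the kill level is the point at which your setup quarantines, which depends on the rest of your configuration.

```perl
# Sketch only: values as described above, variable names as in the quoted config.
$sa_tag2_level_deflt = 5.0;  # add 'spam detected' headers (same as the SA default threshold)
$sa_kill_level_deflt = 6.0;  # assumed to be the level at which mail is quarantined
$sa_dsn_cutoff_level = 6.0;  # same as the kill level, so no DSN goes out for quarantined spam
```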
If you are finding that spam is getting through untagged with the default SA
threshold of 5.0, then I would write some additional rules to target
the spam that is getting through rather than lower the threshold below
the SA default of 5.0. This list can help you with that if you provide
examples.
Additionally, I have very carefully hand-trained Bayes with only
confirmed spam/ham, and I have tweaked the Bayes rule scores to reflect
the faith I have in my Bayes data. I find many cases where Bayes alone
will identify spam, and I have scored BAYES_99 accordingly.
The main "problem" I see with SA is that I reject all the easy spam
(>90%) at the SMTP level, so SA only really gets to see the more
difficult and less obvious stuff. If SA saw all spam, then the detection
rates out of the box would be extremely high, but with only the more
difficult samples to chew on, its measured detection rate inevitably looks
lower than it otherwise would. As a result it can appear that a lot of spam is
getting through when in reality the overall percentage is still really
small. That last 1% is just hard to catch without increasing the risk of
false positives.
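To put rough numbers on that, here is a small Perl sketch. The figures are purely illustrative, not measurements from my system: I assume 1000 spam messages, 900 of them rejected at SMTP, and SA catching 90 of the 100 it actually sees.

```perl
use strict;
use warnings;

# Hypothetical figures, chosen only to illustrate the effect.
my $total_spam       = 1000;  # all spam aimed at the server
my $rejected_at_smtp = 900;   # easy spam rejected before SA runs (90%)
my $caught_by_sa     = 90;    # assumed number of the remainder that SA tags

my $seen_by_sa = $total_spam - $rejected_at_smtp;   # 100 messages reach SA
my $missed     = $seen_by_sa - $caught_by_sa;       # 10 messages get through

# Detection rate as SA alone would report it.
printf "SA detection rate:    %.1f%%\n", 100 * $caught_by_sa / $seen_by_sa;
# Share of all spam stopped by SMTP rejection and SA together.
printf "Overall blocked:      %.1f%%\n", 100 * ($total_spam - $missed) / $total_spam;
# Share of all spam that reaches an inbox untagged.
printf "Overall let through:  %.1f%%\n", 100 * $missed / $total_spam;
```

With these made-up numbers SA on its own looks like it catches only 90.0%, yet 99.0% of all spam is stopped overall and just 1.0% gets through.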