Yassen Damyanov wrote:
>
> Hi SA User List,
>
> Here's my case: postfix + amavisd-new + SpamAssassin 2.64
> working on a Gentoo Linux box, serving as a mail server for
> several virtual domains.
>
> Some SpamAssassin details: Bayes learning activated recently,
> based on about 300 spam mails and 200 ham mails, which accumulate
> in IMAP folders and are scanned using sa-learn via cron job three
> times a day.
>
> It all seems to work, but I see SA passing through some obvious spam,
> so I decided to look. And what I discovered was very surprising for me:
> SA computes the score well, but suddenly lowers it significantly
> exactly before returning an answer to amavisd.
>
> Here is an example:
> .
> .
> debug: running raw-body-text per-line regexp tests; score so far=4.166
> debug: running uri tests; score so far=4.166
> debug: uri tests: Done uriRE
> debug: running full-text regexp tests; score so far=4.166
> debug: all '*From' addrs: [EMAIL PROTECTED]
> debug: all '*To' addrs: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
> [EMAIL PROTECTED] [EMAIL PROTECTED] ydamian
> [EMAIL PROTECTED]
> debug: forged-HELO: from=media-c.local helo=troyer.co.at by=media-c.de
> debug: forged-HELO: mismatch on HELO: 'troyer.co.at' != 'media-c.local'
> debug: forged-HELO: from=wanadoo.fr helo= by=troyer.co.at
> debug: forged-HELO: mismatch on from: 'media-c.local' != 'troyer.co.at'
> debug: running meta tests; score so far=5.53
> debug: auto-learn? ham=0.2, spam=8, body-hits=4.166, head-hits=1.364
                     ^^^^^^^

This isn't exactly obvious, but it may be part of your problem. I've
had trouble in the past with Bayes autolearning very low-scoring spam as
ham, so I lowered the autolearn-as-ham threshold to -0.1.
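To make the threshold point concrete, here is a rough sketch of the comparison that auto-learn debug line reports. It is a simplified model, not SpamAssassin's actual code: the function name `autolearn_decision` is made up, the exact comparison operators are an assumption, and the 0.0 score is a hypothetical low-scoring spam rather than anything from the log. The only values taken from the log are body-hits, head-hits, and the two thresholds; their sum matches the 5.53 "score so far".

```python
# Simplified model of the auto-learn check shown in the debug line.
# NOT SpamAssassin's real implementation; all names are hypothetical.

def autolearn_decision(score, ham_threshold, spam_threshold):
    """Return 'ham', 'spam', or 'none' for a score that excludes Bayes."""
    if score < ham_threshold:       # assumed: strictly below -> learn as ham
        return "ham"
    if score >= spam_threshold:     # assumed: at or above -> learn as spam
        return "spam"
    return "none"

# Values taken from the debug line above.
body_hits = 4.166
head_hits = 1.364
learn_score = round(body_hits + head_hits, 3)   # matches "score so far=5.53"

# With ham=0.2 and spam=8 this message falls between the thresholds.
print(learn_score, autolearn_decision(learn_score, 0.2, 8.0))

# Hypothetical spam that scores only 0.0 before Bayes is applied.
low_score = 0.0
print(autolearn_decision(low_score, 0.2, 8.0))    # ham threshold from the log
print(autolearn_decision(low_score, -0.1, 8.0))   # lowered ham threshold
```

Under these assumptions the logged message is not auto-learned either way, but a spam that hits almost no rules would be learned as ham with the threshold at 0.2 and left alone with it at -0.1.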

> debug: is spam? score=0.629 required=6.8
> tests=BAYES_00,DATE_IN_PAST_12_24,SARE_ADULT2,SARE_OBFUPORNO
        ^^^^^^^^

As already mentioned by others, this is your obvious up-front problem.
Bayes considers the message to be ham, so there's a pretty big
score reduction. Depending on how long your Bayes db has been live, you
may be able to just learn a spam collection correctly and fix the
problem, or you may have to delete it and start again.

> I suspected the bayesian learning to be blamed... but when checking
> the learning session logs, everything is correct, spam and ham are
> perfectly sorted and learning is conducted as appropriate. So I am
> stuck.

Manual learning may not be at fault, but *something* is feeding spam in
as ham.

How do you feed mail into sa-learn? Do you just periodically sa-learn a
set of inboxes and spam folders? Do you have hand-sorted mail folders
that get periodically learned? Do you just manually learn reported
mistagged mail (of either variety)?

-kgd
--
Get your mouse off of there! You don't know where that email has been!