Yassen Damyanov wrote:
> 
> Hi SA User List,
> 
> Here's my case: postfix + amavisd-new + SpamAssassin 2.64
> working on a Gentoo Linux box, serving as a mail server for
> several virtual domains.
> 
> Some SpamAssassin details: Bayes learning activated recently,
> based on about 300 spam mails and 200 ham mails, which accumulate
> in IMAP folders and are scanned using sa-learn via cron job three
> times a day.
> 
> It all seems to work, but I see SA passing through some obvious spam,
> so I decided to look. And what I discovered was very surprising to me:
>    SA computes the score well, but suddenly lowers it significantly
> exactly before returning an answer to amavisd.
> 
> Here is an example:
> .
> .
> debug: running raw-body-text per-line regexp tests; score so far=4.166
> debug: running uri tests; score so far=4.166
> debug: uri tests: Done uriRE
> debug: running full-text regexp tests; score so far=4.166
> debug: all '*From' addrs: [EMAIL PROTECTED]
> debug: all '*To' addrs: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] 
> [EMAIL PROTECTED] [EMAIL PROTECTED] ydamian
> [EMAIL PROTECTED]
> debug: forged-HELO: from=media-c.local helo=troyer.co.at by=media-c.de
> debug: forged-HELO: mismatch on HELO: 'troyer.co.at' != 'media-c.local'
> debug: forged-HELO: from=wanadoo.fr helo= by=troyer.co.at
> debug: forged-HELO: mismatch on from: 'media-c.local' != 'troyer.co.at'
> debug: running meta tests; score so far=5.53
> debug: auto-learn? ham=0.2, spam=8, body-hits=4.166, head-hits=1.364
                     ^^^^^^^
This isn't exactly obvious, but the ham=0.2 autolearn threshold may be
part of your problem.  I've had trouble in the past with Bayes
autolearning very low-scoring spam as ham, so I lowered the
autolearn-as-ham threshold to -0.1.
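
To illustrate why the ham threshold matters, here is a minimal sketch of a threshold-based autolearn decision. It is a simplification under stated assumptions, not SpamAssassin's actual code: the function name and the exact comparison operators are hypothetical, and the 0.1 score is an invented example of a low-scoring spam. The thresholds (ham=0.2, spam=8) and the 5.53 score come from the debug output above. As far as I know, the corresponding config setting is bayes_auto_learn_threshold_nonspam, but check the documentation for your version.

```python
# Simplified, hypothetical sketch of a threshold-based autolearn decision.
# This is NOT SpamAssassin's implementation; the comparisons are assumed.

def autolearn_decision(score, ham_threshold, spam_threshold):
    """Return 'ham', 'spam', or None (not learned) for an autolearn score."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None

# The logged message: body-hits 4.166 + head-hits 1.364 = 5.53,
# which falls between ham=0.2 and spam=8, so it is not autolearned.
print(autolearn_decision(5.53, 0.2, 8.0))

# A hypothetical spam that only scores 0.1 is learned as ham at ham=0.2 ...
print(autolearn_decision(0.1, 0.2, 8.0))

# ... but is left alone once the ham threshold is lowered to -0.1.
print(autolearn_decision(0.1, -0.1, 8.0))
```

With a positive ham threshold, any spam that slips through with a near-zero score is fed to Bayes as ham, which is the failure mode described above.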

> debug: is spam? score=0.629 required=6.8
> tests=BAYES_00,DATE_IN_PAST_12_24,SARE_ADULT2,SARE_OBFUPORNO
        ^^^^^^^^
As already mentioned by others, this is your obvious up-front problem.
Bayes is considering the message to be ham, so there's a pretty big
score reduction.  Depending on how long your Bayes db has been live, you
may be able to just learn a spam collection correctly and fix the
problem, or you may have to delete it and start again.
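
The size of that reduction can be read off the numbers in the log. The sketch below assumes that the only thing changing the score between the last "score so far" line and the final result is the BAYES_00 hit; the variable names are just for illustration.

```python
# Numbers taken from the debug output quoted above.
score_before_bayes = 5.53   # "running meta tests; score so far=5.53"
final_score = 0.629         # "is spam? score=0.629 required=6.8"
required = 6.8

# Assumption: the whole difference is the BAYES_00 contribution.
bayes_contribution = round(final_score - score_before_bayes, 3)
print(bayes_contribution)

# Even without the Bayes reduction, this message stays under the
# required score, so Bayes would have to push it up, not just stay out.
print(score_before_bayes >= required)
```

So Bayes is taking roughly 4.9 points off a message that was already below the 6.8 cutoff, which is why correct spam training matters so much here.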

> I suspected the bayesian learning to be blamed... but when checking
> the learning session logs, everything is correct, spam and ham are
> perfectly sorted and learning is conducted as appropriate. So I am
> stuck.

Manual learning may not be at fault, but *something* is feeding spam in
as ham.  How do you feed mail into sa-learn?  Do you just periodically
sa-learn a set of inboxes and spam folders?  Do you have hand-sorted
mail folders that get periodically learned?  Do you just manually learn
reported mistagged mail (of either variety)?

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!
