Re: How to properly teach SA to recognise the spam that is still getting through, despite the rules updates

Matt Kettler Thu, 28 Feb 2008 17:40:33 -0800

Olaf Greve wrote:

Hi,
Firstly: I'm new to this list and also pretty new to SA in general. I did try to find the answers to my questions in the FAQ, but haven't succeeded beyond all doubt at doing so. I do hope, however, that I'm not flogging a dead horse with my below questions (which appear at the end of the message)... :P

Secondly, I'd like to say that SA is a *great* tool, and that "Internet-life" is much better with it, than it used to be without it! :P

The situation:
I run a FreeBSD 5.4-release AMD-64 based server, on which I have installed SA (identified by pkg_info as: "p5-Mail-SpamAssassin-3.2.4_2") through Amavisd-new (precise version, according to pkg_info: "amavisd-new-2.5.2,1"), which is being invoked after mail arrives on the RX side of Sendmail. The RX daemon is split in two, and tunnels the mail locally through amavisd-new (using clamd and SA), and all mail that passes the tests gets delivered, and the rest goes directly to the quarantine.

The problem:
The above set-up was working fine (using SA 3.2.3) for several months, and virtually no spam got through. However, all of a sudden since some two weeks I'm getting about 100 spam mails per day again, and these seem to include spam mails that I have previously seen being filtered out... Still, by far most of the spam does get filtered out, but for some reason (perhaps spammers finding ways around SA?) more and more spam is getting through again.

My approach so far:
Figuring SA or the rules to be outdated (despite the twice-weekly call to sa-update from cron), I first updated SA to 3.2.4 (and performed an sa-update too), but to no real avail: the same amount of spam seemed to be getting through. I then checked into additional channels, and soon came across the SARE (based) ones. I decided to add the saupdates.openprotect.com channel, but still the same amount of spam seems to get through.

The way I perform my updates are as follows:

Cron call:
23 3 * * 2,5 /usr/local/bin/sa-update --allowplugins --gpgkeyfile /root/sa_pgp_keys --channelfile /root/sa_channels && /usr/local/etc/rc.d/sa-spamd.sh restart > /dev/null

(yes, I realise spamd is not actually used by amavisd-new, but I decided to have it running anyway)

My /root/sa_channels file contains the following:
saupdates.openprotect.com
updates.spamassassin.org
Now, my questions are:
1-Am I doing anything wrong, or am I grossly overlooking something?

It's hard to say.. can you post an X-Spam-Status from one of the missed messages? It's not perfect, but there's a lot we can tell from glancing at that.. things like BAYES_00 or ALL_TRUSTED are signs of specific problems...
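
To illustrate the kind of glance described above, here is a minimal Python sketch that pulls the rule names out of an X-Spam-Status value and reports whether BAYES_00 or ALL_TRUSTED is among them. The sample header is invented for illustration, and the names parse_rules, suspect_hits and SAMPLE are hypothetical. It assumes the rules appear either as a plain comma-separated tests= list or in a bracketed tests=[RULE=score, ...] layout (the latter being what amavisd-new typically writes, as far as I know); other layouts would need adjusting.

```python
import re

# Rules named in the reply as signs of specific problems.
SUSPECT_RULES = {"BAYES_00", "ALL_TRUSTED"}

def parse_rules(status_value):
    """Return the rule names listed in the tests= field of an X-Spam-Status value."""
    # Header values may be folded across lines; collapse all whitespace first.
    flat = " ".join(status_value.split())
    # Bracketed layout (assumed): tests=[RULE=score, RULE=score]
    match = re.search(r"tests=\[([^\]]*)\]", flat)
    if match is None:
        # Plain layout (assumed): tests=RULE,RULE,RULE
        match = re.search(r"tests=((?:\w+,\s*)*\w+)", flat)
    if match is None:
        return []
    names = []
    for item in match.group(1).split(","):
        name = item.strip().split("=")[0]  # drop a trailing "=score" if present
        if name and name != "none":
            names.append(name)
    return names

def suspect_hits(status_value):
    """Return the suspect rules that fired, sorted by name."""
    return sorted(SUSPECT_RULES.intersection(parse_rules(status_value)))

# Invented sample value, folded across two lines like a real header.
SAMPLE = ("No, score=1.3 required=5.0 tests=ALL_TRUSTED,BAYES_00,\n"
          "\tHTML_MESSAGE autolearn=ham version=3.2.4")

print(suspect_hits(SAMPLE))
```

A non-empty result on a message that was clearly spam is only a hint about where to look next, not a diagnosis.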

2-I've never tried to teach SA about which messages are spam and which are ham. From what I gather from the website, I need to set-up a mailbox with solely spam and feed that to sa-learn, and then do the same for a mailbox containing solely ham. However, how can I best go about this? Once spam is misidentified, it gets mixed in the live mailboxes with ham, so I wouldn't want to classify all of it as either ham or spam... Then, I did keep the spam messages from the last few days. Can I perhaps (manually) forward those to a local mailbox, and then run sa-learn on that mailbox, getting it successfully identified as spam, or will that not work due to the new mail headers added by the forward action from my mail client?

You can't forward a message and then feed it to sa-learn. When you forward a message, the content might look similar when rendered in a mail client, but it's *vastly* different when you look at the complete, raw message.
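
The difference between a raw message and a forwarded copy can be sketched with Python's standard email package. Everything here is made up for illustration: the RAW_SPAM message, the addresses, and the forward_inline helper, which only models one common style of forwarding (new headers, original body quoted inline as plain text). Real mail clients differ in the details, but the general effect is similar.

```python
from email import message_from_string
from email.message import EmailMessage

# Invented raw message, roughly as it might arrive at the server.
RAW_SPAM = """\
Received: from mail.example.net (unknown [192.0.2.10])
\tby mx.example.org with SMTP; Thu, 28 Feb 2008 10:00:00 +0000
From: "Cheap Pills" <sender@example.net>
To: victim@example.org
Subject: Special offer
Message-ID: <abc123@example.net>
Content-Type: text/html; charset="us-ascii"

<html><body><a href="http://example.net/buy">Buy now</a></body></html>
"""

def forward_inline(original, forwarder, recipient):
    """Model an inline forward: fresh headers, original body quoted as plain text."""
    fwd = EmailMessage()
    fwd["From"] = forwarder
    fwd["To"] = recipient
    fwd["Subject"] = "Fwd: " + original["Subject"]
    fwd.set_content(
        "---------- Forwarded message ----------\n"
        "From: " + original["From"] + "\n\n"
        + original.get_payload()
    )
    return fwd

original = message_from_string(RAW_SPAM)
forwarded = forward_inline(original, "user@example.org", "spamtrap@example.org")

# Headers present in the raw message but missing from the forwarded copy.
lost = sorted(set(original.keys()) - set(forwarded.keys()))
print("headers lost:", lost)
print("From changed:", original["From"] != forwarded["From"])
print("Content-Type:", original.get_content_type(), "->", forwarded.get_content_type())
```

In this model the relay path (Received), the original Message-ID, the sender and the MIME structure are all gone or replaced in the forwarded copy, so what sa-learn would see is a different message from the one that slipped through.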

3-Are there perhaps other good (preferably automatic) ways to tell SA about what is spam, and what isn't?

SA has an autolearner built in and enabled by default, but it's not perfect.

4-Are there perhaps other very efficient rules channels that you can recommend me to add (like using the full set of SARE rules, rather than the openprotect subset of it)?

5-Just a theory, but is it perhaps possible that SA somehow misidentified a spam message as being ham, and that all messages that are similar to that particular spam message are now being misidentified as ham, hence all getting through?

Possible.. although it would generally take a lot of mislearning.. Seeing a low scoring BAYES_XX rule in the X-Spam-Status would suggest this problem..
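
As a rough sketch of that check, the Python below extracts the NN from a BAYES_NN rule in an X-Spam-Status value and flags a low bucket. The function names and the sample values are hypothetical, and the cut-off of 50 for "low" is an assumption chosen for illustration, not an official threshold.

```python
import re

def bayes_bucket(status_value):
    """Return NN from the first BAYES_NN rule in an X-Spam-Status value, or None."""
    match = re.search(r"\bBAYES_(\d{2})\b", status_value)
    return int(match.group(1)) if match else None

def looks_mislearned(status_value, threshold=50):
    """True if a Bayes rule fired and its bucket is below the (assumed) threshold."""
    bucket = bayes_bucket(status_value)
    return bucket is not None and bucket < threshold

# Invented X-Spam-Status values from messages a human judged to be spam.
missed = [
    "No, score=0.4 required=5.0 tests=BAYES_00,HTML_MESSAGE autolearn=ham",
    "No, score=4.1 required=5.0 tests=BAYES_99,HTML_MESSAGE autolearn=no",
    "No, score=1.0 required=5.0 tests=HTML_MESSAGE autolearn=no",
]

for value in missed:
    print(bayes_bucket(value), looks_mislearned(value))
```

Only run this over messages already known to be spam: a low bucket on genuine ham is expected, whereas a low bucket on missed spam hints that the Bayes database has learned that kind of content as ham.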

Any and all feedback will be greatly appreciated, and I would like to thank you all for taking the time to read this e-mail and address the questions raised in it.
