Olaf Greve wrote:
Hi,
Firstly: I'm new to this list and also pretty new to SA in general. I
did try to find the answers to my questions in the FAQ, but couldn't
find anything conclusive. I do hope, however, that I'm not flogging a
dead horse with my questions (which appear at the end of the
message)...:P
Secondly, I'd like to say that SA is a *great* tool, and that
"Internet-life" is much better with it, than it used to be without it! :P
The situation:
I run a FreeBSD 5.4-release AMD-64 based server, on which I have
installed SA (identified by pkg_info as:
"p5-Mail-SpamAssassin-3.2.4_2") through Amavisd-new (precise version,
according to pkg_info: "amavisd-new-2.5.2,1"), which is being invoked
after mail arrives on the RX side of Sendmail. The RX daemon is split
in two, and tunnels the mail locally through amavisd-new (using clamd
and SA), and all mail that passes the tests gets delivered, and the
rest goes directly to the quarantine.
The problem:
The above set-up was working fine (using SA 3.2.3) for several months,
and virtually no spam got through. However, for the past two weeks or
so I have suddenly been getting about 100 spam mails per day again, and
these seem to include spam mails that I have previously seen being
filtered out... Still, by far most of the spam does get filtered out,
but for some reason (perhaps spammers finding ways around SA?) more and
more spam is getting through again.
My approach so far:
Figuring SA or the rules to be outdated (despite the twice-weekly call
to sa-update from cron), I first updated SA to 3.2.4 (and performed
an sa-update too), but to no real avail: the same amount of spam
seemed to be getting through. I then looked into additional channels,
and soon came across the SARE (based) ones. I decided to add the
saupdates.openprotect.com channel, but still the same amount of spam
seems to get through.
I perform my updates as follows:
Cron call:
23 3 * * 2,5 /usr/local/bin/sa-update --allowplugins --gpgkeyfile /root/sa_pgp_keys --channelfile /root/sa_channels && /usr/local/etc/rc.d/sa-spamd.sh restart > /dev/null
(yes, I realise spamd is not actually used by amavisd-new, but I
decided to have it running anyway)
My /root/sa_channels file contains the following:
saupdates.openprotect.com
updates.spamassassin.org
Now, my questions are:
1-Am I doing anything wrong, or am I grossly overlooking something?
It's hard to say. Can you post the X-Spam-Status header from one of the
missed messages? It's not perfect, but there's a lot we can tell from
glancing at it: things like BAYES_00 or ALL_TRUSTED are signs of
specific problems.
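As a rough illustration of what to look for, here is a minimal Python
sketch that pulls the rule names out of an X-Spam-Status value and flags
the two mentioned above. The sample header, the function names and the
WARNING_RULES set are made up for this example. The parser assumes
either the plain SA layout (tests=A,B,C) or the bracketed layout with
per-rule scores (tests=[A=1.0, B=2.0]) that, as far as I recall,
amavisd-new writes, so adjust it if your headers look different.

```python
import re

# Rule names singled out in the reply as warning signs (illustrative set).
WARNING_RULES = {"BAYES_00", "ALL_TRUSTED"}

# Made-up header value, folded over two lines the way mail headers often are.
SAMPLE = ("No, score=-4.3 required=5.0 tests=ALL_TRUSTED,BAYES_00,\n"
          "\tHTML_MESSAGE autolearn=no version=3.2.4")


def rules_hit(header_value):
    """Return the rule names listed after tests= in an X-Spam-Status value."""
    # Unfold continuation lines, then close up the gaps after commas so the
    # whole tests= list becomes a single whitespace-free token.
    flat = re.sub(r",\s+", ",", " ".join(header_value.split()))
    match = re.search(r"tests=(\S+)", flat)
    if not match:
        return []
    body = match.group(1).strip("[]")
    # Drop a per-rule score suffix such as "=-2.599" when one is present.
    return [item.split("=")[0] for item in body.split(",") if item]


def warning_signs(header_value):
    """Return, sorted, the warning rules that fired for this message."""
    return sorted(set(rules_hit(header_value)) & WARNING_RULES)


if __name__ == "__main__":
    print(warning_signs(SAMPLE))
```

If BAYES_00 shows up on obvious spam, the Bayes database is the first
suspect; ALL_TRUSTED on mail that came from outside usually points at
the trusted networks settings.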
2-I've never tried to teach SA about which messages are spam and which
are ham. From what I gather from the website, I need to set up a
mailbox with solely spam and feed that to sa-learn, and then do the
same for a mailbox containing solely ham. However, how can I best go
about this? Once spam is misidentified, it gets mixed in the live
mailboxes with ham, so I wouldn't want to classify all of it as
either ham or spam... Then, I did keep the spam messages from the last
few days. Can I perhaps (manually) forward those to a local mailbox,
and then run sa-learn on that mailbox, getting it successfully
identified as spam, or will that not work due to the new mail headers
added by the forward action from my mail client?
You can't forward a message and then feed it to sa-learn. When you
forward a message, the content might look similar when rendered in a
mail client, but it's *vastly* different when you look at the complete,
raw message.
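What sa-learn needs is the message exactly as it was delivered. Below is
a minimal sketch, using Python's standard mailbox module, that copies
selected messages byte for byte from one mbox file into a separate
spam-only mbox. The function name, the paths and the idea of selecting
by Message-ID are assumptions for illustration only; it also assumes
your mail is stored in mbox format and that nothing else is writing to
the destination file while it runs.

```python
import mailbox


def copy_raw_messages(src_path, dst_path, message_ids):
    """Copy messages whose Message-ID is in message_ids, without re-encoding.

    Returns the number of messages copied. The source mbox is only read.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if it does not exist yet
    copied = 0
    try:
        for key in src.keys():
            if src[key].get("Message-ID") in message_ids:
                # get_bytes() hands back the stored message as raw bytes, so
                # the original headers and MIME structure are not rebuilt.
                dst.add(src.get_bytes(key))
                copied += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return copied
```

The resulting file could then be fed to something like
"sa-learn --spam --mbox /path/to/spam-only" (and a ham-only mbox with
--ham), run as whichever user owns the Bayes database that your scanner
actually reads.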
3-Are there perhaps other good (preferably automatic) ways to tell SA
about what is spam, and what isn't?
SA has an autolearner built in and enabled by default, but it's not
perfect.
4-Are there perhaps other very efficient rules channels that you can
recommend adding (like using the full set of SARE rules, rather
than the openprotect subset of it)?
5-Just a theory, but is it perhaps possible that SA somehow
misidentified a spam message as being ham, and that all messages that
are similar to that particular spam message are now being
misidentified as ham, hence all getting through?
Possible, although it would generally take a lot of mislearning.
Seeing a low-scoring BAYES_XX rule in the X-Spam-Status would suggest
this problem.
Any and all feedback will be greatly appreciated, and I would like to
thank you all for taking the time to read this e-mail and address the
questions raised in it.