Hi,

I'm setting up a new anti-spam gateway for a fairly busy site (about 20k
messages a day) using Postfix/Amavis/SpamAssassin/ClamAV on a Debian
etch system that delivers incoming (ham) mail to an Exchange 2003
server.

Since the old gateway was using a similar setup, there are already SPAM
and HAM public mail folders which our users contribute to. The SPAM
folder usually gets a lot of (untagged) spam, about 500 every day, while
the HAM gets very little, and most of it is internal (within Exchange)
mail that never passes through the gateway.

I'm wondering whether it's worthwhile to use that kind of data to feed
sa-learn, since a) a lot more spam than ham gets reported and b) most
of the ham reported is mail that just moves between Exchange mailboxes
and never passes through the gateway.

If indeed it's mostly useless (or maybe even harmful for the Bayes
filter) then I was wondering if it would be more logical to have only
the technical team feed the SPAM and HAM folders with proper messages
(i.e. for HAM, good mail that comes from an external source).
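To make the "external source" criterion concrete, here is a rough Python sketch that keeps only reported ham whose headers show it came in through the gateway. It assumes the messages can be exported from the HAM public folder as raw RFC 822 text with their original Received headers intact (Exchange may not preserve them), and `gateway.example.org` is a hypothetical hostname. Matching a substring in Received headers is naive and can be fooled by forged headers, so treat this as an illustration rather than a tested recipe.

```python
import email
from email.message import Message

# Hypothetical hostname of the anti-spam gateway; replace with the real one.
GATEWAY_HOST = "gateway.example.org"


def passed_through_gateway(msg: Message, gateway_host: str = GATEWAY_HOST) -> bool:
    """Return True if any Received header mentions the gateway host.

    Internal Exchange-to-Exchange mail should carry no such header.
    This is a simple substring check, not a full Received-header parser.
    """
    for header in msg.get_all("Received", []):
        if gateway_host.lower() in str(header).lower():
            return True
    return False


def select_ham_for_training(raw_messages, gateway_host: str = GATEWAY_HOST):
    """Keep only the raw messages that appear to have arrived via the gateway."""
    selected = []
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        if passed_through_gateway(msg, gateway_host):
            selected.append(raw)
    return selected
```

The selected messages could then be written out one per file into a directory and passed to `sa-learn --ham` with that directory as its argument, leaving internal-only mail out of the Bayes training set.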

In that case, I'm wondering whether the fact that only specific users
report SPAM and HAM could lead the Bayes filter to treat a message as
more hammy or spammy depending on its recipient.

In short, I'm looking for a way to feed sa-learn that's at least
minimally effective in a situation where only a little useful HAM is
being reported by our users at large.


-- 
Jérôme Charaoui <[EMAIL PROTECTED]>
Service informatique - Collège de Maisonneuve
