Well, my two cents on this:
When I upgraded my servers (about half a year ago) and started using a
MySQL-based Bayes DB, image spam began to drive me crazy. It seemed like there
was no way to stop it. But after a good purge of the Bayes DB, a rebuild, and
the addition of sa-update rules, things began to get better. Right now, I have
implemented a system for my users to train a global Bayes database, and I
must say it is working almost flawlessly. Only a few discussion lists got
BAYES_99 hits, but as soon as the users forwarded them to the ham training
account (or moved them to their webmail-based HAM folders), everything got
better. I'm a small fish in this fight (two servers, about 400 users each,
~25000 messages a day, ~20000 rejected, mostly via zen.spamhaus.org, ~1100
spam messages, and ~30 virus messages a day), but I must say that taking
good care of my Bayes database has improved the spam-fighting capabilities
of my servers a lot. That includes running sa-learn --forget on false
positives and then feeding them to sa-learn as ham, running
sa-learn --forget on false negatives and having SA analyze and report them,
etc. Luckily, I managed to write some scripts to do the work for me. They're
still at the test stage, but so far they seem to perform very well...
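
To give an idea of the forget-then-relearn step described above, here is a minimal sketch in Python. It is not the scripts mentioned in this message; the function names and the example path are made up for illustration. The only things assumed from SpamAssassin itself are the sa-learn options --forget, --ham and --spam, and that sa-learn is on the PATH when run_correction() is actually called.

```python
import subprocess
from typing import List


def correction_commands(path: str, correct_class: str) -> List[List[str]]:
    """Build the sa-learn calls that fix one misclassified message.

    correct_class is "ham" for a false positive and "spam" for a
    false negative. Both names are hypothetical helpers, not part
    of SpamAssassin.
    """
    if correct_class not in ("ham", "spam"):
        raise ValueError("correct_class must be 'ham' or 'spam'")
    return [
        # Drop whatever the Bayes DB learned from this message before.
        ["sa-learn", "--forget", path],
        # Learn the same message again with the right classification.
        ["sa-learn", "--" + correct_class, path],
    ]


def run_correction(path: str, correct_class: str) -> None:
    """Run the commands in order; needs sa-learn installed."""
    for cmd in correction_commands(path, correct_class):
        subprocess.run(cmd, check=True)
```

For a false positive saved as a single message file, run_correction("/tmp/false-positive.eml", "ham") would call sa-learn --forget and then sa-learn --ham on that file. Keeping the command building separate from the execution makes it easy to log or dry-run the calls first.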

A taste: http://www.biol.unlp.edu.ar/cgi-bin/mailgraph.cgi


Luis

2007/3/23, Jim Maul <[EMAIL PROTECTED]>:

Marc Perkel wrote:
> Perhaps what I need to do is to get rid of autolearn and write my own
> learning system that strips out the body of messages with images and
> just learns the headers. My problem is that when users get image spam
> they put it in the spam folders and they get learned. But the text in
> the image spam causes ham type text to be learned as spam. That causes
> ham to get higher scores.
>
>

Are you sure of this?  Have you also trained these ham messages to
counter this effect?  Not too long ago we were in the same situation.  I
have autolearn enabled but I have adjusted the thresholds to avoid
learning false positives/negatives.  We were getting ham (arguably, since it
was newsletter-type ham) that was hitting BAYES_99.  As soon as I started
training those messages as ham, the problem went away.  Spam is still
detected correctly by Bayes, and these newsletters no longer hit BAYES_99.
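
The threshold adjustment mentioned above can be pictured with a rough sketch in Python. It is a simplification, not SpamAssassin's actual code: the real autolearn logic uses a score computed without the Bayes rules and applies extra conditions for spam. The defaults shown are the documented ones for bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam; the function name, the boundary handling, and the "stricter" values in the note below are assumptions for illustration only.

```python
from typing import Optional

# Documented SpamAssassin defaults for the two autolearn thresholds.
DEFAULT_NONSPAM = 0.1
DEFAULT_SPAM = 12.0


def autolearn_decision(score: float,
                       nonspam_threshold: float = DEFAULT_NONSPAM,
                       spam_threshold: float = DEFAULT_SPAM) -> Optional[str]:
    """Return "spam", "ham", or None (do not autolearn).

    Simplified: only compares the score against the two thresholds.
    """
    if score >= spam_threshold:
        return "spam"
    if score <= nonspam_threshold:
        return "ham"
    # Scores in between are left alone, so borderline mail never
    # trains the Bayes DB by itself.
    return None
```

Widening the gap between the two thresholds, say autolearn_decision(score, nonspam_threshold=-1.0, spam_threshold=15.0), means fewer messages get learned automatically, which lowers the chance of learning a false positive or false negative at the cost of slower training.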

-Jim




--
-------------------------------------------------
GNU-GPL: "May The Source Be With You...
-------------------------------------------------
