Hello Al,

AJ> Could you share your spam filters with the group? It seems I am
AJ> constantly adding words and I know there must be another, simpler way.

I'm doing this a bit differently. Instead of trying to invent my own
spam filters (reinvent the wheel, again), I'm using some free tools
that are available. My mail is scanned for "spamability" on my linux
box using the free Spam Assassin (www.spamassassin.org) and MIME
Defang tools (www.roaringpenguin.com). By the time TB downloads my
messages they've already been assessed for SPAM probability. If you
don't have an email server you can add antispam software to, you're
still in luck. There's also a free Win32 version called SAProxy
(http://saproxy.bloomba.com/) that you can very easily use with ANY
email client. It will scan messages and flag them as
SPAM (based on your configuration) as they are downloaded from your
server to your client. I use this to filter my work email, since my
employer doesn't have any spam filtering system in place. To use it,
simply change two things in your email client configuration and voila,
all of your incoming mail is scanned for spaminess and marked up
accordingly (rewriting subject, adding a hidden header line, or
whatever you configure it to do). Then you can add a filter to TheBat
to act on this flag (if header X-SPAM: Yes exists, then automatically
delete message, for example). Works very, very well.
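
For anyone who wants to see what that filter boils down to, here is a
rough Python sketch of the check. The function name is made up for
illustration. I'm assuming the flag header is "X-Spam-Flag: YES",
which is what SpamAssassin adds by default as far as I know; your
setup may use a different header, so both the name and the value are
parameters.

```python
from email import message_from_string


def is_flagged_spam(raw_message, header="X-Spam-Flag", value="yes"):
    """Return True if the scanner's flag header marks this message as spam.

    The header name and value are assumptions; match them to whatever
    your scanner is configured to add.
    """
    msg = message_from_string(raw_message)
    # Header lookup is case-insensitive on the name; normalise the value too.
    return msg.get(header, "").strip().lower() == value.lower()


# A made-up message, as it might look after the scanner has marked it up.
sample = (
    "From: someone@example.com\n"
    "Subject: *****SPAM***** Click here\n"
    "X-Spam-Flag: YES\n"
    "\n"
    "Body text\n"
)
```

In TB you would do the same thing with a header match in a sorting
office filter; the sketch is only there to show how little logic the
client side needs once the scanner has done the hard work.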

Spam Assassin does more than look for keywords like "sex" and
"viagra". It will look for things like SHOUTING TOO MUCH, "as seen
on tv", subjects with numbers at the end, too many recipients, unlisted
recipients, "click here to remove", etc. Heck, this message might
get flagged just because it contains too many of those phrases. It also looks
for things like single-image HTML emails (the latest trick by spammers
to get around keyword filters), HTML emails with too much red text,
etc. Someone else is already working to keep ahead of the spammers,
and I don't have the time to keep filters updated myself. Plus, it can
optionally plug into Razor, which fingerprints messages and compares
incoming mail against a database of known spam, and into RBL
databases, which let you reject email from known spam-only hosts.
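
To give a feel for how many small tests add up to one verdict, here is
a toy Python sketch of rule-based scoring. To be clear, these are NOT
Spam Assassin's real rules or weights: the rule list, the point values
and the threshold are all invented for illustration. The only idea
carried over is that each test that fires adds points, and the message
is flagged once the total crosses a cutoff.

```python
import re


def mostly_uppercase(text):
    """True if the text has at least 10 letters and over 70% are capitals."""
    letters = [c for c in text if c.isalpha()]
    if len(letters) < 10:
        return False
    return sum(c.isupper() for c in letters) / len(letters) > 0.7


# (description, points, test) -- all hypothetical, for illustration only.
RULES = [
    ("subject is shouting", 1.5,
     lambda subject, body: mostly_uppercase(subject)),
    ("subject ends in digits", 1.0,
     lambda subject, body: re.search(r"\d{3,}\s*$", subject) is not None),
    ("click here to remove", 2.0,
     lambda subject, body: "click here to remove" in body.lower()),
    ("as seen on tv", 1.0,
     lambda subject, body: "as seen on tv" in body.lower()),
]

THRESHOLD = 3.0  # assumed cutoff; a real setup makes this configurable


def score_message(subject, body):
    """Return (total points, names of the rules that fired)."""
    hits = [(name, pts) for name, pts, test in RULES if test(subject, body)]
    return sum(pts for _, pts in hits), [name for name, _ in hits]


def looks_like_spam(subject, body):
    total, _ = score_message(subject, body)
    return total >= THRESHOLD
```

The nice part of this design is that no single rule has to be right.
A legitimate message can trip one or two tests and still stay under
the cutoff.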

This is all stuff TB filters can't handle, can't handle very well, or
would be a complete bear to set up and maintain. So I'll let my email
server or SAProxy on my desktop determine if it thinks the message is
SPAM or not, and then I'll filter in TB based on what SpamAssassin
determines.

But the coolest tool on the horizon is in the latest version of
Spam Assassin: Bayesian filtering. This sounds like the most
promising *learning* method of distinguishing SPAM from "HAM" (good
messages). From what I read it has a very low (1%) false-positive
rate. That's better than practically all other tests in use at the
moment. But to make Bayesian filtering work well, you need to send it
SPAM and HAM messages so it can learn the difference. So, what I'm
ultimately looking for is a way to save messages designated as SPAM in
MBOX format with original headers. Same with my HAM messages. Then I
need to send these two mboxes to my linux box and tell Spam Assassin
to "train" itself using the latest email messages I'm feeding it. It's
this training process I would like to automate.
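
Here is a rough Python sketch of what the server-side half of that
automation could look like. It assumes the raw messages (with
original headers) are already available as strings and that the
sa-learn tool that ships with Spam Assassin is on the PATH; the
"--spam", "--ham" and "--mbox" options are the ones I know it accepts.
The function names and file paths are made up, and getting the mboxes
from the desktop to the linux box (scp, a shared folder, whatever) is
left out.

```python
import mailbox
import subprocess
from email import message_from_string


def write_mbox(path, raw_messages):
    """Append raw messages (headers included) to an mbox file at path."""
    box = mailbox.mbox(path)  # created if it does not exist yet
    try:
        for raw in raw_messages:
            box.add(message_from_string(raw))
    finally:
        box.close()  # flushes pending changes to disk


def sa_learn_command(kind, mbox_path):
    """Build the sa-learn command line for one mbox of 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


def train(spam_mbox, ham_mbox, run=subprocess.run):
    """Feed both mboxes to sa-learn; 'run' is injectable for dry runs."""
    for kind, path in (("spam", spam_mbox), ("ham", ham_mbox)):
        run(sa_learn_command(kind, path), check=True)
```

Something like train("spam.mbox", "ham.mbox") from a cron job would
then retrain on whatever was dropped off since the last run. The
missing piece is still getting TB to export the two folders with
headers intact.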

Anyhow, I thought you might find all that interesting... :)

--
James



________________________________________________
Current version is 1.62 | "Using TBUDL" information:
http://www.silverstones.com/thebat/TBUDLInfo.html
