Adam,

I'm sure everyone else who replies will say basically the same thing,
but here's my input on SA and your management's questions.

> Hash Busting - slightly modify each copy of message to foil
> 'fingerprinting' techniques

AFAIK, the fingerprinting techniques are "fuzzy" and can withstand a
little bit of abuse.

> Bayes Poisoning - addition of random dictionary words

My only experience with Bayes poisoning has been from this list :)  By
that, I mean that mail on this list discussing spam got trained into my
db and nearly reversed it.  I'll talk more about this later.
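
For what it's worth, here's a toy Graham-style combiner in Python
showing why *random* dictionary words mostly don't help the spammer:
they land near neutral (0.5) and the combiner ignores them.  SA's
actual Bayes code uses chi-squared combining; this is only a sketch
of the dilution effect:

    import math

    def bayes_score(token_probs, top_n=15):
        # Keep only the N tokens whose spam probability is furthest
        # from neutral, then combine them Graham-style.
        interesting = sorted(token_probs, key=lambda p: abs(p - 0.5),
                             reverse=True)[:top_n]
        s = math.prod(interesting)
        h = math.prod(1 - p for p in interesting)
        return s / (s + h)

    spammy = [0.95] * 10          # tokens seen almost only in spam
    noise = [0.5] * 50            # random dictionary-word padding
    print(bayes_score(spammy))          # ~1.0
    print(bayes_score(spammy + noise))  # still ~1.0, noise ignored

The poisoning that actually bit me (below) was the other direction:
legitimate mail full of spammy tokens getting trained as ham.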

> Hidden Text - using invisible text in html messages

SA has specific rules for this.
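
As a toy example of the kind of thing those rules look for (white
text on a white background being the classic trick; SA's real HTML
checks are much more thorough than this):

    import re

    # Flag a white font color, assuming a white page background --
    # the classic invisible-text trick.
    INVISIBLE = re.compile(r'<font[^>]*color=["\']?#?(?:ffffff|fff)\b',
                           re.IGNORECASE)

    def has_invisible_text(html):
        return bool(INVISIBLE.search(html))

    print(has_invisible_text(
        '<body bgcolor="#ffffff"><font color="#FFFFFF">buy now</font>'))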

> Keyword Corruption - using obfuscated text to hide keywords

SA has specific rules for this.
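
The idea there is character replacement: fold the usual look-alike
substitutions back into letters before matching the keyword.  A toy
Python version (real replacement tables are much bigger, and e.g.
"1" can stand for either "i" or "l"):

    # Fold common look-alike characters back to letters, then strip
    # the punctuation spammers use to split keywords apart.
    SUBS = str.maketrans("013457@$", "oieastas")

    def normalize(word):
        word = word.lower().translate(SUBS)
        return word.replace(".", "").replace("-", "").replace(" ", "")

    print(normalize("V-1-A-G-R-A"))   # -> "viagra"
    print(normalize("c4$h"))          # -> "cash"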

> Tiny Messages - messages with only URL or image

SA has specific rules for this.
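
Roughly, a check like that strips the markup and sees whether
anything besides a URL or an image is left.  A toy Python version
(SA's actual rules differ; this is just the shape of the test):

    import re

    def is_tiny(body):
        # Drop image tags, URLs, and remaining markup; if almost no
        # real words survive, the message is "only URL or image".
        text = re.sub(r"<img[^>]*>", "", body, flags=re.IGNORECASE)
        text = re.sub(r"https?://\S+", "", text)
        text = re.sub(r"<[^>]+>", "", text)
        return len(text.split()) < 3

    print(is_tiny('<html><body><img src="http://x.example/a.gif">'
                  '</body></html>'))   # True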


What I like about SA is that there is no single rule or small subset of
rules that can trigger a mail to be labeled as spam.
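
Every rule that fires just adds its score, and only the total is
compared to the threshold, so a spammer has to dodge everything at
once.  In Python terms (rule names and scores made up):

    # Each rule hit contributes its score; only the sum matters.
    hits = {
        "HTML_INVISIBLE_TEXT": 1.2,
        "OBFUSCATED_KEYWORD":  0.8,
        "BODY_ONLY_URL":       1.5,
        "BAYES_HIGH":          3.5,
    }
    THRESHOLD = 5.0
    total = sum(hits.values())
    print(total, "spam" if total >= THRESHOLD else "ham")   # 7.0 spam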

Even when my Bayes db got poisoned by this list, SA held up.  I was
experimenting with tagging low spam scores with '*** LIKELY SPAM ***' in
the subject, because my anal retentive users would complain very loudly
if anything was marked as a false positive.  The thing that irritated me
about these complaints is that all of the mails labeled this way scored
just barely as spam, were _solicited_ bulk email, looked like spam to
me, and used many of the same tools and tricks that real spammers use.

What I do now is set my threshold score high (10), and I have custom
spam and ham rules as well as a 3rd party plugin to raise scores.  My
average spam score is 20 or above.  I don't have real data, but the
number of missed mails is very low, fewer than 10 since SA 3.0 came out.
I have had 0 false positives for my own mailbox, and just today the 1st
false positive for one user, from a mail that was very borderline; the
user would not have missed it if it had not been delivered.
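
For reference, that setup boils down to local.cf lines like these.
The directives are real SA config; the rule name, pattern, and
scores are made-up examples, not something to copy verbatim:

    required_score  10.0

    # hypothetical local rule of the score-raising kind I described
    body   LOCAL_PILL_SPAM   /cheap (?:meds|pills)/i
    score  LOCAL_PILL_SPAM   4.0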

My only issue with SA is that it does not appear to scale very well.  I
have not experienced this problem personally, because the domain that I
run SA on does not have very high mail traffic, but it does appear to be
an issue.  There are workarounds that skip some tests, at the expense of
filtering accuracy.
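
The workarounds I have seen are along these lines in local.cf --
real options that turn off the slow network tests, with the
accuracy hit I mentioned:

    skip_rbl_checks  1     # no DNS blocklist lookups
    use_razor2       0     # no Razor2 checksum queries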

OK, another issue, but a small one (I'm pretty picky): the scores for
some of the rules do not always seem right.  High scores for things that
seem pretty benign, and low scores for things that look almost
exclusively like spam (such as forged headers or mismatched IPs).  I
know these scores are derived objectively from a corpus of ham and spam
by some scoring algorithm, but some of them just seem wrong to me.
Maybe scores and rules could be autolearned like Bayes.  Not sure.
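
To sketch what I mean by autolearning scores: treat each rule as a
feature and fit its weight against a corpus until the totals
separate ham from spam.  I believe SA's published scores already
come from something like a GA/perceptron run over a corpus; this
toy Python perceptron (made-up rule names) just shows the shape:

    def fit_scores(corpus, rules, epochs=200, lr=0.1, threshold=5.0):
        # corpus: list of (set-of-rule-hits, is_spam) pairs.
        scores = {r: 1.0 for r in rules}
        for _ in range(epochs):
            for hits, is_spam in corpus:
                total = sum(scores[r] for r in hits)
                if (total >= threshold) != is_spam:
                    step = lr if is_spam else -lr
                    for r in hits:
                        scores[r] += step
        return scores

    corpus = [({"FORGED_HDR"}, True),
              ({"HTML_MSG"}, False),
              ({"FORGED_HDR", "HTML_MSG"}, True)]
    print(fit_scores(corpus, ["FORGED_HDR", "HTML_MSG"]))
    # FORGED_HDR ends up high; HTML_MSG stays below the threshold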

That's my input for your managers.

Mike

-- 
/-----------------------------------------\
| Michael Barnes <[EMAIL PROTECTED]> |
| UNIX Systems Administrator              |
| College of William and Mary             |
| Phone: (757) 879-3930                   |
\-----------------------------------------/
