In the past I've reported how effective I've found the Bayesian analysis
filter supplied with James.
I still find it incredibly effective (roughly 97% of all spam is
rejected). I just thought I'd mention an increasingly common technique
I've noticed over the past couple of months that appears to reduce its
effectiveness. The spammers are producing very short messages
(no more than two lines) and they ConcatenateTheWordsTogetherLikeThis.
The filter sees this as one big token it has never seen before, so
its effectiveness is reduced. Add to this the spammers' seemingly
never-ending arsenal of domains, and the filter stands no chance.
The only solution I can think of is some code that tries to break long
tokens apart. A simple technique would be to break tokens up at changes
of case, so ConcatenateTheWordsTogetherLikeThis would become
Concatenate^The^Words^Together^Like^This. The ultimate technique would
be to run each token against a dictionary, but I suspect that would be
too costly.
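
For what it's worth, here's a rough sketch of the case-change splitting
idea in Java. It isn't tied to any existing James API; the class and
method names are just placeholders for illustration:

    import java.util.Arrays;

    public class TokenSplitter {
        // Hypothetical helper: split a token wherever a lowercase letter
        // is immediately followed by an uppercase letter, using zero-width
        // lookbehind/lookahead so no characters are consumed.
        public static String[] splitAtCaseChanges(String token) {
            return token.split("(?<=\\p{Lower})(?=\\p{Upper})");
        }

        public static void main(String[] args) {
            String token = "ConcatenateTheWordsTogetherLikeThis";
            System.out.println(Arrays.toString(splitAtCaseChanges(token)));
            // prints: [Concatenate, The, Words, Together, Like, This]
        }
    }

Each piece could then be fed to the Bayesian analysis as a separate
token, so the individual words regain their usual spam probabilities.
It obviously won't help against all-lowercase run-together words, which
is where the (expensive) dictionary approach would come in.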
David -