In the past I've reported how effective I've found the Bayesian analysis
filter supplied with James.
I still find it incredibly effective (roughly 97% of all spam is
rejected). I just thought I'd mention an increasingly common technique
I've noticed over the past couple of months that appears to reduce its
effectiveness. The spammers are producing very short messages
(no more than two lines) and they ConcatenateTheWordsTogetherLikeThis.
The filter sees this as one big token it has never seen before, so
its effectiveness is reduced. Add to this the spammers' seemingly
never-ending arsenal of domains, and the filter stands no chance.
The only solution I can think of is some code that tries to break long
tokens apart. A simple technique would be to break tokens up at changes
of case, so ConcatenateTheWordsTogetherLikeThis would become
Concatenate^The^Words^Together^Like^This. The ultimate technique would
be to run each token against a dictionary, but I suspect that would be
too costly.
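
For what it's worth, here's a rough sketch of the case-change splitting
idea in Java. It isn't tied to any existing James API; the class and
method names are just placeholders for illustration:

    import java.util.Arrays;

    public class TokenSplitter {
        // Hypothetical helper: split a token wherever a lowercase letter
        // is immediately followed by an uppercase letter, using zero-width
        // lookbehind/lookahead so no characters are consumed.
        public static String[] splitAtCaseChanges(String token) {
            return token.split("(?<=\\p{Lower})(?=\\p{Upper})");
        }

        public static void main(String[] args) {
            String token = "ConcatenateTheWordsTogetherLikeThis";
            System.out.println(Arrays.toString(splitAtCaseChanges(token)));
            // prints: [Concatenate, The, Words, Together, Like, This]
        }
    }

Each piece could then be fed to the Bayesian analysis as a separate
token, so the individual words regain their usual spam probabilities.
It obviously won't help against all-lowercase run-together words, which
is where the (expensive) dictionary approach would come in.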
David -