In the past I've reported how effective I've found the Bayesian analysis filter supplied with James.

I still find it incredibly effective (roughly 97% of all spam is rejected). I just thought I'd mention an increasingly common technique I've noticed over the past couple of months that appears to reduce its effectiveness: the spammers are producing very short messages (no more than two lines) and they ConcatenateTheWordsTogetherLikeThis.

The filter sees this as one big token it has never seen before, so its effectiveness is reduced. Add to this the spammers' seemingly never-ending arsenal of domains and the filter stands no chance.
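To illustrate why (a rough sketch in the style of Graham-type Bayesian filters, not James's actual code; the class, method, and map names are mine): each token contributes a spam probability, and a token that was never seen in training falls back to a neutral default, so it carries no evidence either way.

import java.util.HashMap;
import java.util.Map;

public class TokenScore {
    // Graham-style per-token spam probability. A token never seen in
    // training gets a neutral default and contributes no signal.
    static double tokenProbability(String token,
            Map<String, Integer> spamCounts, int spamMessages,
            Map<String, Integer> hamCounts, int hamMessages) {
        int s = spamCounts.getOrDefault(token, 0);
        int h = hamCounts.getOrDefault(token, 0);
        if (s + h == 0) {
            return 0.4; // Graham's neutral value for unknown tokens
        }
        double spamFreq = (double) s / Math.max(1, spamMessages);
        double hamFreq = 2.0 * h / Math.max(1, hamMessages); // ham weighted double, per Graham
        double p = spamFreq / (spamFreq + hamFreq);
        return Math.min(0.99, Math.max(0.01, p)); // clamp extremes
    }

    public static void main(String[] args) {
        Map<String, Integer> spam = new HashMap<>();
        Map<String, Integer> ham = new HashMap<>();
        spam.put("viagra", 500);
        ham.put("meeting", 300);
        System.out.println(tokenProbability("viagra", spam, 1000, ham, 1000)); // 0.99
        System.out.println(tokenProbability(
                "ConcatenateTheWordsTogetherLikeThis",
                spam, 1000, ham, 1000)); // 0.4 -- no evidence either way
    }
}

A whole message made of tokens like that scores close to neutral, which is exactly what the spammer wants.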

The only solution I can think of is some code that tries to break long tokens apart. A simple technique would be to break tokens up at changes of case (see the sketch below), so ConcatenateTheWordsTogetherLikeThis would become Concatenate^The^Words^Together^Like^This. The ultimate technique would be to run each token against a dictionary, but I think that would be too costly.
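Something along these lines might work (a minimal sketch; the TokenSplitter class and split method are mine for illustration, not part of James's tokenizer):

import java.util.ArrayList;
import java.util.List;

public class TokenSplitter {
    // Split a token at lower-to-upper case transitions, e.g.
    // "ConcatenateTheWordsTogetherLikeThis"
    //   -> [Concatenate, The, Words, Together, Like, This]
    static List<String> split(String token) {
        List<String> parts = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // Start a new part when a lower-case letter is
            // followed by an upper-case one.
            if (i > 0 && Character.isUpperCase(c)
                    && Character.isLowerCase(token.charAt(i - 1))) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("ConcatenateTheWordsTogetherLikeThis"));
        // prints [Concatenate, The, Words, Together, Like, This]
    }
}

Each part could then be fed to the analyser as an ordinary token, so the existing spam/ham statistics for the individual words would apply again.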

David -

