On Sat, Jul 03, 2004 at 06:35:39PM -0400, Scot L. Harris wrote:
> But I thought that was the point of the bayes system. You keep teaching
> a sampling of the current spam and ham you get and it expires older
> entries as they exceed a certain time period. Which in itself tells me
> that the spammer could then go back to their "old" tricks and get around

For me at least, the number of different tokens in mail is much lower
than the number of spam signs spammers can put into their mail!

I don't have numbers for this, but I have found it very hard to train a
spam message when the system had also trained positively on numerous
other undetected spams with the same characteristics (especially, but
not exclusively, DSNs). And even with an autolearn_spam_threshold of
0.1, much spam from new sources with new tokens will still be learnt as
ham! A busy system might learn a strong wrong signal before the admins
catch up with rules.

This is based on the difference between administering my home system (a
few hundred mails/day) and my work system (tens of thousands).

Nick
