On Wed, 20 Jan 2016 11:46:39 -0800
Marc Perkel wrote:

> Let me give you an example. Here's 2 subject lines. Easy to guess
> which one is spam.
>
> "Meet horny Russian Brides online!"
> "I read an article about Russian brides in a magazine."
>
> Bayes or spam assassin would look at "Russian Brides" and 499 out of
> 500 times it's spam. Therefore the nonspam version scores spam points.

Not if you modify the Robinson parameters and the cut-off to exclude
such tokens. Then only the tokens your system would use would make it
through to the final cut.

> My filter gets both correctly because of NOT matching. Not matching
> is a comparison to an infinite set.

It's not an infinite set unless you assume that phrases never seen
before are spammy or hammy.
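
To make the first point concrete, here is a rough sketch of how the Robinson parameters and the cut-off can interact. It is an illustration only, not the code of any particular filter. The names s (strength of the prior), x (assumed probability for an unseen token) and min_dev (minimum distance from 0.5 a token needs in order to be used) are labels I picked for this example, the values are arbitrary, and using raw message counts for p(w) assumes equally sized spam and ham corpora.

```python
def robinson_f(spam_count, ham_count, s=0.001, x=0.5):
    """Robinson's smoothed token probability f(w) = (s*x + n*p) / (s + n).

    s and x are the Robinson parameters; the defaults here are
    illustrative assumptions, not recommended settings.
    """
    n = spam_count + ham_count
    # Raw estimate p(w); falls back to x for a token never seen before.
    p = spam_count / n if n else x
    return (s * x + n * p) / (s + n)


def passes_cutoff(f, min_dev=0.499):
    """Keep a token only if f(w) is at least min_dev away from 0.5.

    min_dev is a hypothetical name for the cut-off in this sketch.
    """
    return abs(f - 0.5) >= min_dev


# "Russian Brides" seen in 499 spam and 1 ham: strongly spammy, but not
# extreme enough to survive a strict cut-off.
mixed = robinson_f(499, 1)

# A token seen in 500 spam and never in ham.
pure = robinson_f(500, 0)

print(mixed, passes_cutoff(mixed))
print(pure, passes_cutoff(pure))
```

With a small s and a cut-off this strict, the 499-out-of-500 token is dropped before the final combination, and only tokens that have appeared on one side exclusively get through.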
