On Wed, 20 Jan 2016 11:46:39 -0800
Marc Perkel wrote:

> Let me give you an example. Here are 2 subject lines. Easy to guess
> which one is spam.
> 
> "Meet horny Russian Brides online!"
> "I read an article about Russian brides in a magazine."
> 
> Bayes or SpamAssassin would look at "Russian Brides" and 499 out of
> 500 times it's spam. Therefore the nonspam version scores spam points.

Not if you modify the Robinson parameters and the cut-off to exclude
such tokens. Then only the tokens your system would use would make it
through to the final cut.
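To make the mechanism concrete, here is a minimal sketch, assuming Gary Robinson's smoothed token probability f(w) = (s*x + n*p(w)) / (s + n) and a cut-off that discards any token whose f(w) lies within `min_strength` of the neutral 0.5. The token counts, corpus sizes, and the helpers `robinson_prob` and `significant_tokens` are hypothetical, and the parameter values are illustrative rather than any particular filter's settings.

```python
# Hypothetical training counts: token -> (spam messages containing it, ham messages containing it)
COUNTS = {
    "russian brides": (499, 1),
    "magazine": (3, 40),
    "article": (2, 2),
}
NSPAM, NHAM = 1000, 1000  # assumed training-corpus sizes


def robinson_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Robinson's smoothed estimate f(w) = (s*x + n*p(w)) / (s + n)."""
    b = spam_count / nspam  # fraction of spam messages containing the token
    g = ham_count / nham    # fraction of ham messages containing the token
    n = spam_count + ham_count  # how many training messages the token appeared in
    p = b / (b + g) if b + g > 0 else x  # raw p(w); falls back to the prior x if unseen
    return (s * x + n * p) / (s + n)


def significant_tokens(tokens, counts, nspam, nham, s=0.45, x=0.5, min_strength=0.1):
    """Keep only tokens whose f(w) is at least min_strength away from neutral 0.5."""
    kept = {}
    for tok in tokens:
        spam_count, ham_count = counts.get(tok, (0, 0))
        f = robinson_prob(spam_count, ham_count, nspam, nham, s, x)
        if abs(f - 0.5) >= min_strength:
            kept[tok] = f
    return kept


kept = significant_tokens(
    ["russian brides", "magazine", "article", "never-seen phrase"], COUNTS, NSPAM, NHAM
)
for tok, f in kept.items():
    print(f"{tok}: {f:.3f}")
```

Here "article" (seen equally in spam and ham) and the never-seen phrase both land at f(w) = 0.5 and are dropped before the final combination. Raising `min_strength`, or raising `s` so that low-count tokens are pulled harder toward `x`, narrows the set of tokens that survive; lowering them widens it.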

> My filter gets both correctly because of NOT matching. Not matching
> is a comparison to an infinite set.

It's not an infinite set unless you assume that phrases never
seen before are spammy or hammy.
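The same estimate shows where that assumption enters. For a phrase with no training history n = 0, so f(w) collapses to the prior x; it survives the cut-off only if x is set away from 0.5, that is, only if unseen phrases are assumed to be spammy or hammy. This is again a sketch: `robinson_prob`, `survives_cutoff`, and the values of s, x, and `min_strength` are hypothetical and illustrative.

```python
def robinson_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Robinson's smoothed estimate f(w) = (s*x + n*p(w)) / (s + n)."""
    b = spam_count / nspam
    g = ham_count / nham
    n = spam_count + ham_count
    p = b / (b + g) if b + g > 0 else x  # unseen token: no evidence, use the prior x
    return (s * x + n * p) / (s + n)


def survives_cutoff(f, min_strength=0.1):
    """True if the token would be used in the final combination."""
    return abs(f - 0.5) >= min_strength


# A never-seen phrase (0 spam, 0 ham) under three choices of the prior x
results = {}
for x in (0.5, 0.8, 0.2):
    f = robinson_prob(0, 0, 1000, 1000, x=x)
    results[x] = (f, survives_cutoff(f))
    print(f"x={x}: f={f:.2f}, used={results[x][1]}")
```

With a neutral prior the unseen phrase contributes nothing; only a deliberately biased x lets "not matching" carry any weight.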

