On Wed, 20 Jan 2016 11:52:35 -0800 Marc Perkel <[email protected]> wrote:
> Again - Bayes compares what matches. My filter compares what doesn't
> match.

Your filter is exactly equivalent to Bayes if you do the following things:

1) Use combinations of up to four words as tokens, instead of just
   single tokens.

2) Throw out any tokens whose probability is not either 100% spam or
   100% ham.

Idea (1) is probably good. We use words and word-pairs. I'm not sure
the extra storage for more than pairs is justifiable.

Idea (2) is probably bad. You are throwing out potentially useful
information.

Regards,

Dianne.
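
For illustration only, here is a minimal Python sketch of the two steps listed in the reply: tokens built from runs of one to four words, and a filter that keeps only tokens seen exclusively in spam or exclusively in ham. The function names (ngrams, train, pure_tokens, score) and the final hit-counting step are assumptions made for this example; the message does not say how either filter combines its evidence, and this is not the code of either filter.

```python
from collections import Counter


def ngrams(text, max_n=4):
    """Return the set of word n-grams of length 1..max_n found in text."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def train(spam_msgs, ham_msgs, max_n=4):
    """Count, per token, how many spam and ham messages contain it."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(ngrams(msg, max_n))
    for msg in ham_msgs:
        ham_counts.update(ngrams(msg, max_n))
    return spam_counts, ham_counts


def pure_tokens(spam_counts, ham_counts):
    """Step 2: keep only tokens that are 100% spam or 100% ham.

    Any token seen in both classes is discarded, which is the
    information loss the reply warns about.
    """
    spam_only = {t for t in spam_counts if t not in ham_counts}
    ham_only = {t for t in ham_counts if t not in spam_counts}
    return spam_only, ham_only


def score(text, spam_only, ham_only, max_n=4):
    """Hypothetical scoring: count hits against each pure-token set."""
    tokens = ngrams(text, max_n)
    return len(tokens & spam_only), len(tokens & ham_only)
```

A word such as "now" that appears in both a spam and a ham training message ends up in neither set, so it contributes nothing to the score, whereas a conventional Bayes filter would still use its intermediate probability.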
