On Wed, 20 Jan 2016 11:52:35 -0800
Marc Perkel <[email protected]> wrote:

> Again - Bayes compares what matches. My filter compares what doesn't
> match.

Your filter is exactly equivalent to Bayes if you do the following
things:

1) Use combinations of up to four words as tokens, instead of just
single words.

2) Throw out any tokens whose probability is neither 100% spam nor
100% ham.

Idea (1) is probably good.  We use words and word-pairs.  I'm not sure the
extra storage for more than pairs is justifiable.
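
To illustrate the storage concern, here is a minimal Python sketch of
tokenizing with word sequences. It assumes "combinations" means runs of
consecutive words; the function name word_ngrams and the sample message
are made up for illustration and are not taken from any real filter.

```python
def word_ngrams(text, max_n):
    """Return every run of 1..max_n consecutive words as a token.

    Assumption: tokens are consecutive word sequences, split on
    whitespace and lowercased. Real filters tokenize more carefully.
    """
    words = text.lower().split()
    tokens = []
    for n in range(1, max_n + 1):
        # A message of len(words) words has len(words) - n + 1 runs of length n.
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens


msg = "buy cheap pills online now"      # hypothetical message body
pairs = word_ngrams(msg, 2)             # words and word-pairs
quads = word_ngrams(msg, 4)             # up to four words
print(len(pairs), len(quads))
```

The per-message token count grows only modestly, but longer sequences
repeat far less often across messages, so the number of distinct tokens
the database has to keep tends to grow much faster than this suggests.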

Idea (2) is probably bad.  You are throwing out potentially useful
information.
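
A small sketch of what gets lost, assuming a simple naive-Bayes style
combination of per-token probabilities. The function names and the
sample probabilities are hypothetical, and real implementations combine
evidence in more robust ways.

```python
from math import prod


def combine(probs):
    """Combine per-token spam probabilities, naive-Bayes style."""
    if not probs:
        return 0.5  # no evidence either way
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    if spam + ham == 0.0:
        return 0.5  # contradictory 0% and 100% tokens cancel out
    return spam / (spam + ham)


def extremes_only(probs):
    """Keep only tokens that are 100% ham (0.0) or 100% spam (1.0)."""
    return [p for p in probs if p in (0.0, 1.0)]


# Hypothetical message: three tokens, each strongly but not purely spammy.
token_probs = [0.9, 0.85, 0.8]
full = combine(token_probs)                   # uses all the evidence
pruned = combine(extremes_only(token_probs))  # nothing survives the cut
print(full, pruned)
```

With all three tokens the combined score comes out above 0.99, while
the extremes-only version has nothing left to go on and falls back to
0.5.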

Regards,

Dianne.
