On 20.01.2016 at 21:24, John Hardin wrote:
On Wed, 20 Jan 2016, Dianne Skoll wrote:

On Wed, 20 Jan 2016 11:52:35 -0800
Marc Perkel <[email protected]> wrote:

Again - Bayes compares what matches. My filter compares what doesn't
match.

Your filter is exactly equivalent to Bayes if you do the following
things:

1) Use combinations of up to four words as tokens, instead of just
single tokens.

2) Throw out any tokens whose probability is not either 100% spam or
100% ham.
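
For illustration, the two steps above can be sketched in a few lines of Python. This is a minimal sketch and not anyone's actual filter: the function names (`ngram_tokens`, `train`, `extreme_tokens`) and the whitespace tokenizer are assumptions made for the example, and "100% spam or 100% ham" is taken to mean "seen only in spam or only in ham in the training corpus".

```python
from collections import Counter

def ngram_tokens(text, max_n=4):
    """Step 1: every phrase of 1..max_n consecutive words is a token."""
    words = text.lower().split()  # naive whitespace tokenizer (assumption)
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }

def train(spam_msgs, ham_msgs, max_n=4):
    """Count, per token, how many spam and ham messages contain it."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(ngram_tokens(msg, max_n))
    for msg in ham_msgs:
        ham_counts.update(ngram_tokens(msg, max_n))
    return spam_counts, ham_counts

def extreme_tokens(spam_counts, ham_counts):
    """Step 2: keep only tokens seen exclusively in one class."""
    spam_only = {t for t in spam_counts if t not in ham_counts}
    ham_only = {t for t in ham_counts if t not in spam_counts}
    return spam_only, ham_only

if __name__ == "__main__":
    spam_counts, ham_counts = train(["buy cheap pills"], ["buy milk"])
    spam_only, ham_only = extreme_tokens(spam_counts, ham_counts)
    print(sorted(spam_only))
    print(sorted(ham_only))
```

The word "buy" occurs in both classes here, so it is discarded; only phrases unique to one class survive the filter.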

Idea (1) is probably good.  We use words and word-pairs.  I'm not sure
the extra storage for more than pairs is justifiable.
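
To get a rough feel for the storage question, one can count distinct tokens as the maximum phrase length grows. This is only a sketch: `distinct_ngrams` is a hypothetical helper, the two-message corpus is made up, and real mail will behave differently. A message of W words yields up to W single words, W-1 pairs, W-2 triples and W-3 four-word phrases, and longer phrases repeat less often across messages, so the token database tends to grow with each extra length.

```python
def distinct_ngrams(messages, max_n):
    """Number of distinct 1..max_n word phrases across all messages."""
    seen = set()
    for msg in messages:
        words = msg.lower().split()  # naive whitespace tokenizer (assumption)
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                seen.add(tuple(words[i:i + n]))
    return len(seen)

if __name__ == "__main__":
    # Made-up sample corpus, for illustration only.
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown cat sleeps under the lazy dog",
    ]
    for max_n in (1, 2, 4):
        print(max_n, distinct_ngrams(corpus, max_n))
```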

Personally I'd rather see SA implement *that*

Yes, the part below, as *additional tokens* on top of what Bayes does now.

-------- Forwarded Message --------
Subject: Re: My new method for blocking spam - REVEALED!
Date: Wed, 20 Jan 2016 15:20:01 -0500
From: Dianne Skoll <[email protected]>
Organization: Roaring Penguin Software Inc.
To: [email protected]

On Wed, 20 Jan 2016 12:11:02 -0800
Marc Perkel <[email protected]> wrote:

> Again - it's not about matching as Bayes does. It's about not
> matching.

It's not about not matching.  It's about a preprocessing step that
discards tokens that don't have extreme probabilities.

I think your method works as well as it does because you're using up
to four-word phrases as tokens.  The rest of the method is nonsense, but
the four-word phrase tokens are the magic ingredient; they'd make Bayes
work awesomely also.
