On 20.01.2016 at 21:24, John Hardin wrote:
> On Wed, 20 Jan 2016, Dianne Skoll wrote:
>> On Wed, 20 Jan 2016 11:52:35 -0800 Marc Perkel <[email protected]> wrote:
>>> Again - Bayes compares what matches. My filter compares what doesn't match.
>>
>> Your filter is exactly equivalent to Bayes if you do the following things:
>>
>> 1) Use combinations of up to four words as tokens, instead of just single tokens.
>> 2) Throw out any tokens whose probability is not either 100% spam or 100% ham.
>>
>> Idea (1) is probably good. We use words and word-pairs. I'm not sure the extra storage for more than pairs is justifiable.
>
> Personally I'd rather see SA implement *that*

Yes, the part below as *additional tokens* to what Bayes does now.

-------- Forwarded Message --------
Subject: Re: My new method for blocking spam - REVEALED!
Date: Wed, 20 Jan 2016 15:20:01 -0500
From: Dianne Skoll <[email protected]>
Organization: Roaring Penguin Software Inc.
To: [email protected]

On Wed, 20 Jan 2016 12:11:02 -0800 Marc Perkel <[email protected]> wrote:

> Again - it's not about matching as Bayes does. It's about not
> matching.

It's not about not matching. It's about a preprocessing step that discards tokens that don't have extreme probabilities.

I think your method works as well as it does because you're using up to four-word phrases as tokens. The rest of the method is nonsense, but the four-word phrase tokens are the magic ingredient; they'd make Bayes work awesomely also.
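
For illustration, here is a minimal Python sketch of the two steps discussed in this thread: building tokens from phrases of up to four words, and the preprocessing step that throws away every token that is not 100% spam or 100% ham. It is not SpamAssassin code and not anyone's actual filter. The function names (`phrase_tokens`, `train`, `extreme_tokens`), the whitespace tokenizer, the use of consecutive words only, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter


def phrase_tokens(text, max_words=4):
    """Return every run of 1..max_words consecutive words as a token.

    Assumption: words are split on whitespace and lowercased; a real
    filter would tokenize headers, URLs, etc. far more carefully.
    """
    words = text.lower().split()
    tokens = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            tokens.add(" ".join(words[i:i + n]))
    return tokens


def train(messages, max_words=4):
    """Count, per token, how many spam and how many ham messages contain it.

    `messages` is an iterable of (text, is_spam) pairs (hypothetical format).
    """
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in messages:
        target = spam_counts if is_spam else ham_counts
        target.update(phrase_tokens(text, max_words))
    return spam_counts, ham_counts


def extreme_tokens(spam_counts, ham_counts):
    """Keep only tokens seen exclusively in spam or exclusively in ham."""
    spam_only = {t for t in spam_counts if t not in ham_counts}
    ham_only = {t for t in ham_counts if t not in spam_counts}
    return spam_only, ham_only


# Toy corpus, invented for the example.
corpus = [
    ("cheap pills online now", True),
    ("meeting notes attached now", False),
]
spam_counts, ham_counts = train(corpus)
spam_only, ham_only = extreme_tokens(spam_counts, ham_counts)
```

In this toy corpus the token `now` appears in both classes, so the filtering step discards it, while phrases such as `cheap pills online now` survive as spam-only tokens. The storage concern raised above is also visible here: a message of N words yields roughly 4N tokens at `max_words=4`, against roughly 2N for words and word-pairs.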
