On 21.01.2016 at 13:11, RW wrote:
On Wed, 20 Jan 2016 22:21:49 -0800
Marc Perkel wrote:

OK - Just to show you this isn't Bayesian - see if you can do this.

Here is a list of 5505874 words and phrases used in the subject line
of HAM and never seen in the subject line of SPAM

http://www.junkemailfilter.com/data/subject-ham.txt

Here is a list of 3494938 words and phrases used in the subject line
of SPAM and never seen in the subject line of HAM

http://www.junkemailfilter.com/data/subject-spam.txt

Hope you understand it now. Not Bayesian!!!!


the only difference between


   "ambulatory care" -> only in ham
   "aall cards"      -> only in spam

and

    "ambulatory care"  occurs 16 times in ham and 0 times in spam
    "aall cards"       occurs  0 times in ham and 3 times in spam

is that you have discarded the count information
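To make that concrete, here is a minimal sketch of how the two "only in" lists fall out of ordinary per-phrase counts. The function name `exclusive_lists` and the use of `Counter` are assumptions for illustration; only the two phrases and their counts come from the example above.

```python
from collections import Counter

# Per-phrase subject-line counts. The two entries are the figures
# quoted above; a real corpus would hold millions of phrases.
ham_counts = Counter({"ambulatory care": 16})
spam_counts = Counter({"aall cards": 3})


def exclusive_lists(ham_counts, spam_counts):
    """Collapse counts into 'only in ham' / 'only in spam' sets.

    The counts themselves are thrown away: a phrase seen 16 times
    and a phrase seen once end up looking identical.
    """
    ham_only = {t for t, n in ham_counts.items()
                if n > 0 and spam_counts[t] == 0}
    spam_only = {t for t, n in spam_counts.items()
                 if n > 0 and ham_counts[t] == 0}
    return ham_only, spam_only


ham_only, spam_only = exclusive_lists(ham_counts, spam_counts)
```

Going from the counts to the lists is a one-way step; the lists cannot be turned back into the counts.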

Not entirely, as long as "Currently, SA's bayes tokens are single words" from https://mail-archives.apache.org/mod_mbox/spamassassin-dev/201211.mbox/%3c509d55a8.30...@gmail.com%3E is still true.

Please review the response below and consider 2/4-word tokens *additionally* in the SA tokenizer; with a well-trained Bayes it will easily beat the "new magic" in all cases.

-------- Forwarded Message --------
Subject: Re: My new method for blocking spam - REVEALED!
Date: Wed, 20 Jan 2016 15:20:01 -0500
From: Dianne Skoll <d...@roaringpenguin.com>
Organization: Roaring Penguin Software Inc.
To: users@spamassassin.apache.org

On Wed, 20 Jan 2016 12:11:02 -0800
Marc Perkel <supp...@junkemailfilter.com> wrote:

> Again - it's not about matching as Bayes does. It's about not
> matching.

It's not about not matching. It's about a preprocessing step that
discards tokens that don't have extreme probabilities.
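A rough sketch of such a preprocessing step, assuming a plain frequency ratio as the per-token spam probability. This is not SpamAssassin's actual Bayes formula, and `keep_extreme`, its thresholds, and the "free offer" entry are made up for illustration.

```python
def spam_probability(ham_n, spam_n):
    # Plain frequency ratio; a real Bayes implementation would smooth
    # this and account for corpus sizes.
    return spam_n / (ham_n + spam_n)


def keep_extreme(counts, low=0.0, high=1.0):
    """Drop every token whose spam probability lies strictly between
    low and high; keep only the extreme ones."""
    kept = {}
    for token, (ham_n, spam_n) in counts.items():
        if ham_n + spam_n == 0:
            continue  # token never seen, nothing to estimate
        p = spam_probability(ham_n, spam_n)
        if p <= low or p >= high:
            kept[token] = p
    return kept


# token -> (ham count, spam count); "free offer" is a hypothetical
# phrase that shows up in both corpora.
counts = {
    "ambulatory care": (16, 0),
    "aall cards": (0, 3),
    "free offer": (5, 7),
}
kept = keep_extreme(counts)
```

With the thresholds at exactly 0 and 1, what survives is the "only in ham" and "only in spam" lists; the mixed token is discarded.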

I think your method works as well as it does because you're using up
to four-word phrases as tokens. The rest of the method is nonsense, but
the four-word phrase tokens are the magic ingredient; they'd make Bayes work awesomely also.
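For illustration, a minimal sketch of a tokenizer that emits every phrase of one to four consecutive words from a subject line. The function name `phrase_tokens`, the lowercasing, and the whitespace splitting are assumptions; neither Marc's tokenizer nor SpamAssassin's is described in this thread.

```python
def phrase_tokens(subject, max_words=4):
    """Return all runs of 1..max_words consecutive words as tokens."""
    words = subject.lower().split()
    tokens = []
    for n in range(1, max_words + 1):
        # Slide a window of n words across the subject line.
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens


tokens = phrase_tokens("Ambulatory care update")
# ['ambulatory', 'care', 'update',
#  'ambulatory care', 'care update',
#  'ambulatory care update']
```

Feeding these phrase tokens into a count-based Bayes classifier alongside the single words is the combination suggested above.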
