Re: Matching infinite sets

Marc Perkel Mon, 22 Aug 2016 09:06:39 -0700


On 08/22/16 07:45, Dianne Skoll wrote:

On Mon, 22 Aug 2016 07:34:00 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:

So.  What percentage of emails using your algorithm are actually
decidable?

Almost 100% if you look at a wide variety of tokens from multiple
attributes.

I can't believe that, or I'm missing something.  Almost every spam I see
contains words that also appear in ham.  Things like "this" or "invoice"
or "regards" or "dear".

What am I missing?

Hi Dianne, what you're missing is word combinations. Usually it's not a single word but a combination of words that triggers a result.



     Example of how NOT matching works

Let’s take 2 subject lines and see how this works.

“Meet hot Russian Brides Online!”
“I read an article about Russian Brides in a magazine”

A traditional spam filter using Bayesian or hard-coded rules about “Russian Brides” might determine that only 1 out of 500 emails mentioning the phrase “Russian Brides” is a good email. Thus the second line would have points assessed against it in the classification process using these traditional methods.

Using the Evolution Filter, the phrase “Russian Brides” is in both sets and therefore has no influence on the results. But the first subject matches these phrases in the Spam Only set.


“Meet hot”
“Meet hot Russian”
“Meet hot Russian Brides”
“hot Russian Brides Online!”
“Russian Brides Online!”
“Brides Online!”
“Online!”

The second subject matches these phrases in the Ham Only set, none of which are ever used in the spam set.


“I read an article”
“read an article”
“read an article about”
“about Russian”
“an article about”
“in a magazine”
“Brides in a”

So even though the phrase “Russian Brides” has no influence, each subject hits either ham or spam many times where the same phrase was never used in the subject line in the opposite set. And the number of hits is significant enough just from these subjects to cause the fingerprints to be learned, and that’s just looking at the Subject attribute. When this is combined with testing all attributes, the messages usually come out strongly on one side or the other.
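
The hit counting described above can be sketched in a few lines of Python. This is only an illustration, not the Evolution Filter's actual code: the function names, whitespace tokenization, and the four-word phrase limit are all assumptions, and the two phrase sets are simply copied from the lists above.

```python
def ngrams(text, max_len=4):
    """Return every contiguous phrase of 1..max_len words (assumed tokenization)."""
    words = text.split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }

# Phrase sets copied from the two lists in the example above.
SPAM_ONLY = {
    "Meet hot", "Meet hot Russian", "Meet hot Russian Brides",
    "hot Russian Brides Online!", "Russian Brides Online!",
    "Brides Online!", "Online!",
}
HAM_ONLY = {
    "I read an article", "read an article", "read an article about",
    "about Russian", "an article about", "in a magazine", "Brides in a",
}

def score(subject):
    """Count phrases of the subject found in each exclusive set."""
    phrases = ngrams(subject)
    return len(phrases & SPAM_ONLY), len(phrases & HAM_ONLY)

def classify(subject):
    """Hypothetical verdict: whichever side has more hits wins."""
    spam_hits, ham_hits = score(subject)
    if spam_hits > ham_hits:
        return "spam"
    if ham_hits > spam_hits:
        return "ham"
    return "undecided"
```

With these sets, score("Meet hot Russian Brides Online!") returns (7, 0) and the magazine subject returns (0, 7). The phrase “Russian Brides” is in neither set, so it adds nothing to either count.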

In rule-based systems one would not normally build a whitelist rule to allocate points based on seeing the phrase “read an article about”. That’s where the Evolution Filter is different. It didn’t need to have that rule: because it compares against the infinite set of what is not matched on the other side, it dynamically creates billions of rules automatically.
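
To show how such rules can fall out of the data without anyone writing them, here is a minimal Python sketch that derives the two exclusive sets by plain set difference. It assumes whitespace tokenization and phrases of up to four words, and it trains on just the two example subjects. The names are hypothetical and the real filter may work differently.

```python
def phrases(text, max_len=4):
    """Every contiguous phrase of 1..max_len words (assumed tokenization)."""
    words = text.split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }

def build_exclusive_sets(spam_subjects, ham_subjects):
    """Keep only the phrases seen on one side and never on the other."""
    spam_seen = set().union(*(phrases(s) for s in spam_subjects))
    ham_seen = set().union(*(phrases(s) for s in ham_subjects))
    return spam_seen - ham_seen, ham_seen - spam_seen

# Toy training data: the two subject lines from the example.
spam_only, ham_only = build_exclusive_sets(
    ["Meet hot Russian Brides Online!"],
    ["I read an article about Russian Brides in a magazine"],
)

# Nobody wrote this rule; it exists because spam never used the phrase.
assert "read an article about" in ham_only
# Phrases seen on both sides drop out and carry no weight.
assert "Russian Brides" not in spam_only | ham_only
```

Even this toy training set of two subjects yields 11 spam-only and 31 ham-only phrases, each acting as an implicit rule.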







--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
