Catch rates (was Re: My new method for blocking spam - REVEALED!)

2016-01-20 Thread Dianne Skoll
On Wed, 20 Jan 2016 15:37:33 -0800 jdow wrote: > This observation invites a heretical question. Is nearly perfect spam > classification dangerous compared to merely 99.9%/0.1% accurate > classification? I think it's meaningless to talk about classifications better than

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Shawn Bakhtiar
Sorry.. how is this different than Naive Bayes filtering?? "Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things), with spam and non-spam e-mails and then using Bayes' theorem to calculate a probability that an email is or is not spam." —

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Reindl Harald
On 20.01.2016 at 17:52, Marc Perkel wrote: So - how do I get a list of words and phrases never used in spam? I create a list of words and phrases that are used in spam and check to see if it's *not on the list*. What I do is tokenize the spammiest parts of the email, like the subject line,
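
The check described in the quoted message (tokenize a part of the email, then see which tokens are not on the list of tokens seen in spam) can be illustrated with a minimal sketch. This is not the poster's actual implementation, which the thread does not show. The function names, the n-gram length, and the tiny spam corpus are all assumptions made for illustration.

```python
import re

def tokenize(text, max_n=3):
    """Lowercase the text and return every word n-gram of 1..max_n words.

    The n-gram tokenization is an assumption; the thread only says
    "words and phrases".
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }

def tokens_not_seen_in_spam(subject, spam_tokens):
    """Return the tokens of `subject` that are absent from the spam token list."""
    return tokenize(subject) - spam_tokens

# Hypothetical, hand-made spam corpus used only for this example.
spam_corpus = ["Cheap pills online", "Get rich online now"]
spam_tokens = set().union(*(tokenize(s) for s in spam_corpus))

unseen = tokens_not_seen_in_spam("Let's get some lunch", spam_tokens)
print(sorted(unseen))
```

In this toy run the single word "get" is dropped because it occurs in the spam corpus, while phrases such as "some lunch" remain. How such leftover tokens should be weighted, and how large the spam list has to be before "never used in spam" is a safe claim, is exactly what the replies in this thread question.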

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
Yes - you missed something. It is about intersecting one corpus and NOT intersecting the other. This is about what doesn't match - not what does. On 01/20/16 10:26, Shawn Bakhtiar wrote: Sorry.. how is this different than Naive Bayes filtering?? "Naive Bayes classifiers work by correlating

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread John Hardin
On Wed, 20 Jan 2016, Marc Perkel wrote: Maybe I should call it a new plan for spam? Perhaps FUSSP? (Sorry... You're so rah rah about this I couldn't resist... :) ) So - how do I get a list of words and phrases never used in spam? I create a list of words and phrases that are used in spam

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Antony Stone
On Wednesday 20 January 2016 at 17:52:05, Marc Perkel wrote: > Suppose I get an email with the subject line "Let's get some lunch". I > know it's a good email because spammers never say "Let's go to lunch". > In fact there are an infinite number of words and phrases that are used > in good email

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Antony Stone
On Wednesday 20 January 2016 at 19:50:23, Reindl Harald wrote: > DELIVERED 32943 91.46 % > > BLOCKED 3679 10.21 % Why don't those add up to 100%? Or am I misunderstanding the labelling? Antony. -- Python is executable pseudocode. Perl is executable line noise.
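
The arithmetic behind the question can be checked in a few lines. This sketch only uses the two counts and two percentages quoted above. What the original denominator was is not stated in this snippet, so the "implied total" values are an inference, not a figure from the thread.

```python
# Counts and percentages exactly as quoted in the message.
delivered, blocked = 32943, 3679
quoted_delivered_pct, quoted_blocked_pct = 91.46, 10.21

# If the two categories were disjoint and exhaustive, the shares would be:
total = delivered + blocked
delivered_share = 100 * delivered / total
blocked_share = 100 * blocked / total

# Denominator implied by each quoted percentage (an inference, see lead-in).
implied_total_from_delivered = delivered / (quoted_delivered_pct / 100)
implied_total_from_blocked = blocked / (quoted_blocked_pct / 100)

print(f"sum of quoted percentages: {quoted_delivered_pct + quoted_blocked_pct:.2f}")
print(f"shares of delivered+blocked: {delivered_share:.2f} / {blocked_share:.2f}")
print(f"implied totals: {implied_total_from_delivered:.0f} / {implied_total_from_blocked:.0f}")
```

Both quoted percentages point to a denominator of roughly 36,000 messages, which is smaller than the 36,622 obtained by adding the two counts. That suggests the two counts were not taken as disjoint parts of one total, although the snippet itself does not say why.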

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Wrolf
Good luck with your patent application, it should be in the infinitely elastic queue right after my perpetual motion machine. Not sure how you will deal with the number of ham tokens in spam messages. Also not sure how much ham will get canned as spam - but then, maybe people shouldn't be sending

My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
OK - following up on this. I have my provisional patent filed. I'm still doing development to improve it and working on a licensing contract. But the license will be based on the Creative Commons patent with some restrictions added. Basically I want to get a license fee from the big guys and

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
On 01/20/16 10:36, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: . So it still needs to be trained, at least initially, with a manually-vetted corpus. If not, how do you propose to do the initial classification of messages for training? Do you envision it being self-training

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Reindl Harald
On 20.01.2016 at 19:55, Antony Stone wrote: On Wednesday 20 January 2016 at 19:50:23, Reindl Harald wrote: DELIVERED 32943 91.46 % BLOCKED 3679 10.21 % Why don't those add up to 100%? Or am I misunderstanding the labelling? grep/count of the maillog from the current

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
On 01/20/16 11:25, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: On 01/20/16 10:44, Antony Stone wrote: How do you identify "the spammiest parts" of an email? The Subject line - the first few words of the email. the header structure, behavior. File extensions of attached

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread John Hardin
On Wed, 20 Jan 2016, Antony Stone wrote: On Wednesday 20 January 2016 at 17:52:05, Marc Perkel wrote: Suppose I get an email with the subject line "Let's get some lunch". I know it's a good email because spammers never say "Let's go to lunch". In fact there are an infinite number of words and

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
On 01/20/16 10:44, Antony Stone wrote: On Wednesday 20 January 2016 at 17:52:05, Marc Perkel wrote: Suppose I get an email with the subject line "Let's get some lunch". I know it's a good email because spammers never say "Let's go to lunch". In fact there are an infinite number of words and

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Reindl Harald
On 20.01.2016 at 20:03, Reindl Harald wrote: On 20.01.2016 at 19:55, Antony Stone wrote: On Wednesday 20 January 2016 at 19:50:23, Reindl Harald wrote: DELIVERED 32943 91.46 % BLOCKED 3679 10.21 % Why don't those add up to 100%? Or am I misunderstanding the

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread RW
> On 01/20/16 10:26, Shawn Bakhtiar wrote: > > Sorry.. how is this different than Naive Bayes filtering?? On Wed, 20 Jan 2016 10:52:58 -0800 Marc Perkel wrote: > Yes - you missed something. It is about intersecting one corpus and > NOT intersecting the other. > > This is about what doesn't

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Dianne Skoll
On Wed, 20 Jan 2016 08:52:05 -0800 Marc Perkel wrote: > Suppose I get an email with the subject line "Let's get some lunch". > I know it's a good email because spammers never say "Let's go to > lunch". Really? You know that for a fact? > In fact there are an

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread John Hardin
On Wed, 20 Jan 2016, Marc Perkel wrote: On 01/20/16 10:44, Antony Stone wrote: How do you identify "the spammiest parts" of an email? The Subject line - the first few words of the email. the header structure, behavior. File extensions of attached files. Are you getting .zip/.rar/etc

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
On 01/20/16 11:32, Reindl Harald wrote: On 20.01.2016 at 20:27, Marc Perkel wrote: On 01/20/16 11:25, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: On 01/20/16 10:44, Antony Stone wrote: How do you identify "the spammiest parts" of an email? The Subject line - the first

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Reindl Harald
On 20.01.2016 at 20:05, Dianne Skoll wrote: On Wed, 20 Jan 2016 08:52:05 -0800 Marc Perkel wrote: Suppose I get an email with the subject line "Let's get some lunch". I know it's a good email because spammers never say "Let's go to lunch". Really? You know

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread John Hardin
On Wed, 20 Jan 2016, Marc Perkel wrote: On 01/20/16 10:36, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: . So it still needs to be trained, at least initially, with a manually-vetted corpus. If not, how do you propose to do the initial classification of messages for training?

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Reindl Harald
On 20.01.2016 at 20:27, Marc Perkel wrote: On 01/20/16 11:25, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: On 01/20/16 10:44, Antony Stone wrote: How do you identify "the spammiest parts" of an email? The Subject line - the first few words of the email. the header

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Dianne Skoll
On Wed, 20 Jan 2016 11:35:33 -0800 Marc Perkel wrote: > Bayes is about matching. My Evolution filter is about NOT matching. > It's the *NOT matching* that makes it different. Unless you've described it wrong, it's not about not matching. It's about seeing if there

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Noel Butler
On 21/01/2016 06:19, Marc Perkel wrote: The way I know what spammers never use is I store what spammers do use and see if it doesn't match. I've processed more than 100 million spams and it's amazing how many common words and phrases that spammers never use. until now they didn't use it, I

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Quanah Gibson-Mount
--On Wednesday, January 20, 2016 4:26 PM -0500 Wrolf wrote: Is Marc's approach "novel" and "non-obvious"? (Patents must be novel, non-obvious, and useful.) I think plenty of people have supplied prior art, and that the concept itself is obvious since other things

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
On 01/20/16 12:05, RW wrote: On 01/20/16 10:26, Shawn Bakhtiar wrote: Sorry.. how is this different than Naive Bayes filtering?? On Wed, 20 Jan 2016 10:52:58 -0800 Marc Perkel wrote: Yes - you missed something. It is about intersecting one corpus and NOT intersecting the other. This is

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
On 01/20/16 12:14, Reindl Harald wrote: On 20.01.2016 at 21:11, Marc Perkel wrote: On 01/20/16 12:05, RW wrote: On 01/20/16 10:26, Shawn Bakhtiar wrote: Sorry.. how is this different than Naive Bayes filtering?? On Wed, 20 Jan 2016 10:52:58 -0800 Marc Perkel wrote: Yes - you missed

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Dianne Skoll
On Wed, 20 Jan 2016 12:11:02 -0800 Marc Perkel wrote: > Again - it's not about matching as Bayes does. It's about not > matching. It's not about not matching. It's about a preprocessing step that discards tokens that don't have extreme probabilities. I think your
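
The reading given in this reply, a preprocessing step that discards tokens whose spam probability is not extreme, can be sketched as follows. This is an illustration of that description only, not code from the thread or from any particular filter. The function names, the thresholds 0.01 and 0.99, and the token counts are hypothetical.

```python
def spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token) from per-corpus message frequencies.

    Returns None for a token that was never seen in either corpus.
    """
    s = spam_counts.get(token, 0) / n_spam
    h = ham_counts.get(token, 0) / n_ham
    if s + h == 0:
        return None
    return s / (s + h)

def extreme_tokens(tokens, spam_counts, ham_counts, n_spam, n_ham,
                   low=0.01, high=0.99):
    """Keep only tokens whose estimated probability is <= low or >= high."""
    kept = {}
    for token in tokens:
        p = spam_probability(token, spam_counts, ham_counts, n_spam, n_ham)
        if p is not None and (p <= low or p >= high):
            kept[token] = p
    return kept

# Hypothetical counts: number of messages in each corpus containing the token.
spam_counts = {"viagra": 50, "the": 40}
ham_counts = {"lunch": 30, "the": 45}

kept = extreme_tokens(["viagra", "lunch", "the", "unseen"],
                      spam_counts, ham_counts, n_spam=100, n_ham=100)
print(kept)
```

Under these made-up counts, "viagra" and "lunch" survive the filter because each appears in only one corpus, "the" is discarded as uninformative, and a token with no history is ignored. A conventional Bayes classifier would then combine the surviving probabilities.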

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Dianne Skoll
On Wed, 20 Jan 2016 12:19:10 -0800 Marc Perkel wrote: > The way I know what spammers never use is I store what spammers do > use and see if it doesn't match. I've processed more than 100 million > spams and it's amazing how many common words and phrases that >

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Wrolf
Is Marc's approach "novel" and "non-obvious"? (Patents must be novel, non-obvious, and useful.) Would SpamAssassin be infringing, if Marc cashed in and sold his patent to some less open minded investor? (Patent trolls are a real thing.) Wrolf

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread RW
On Wed, 20 Jan 2016 12:11:02 -0800 Marc Perkel wrote: > Again - it's not about matching as Bayes does. It's about not > matching. > > In the subject line of the message the phrase "method for blocking > spam" makes the message ham. Spammers never use the phrase "method > for blocking spam". No

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Reindl Harald
On 20.01.2016 at 21:11, Marc Perkel wrote: On 01/20/16 12:05, RW wrote: On 01/20/16 10:26, Shawn Bakhtiar wrote: Sorry.. how is this different than Naive Bayes filtering?? On Wed, 20 Jan 2016 10:52:58 -0800 Marc Perkel wrote: Yes - you missed something. It is about intersecting one

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Joe Quinn
On 1/20/2016 3:20 PM, Dianne Skoll wrote: On Wed, 20 Jan 2016 12:11:02 -0800 Marc Perkel wrote: Again - it's not about matching as Bayes does. It's about not matching. It's not about not matching. It's about a preprocessing step that discards tokens that don't

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Axb
On 01/20/2016 10:28 PM, Quanah Gibson-Mount wrote: --On Wednesday, January 20, 2016 4:26 PM -0500 Wrolf wrote: Is Marc's approach "novel" and "non-obvious"? (Patents must be novel, non-obvious, and useful.) I think plenty of people have supplied prior art, and that the

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Dianne Skoll
On Wed, 20 Jan 2016 14:48:19 -0800 Marc Perkel wrote: > To be a little clearer. This new system isn't perfect. And its main > strength is identifying good email. It does catch a lot more spam for > sure but when people scream at me it's because I blocked something

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread jdow
I wonder how this differs from some of the classifiers within CRM114. Several of them seem to work on phrases (with high costs) or single words. {^_^} On 2016-01-20 11:05, Dianne Skoll wrote: On Wed, 20 Jan 2016 08:52:05 -0800 Marc Perkel wrote: Suppose I get

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
It could be challenging if someone impersonated a bank and they did it right. I'm looking at more aspects than just the content of the message but that's an area where there is some possible weakness. There are other tricks to address that specifically. And I am looking at behavior and headers

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread jdow
And just how well does this work against spear phishing? And would the same magic list work for ma and pa Kettle well into their 80s only receiving emails from their children and Freddie Burfle with his head buried in a corporate accounts payable office? {^_^} On 2016-01-20 08:52, Marc Perkel

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Wrolf
Asserting SA as prior art would require some pretty hefty legal fees. From what I understand the US Patent Office pretty much grants all patents, and lets the courts work it out. Open source projects do not have deep pockets. Maybe intervention is needed before a patent is granted. Wrolf

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread jdow
This observation invites a heretical question. Is nearly perfect spam classification dangerous compared to merely 99.9%/0.1% accurate classification? If people get used to no spam do they become more vulnerable to really well crafted spam? {o.o} On 2016-01-20 14:48, Marc Perkel wrote: It