On Wed, 20 Jan 2016 15:37:33 -0800
jdow wrote:
> This observation invites a heretical question. Is nearly perfect spam
> classification dangerous compared to merely 99.9%/0.1% accurate
> classification?
I think it's meaningless to talk about classifications better than
Sorry.. how is this different than Naive Bayes filtering??
"Naive Bayes classifiers work by correlating the use of tokens (typically
words, or sometimes other things), with spam and non-spam e-mails and then
using Bayes' theorem to calculate a probability that an email is or is not
spam."
—
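For readers following the comparison, here is a minimal sketch of the token-probability idea the quote above describes. It is not SpamAssassin's Bayes implementation. The toy corpora, the whitespace tokenizer, the +1/+2 smoothing and the function names `train` and `spam_probability` are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count, per token, how many messages in one class contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | tokens) with Bayes' theorem and the naive
    assumption that tokens are independent of each other."""
    # Work in log space, starting from the class priors.
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))
    for token in set(text.lower().split()):
        # Smoothing (assumed here) keeps an unseen token from zeroing a class.
        log_spam += math.log((spam_counts[token] + 1) / (n_spam + 2))
        log_ham += math.log((ham_counts[token] + 1) / (n_ham + 2))
    # Normalise the two scores back to a probability.
    return 1 / (1 + math.exp(log_ham - log_spam))

# Toy training data, invented for this example.
spam = ["cheap pills online", "win money now", "cheap money offer"]
ham = ["lunch at noon", "meeting notes attached", "let's get some lunch"]
spam_counts, ham_counts = train(spam), train(ham)

p_spammy = spam_probability("cheap pills", spam_counts, ham_counts, len(spam), len(ham))
p_hammy = spam_probability("get some lunch", spam_counts, ham_counts, len(spam), len(ham))
```

Every token contributes evidence in both directions here, which is the "matching" behaviour the rest of the thread contrasts with.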
On 20.01.2016 at 17:52, Marc Perkel wrote:
So - how do I get a list of words and phrases never used in spam? I
create a list of words and phrases that are used in spam and check to
see if it's *not on the list*.
What I do is tokenize the spammiest parts of the email, like the subject
line,
Yes - you missed something. It is about intersecting one corpus and NOT
intersecting the other.
This is about what doesn't match - not what does.
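One way to read "intersecting one corpus and NOT intersecting the other" is as a plain set difference over phrases. The sketch below is only a guess at that reading, since the thread never shows the actual implementation. The helper names (`phrases`, `build`, `evidence`), the three-word phrase limit and the toy corpora are hypothetical.

```python
def phrases(text, max_len=3):
    """Return the set of all 1..max_len word phrases in text."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }

def build(corpus):
    """Collect every phrase seen anywhere in a corpus."""
    seen = set()
    for text in corpus:
        seen |= phrases(text)
    return seen

# Toy corpora, invented for this example.
spam_seen = build(["get cheap pills", "get rich now"])
ham_seen = build(["let's get some lunch", "lunch notes attached"])

# Phrases seen in one corpus and NOT in the other.
ham_only = ham_seen - spam_seen
spam_only = spam_seen - ham_seen

def evidence(subject):
    """Count how many phrases of a subject line fall in each one-sided set."""
    found = phrases(subject)
    return len(found & ham_only), len(found & spam_only)

lunch_hits = evidence("let's get some lunch")
pills_hits = evidence("get cheap pills")
```

A phrase such as "get", which shows up in both corpora, lands in neither set and therefore contributes nothing either way.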
On 01/20/16 10:26, Shawn Bakhtiar wrote:
Sorry.. how is this different than Naive Bayes filtering??
"Naive Bayes classifiers work by correlating
On Wed, 20 Jan 2016, Marc Perkel wrote:
Maybe I should call it a new plan for spam?
Perhaps FUSSP? (Sorry... You're so rah rah about this I couldn't
resist... :) )
So - how do I get a list of words and phrases never used in spam? I
create a list of words and phrases that are used in spam
On Wednesday 20 January 2016 at 17:52:05, Marc Perkel wrote:
> Suppose I get an email with the subject line "Let's get some lunch". I
> know it's a good email because spammers never say "Let's go to lunch".
> In fact there are an infinite number of words and phrases that are used
> in good email
On Wednesday 20 January 2016 at 19:50:23, Reindl Harald wrote:
> DELIVERED 32943 91.46 %
>
> BLOCKED 3679 10.21 %
Why don't those add up to 100%?
Or am I misunderstanding the labelling?
Antony.
--
Python is executable pseudocode.
Perl is executable line noise.
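A quick arithmetic check on the two figures quoted above (32943 at 91.46 % and 3679 at 10.21 %), as a sketch only. It shows what the numbers imply, not why the log counts were produced that way.

```python
delivered, blocked = 32943, 3679
pct_delivered, pct_blocked = 91.46, 10.21

# If the two counts were a clean partition of one total, these would be
# the percentages, and they would add up to 100.
share_of_sum = [
    round(100 * n / (delivered + blocked), 2) for n in (delivered, blocked)
]

# Work backwards instead: what total does each quoted percentage imply?
implied_total_delivered = delivered / (pct_delivered / 100)
implied_total_blocked = blocked / (pct_blocked / 100)

print(share_of_sum)
print(round(implied_total_delivered), round(implied_total_blocked))
```

Both quoted percentages point to a common denominator of roughly 36,020 messages, which is smaller than the sum of the two counts (36,622). That is consistent with the two counts being taken separately and overlapping, or with the percentages being computed against a third figure.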
Good luck with your patent application, it should be in the infinitely
elastic queue right after my perpetual motion machine.
Not sure how you will deal with the number of ham tokens in spam messages.
Also not sure how much ham will get canned as spam - but then, maybe people
shouldn't be sending
OK - following up on this. I have my provisional patent filed. I'm still
doing development to improve it and working on a licensing contract. But
the license will be based on the Creative Commons patent with some
restrictions added. Basically I want to get a license fee from the big
guys and
On 01/20/16 10:36, John Hardin wrote:
On Wed, 20 Jan 2016, Marc Perkel wrote:
So it still needs to be trained, at least initially, with a
manually-vetted corpus. If not, how do you propose to do the initial
classification of messages for training?
Do you envision it being self-training
On 20.01.2016 at 19:55, Antony Stone wrote:
On Wednesday 20 January 2016 at 19:50:23, Reindl Harald wrote:
DELIVERED 32943 91.46 %
BLOCKED 3679 10.21 %
Why don't those add up to 100%?
Or am I misunderstanding the labelling?
grep/count of the maillog from the current
On 01/20/16 11:25, John Hardin wrote:
On Wed, 20 Jan 2016, Marc Perkel wrote:
On 01/20/16 10:44, Antony Stone wrote:
How do you identify "the spammiest parts" of an email?
The Subject line - the first few words of the email, the header
structure, behavior, file extensions of attached
On Wed, 20 Jan 2016, Antony Stone wrote:
On Wednesday 20 January 2016 at 17:52:05, Marc Perkel wrote:
Suppose I get an email with the subject line "Let's get some lunch". I
know it's a good email because spammers never say "Let's go to lunch".
In fact there are an infinite number of words and
On 01/20/16 10:44, Antony Stone wrote:
On Wednesday 20 January 2016 at 17:52:05, Marc Perkel wrote:
Suppose I get an email with the subject line "Let's get some lunch". I
know it's a good email because spammers never say "Let's go to lunch".
In fact there are an infinite number of words and
On 20.01.2016 at 20:03, Reindl Harald wrote:
On 20.01.2016 at 19:55, Antony Stone wrote:
On Wednesday 20 January 2016 at 19:50:23, Reindl Harald wrote:
DELIVERED 32943 91.46 %
BLOCKED 3679 10.21 %
Why don't those add up to 100%?
Or am I misunderstanding the
> On 01/20/16 10:26, Shawn Bakhtiar wrote:
> > Sorry.. how is this different than Naive Bayes filtering??
On Wed, 20 Jan 2016 10:52:58 -0800
Marc Perkel wrote:
> Yes - you missed something. It is about intersecting one corpus and
> NOT intersecting the other.
>
> This is about what doesn't
On Wed, 20 Jan 2016 08:52:05 -0800
Marc Perkel wrote:
> Suppose I get an email with the subject line "Let's get some lunch".
> I know it's a good email because spammers never say "Let's go to
> lunch".
Really? You know that for a fact?
> In fact there are an
On Wed, 20 Jan 2016, Marc Perkel wrote:
On 01/20/16 10:44, Antony Stone wrote:
How do you identify "the spammiest parts" of an email?
The Subject line - the first few words of the email, the header structure,
behavior, file extensions of attached files.
Are you getting .zip/.rar/etc
On 01/20/16 11:32, Reindl Harald wrote:
On 20.01.2016 at 20:27, Marc Perkel wrote:
On 01/20/16 11:25, John Hardin wrote:
On Wed, 20 Jan 2016, Marc Perkel wrote:
On 01/20/16 10:44, Antony Stone wrote:
How do you identify "the spammiest parts" of an email?
The Subject line - the first
On 20.01.2016 at 20:05, Dianne Skoll wrote:
On Wed, 20 Jan 2016 08:52:05 -0800
Marc Perkel wrote:
Suppose I get an email with the subject line "Let's get some lunch".
I know it's a good email because spammers never say "Let's go to
lunch".
Really? You know
On Wed, 20 Jan 2016, Marc Perkel wrote:
On 01/20/16 10:36, John Hardin wrote:
On Wed, 20 Jan 2016, Marc Perkel wrote:
So it still needs to be trained, at least initially, with a
manually-vetted corpus. If not, how do you propose to do the initial
classification of messages for training?
On 20.01.2016 at 20:27, Marc Perkel wrote:
On 01/20/16 11:25, John Hardin wrote:
On Wed, 20 Jan 2016, Marc Perkel wrote:
On 01/20/16 10:44, Antony Stone wrote:
How do you identify "the spammiest parts" of an email?
The Subject line - the first few words of the email, the header
On Wed, 20 Jan 2016 11:35:33 -0800
Marc Perkel wrote:
> Bayes is about matching. My Evolution filter is about NOT matching.
> It's the *NOT matching* that makes it different.
Unless you've described it wrong, it's not about not matching. It's
about seeing if there
On 21/01/2016 06:19, Marc Perkel wrote:
The way I know what spammers never use is I store what spammers do use
and see if it doesn't match. I've processed more than 100 million
spams and it's amazing how many common words and phrases that spammers
never use.
Until now they didn't use it, I
--On Wednesday, January 20, 2016 4:26 PM -0500 Wrolf
wrote:
Is Marc's approach "novel" and "non-obvious"? (Patents must be novel,
non-obvious, and useful.)
I think plenty of people have supplied prior art, and that the concept
itself is obvious since other things
On 01/20/16 12:05, RW wrote:
On 01/20/16 10:26, Shawn Bakhtiar wrote:
Sorry.. how is this different than Naive Bayes filtering??
On Wed, 20 Jan 2016 10:52:58 -0800
Marc Perkel wrote:
Yes - you missed something. It is about intersecting one corpus and
NOT intersecting the other.
This is
On 01/20/16 12:14, Reindl Harald wrote:
On 20.01.2016 at 21:11, Marc Perkel wrote:
On 01/20/16 12:05, RW wrote:
On 01/20/16 10:26, Shawn Bakhtiar wrote:
Sorry.. how is this different than Naive Bayes filtering??
On Wed, 20 Jan 2016 10:52:58 -0800
Marc Perkel wrote:
Yes - you missed
On Wed, 20 Jan 2016 12:11:02 -0800
Marc Perkel wrote:
> Again - it's not about matching as Bayes does. It's about not
> matching.
It's not about not matching. It's about a preprocessing step that
discards tokens that don't have extreme probabilities.
I think your
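The "preprocessing step" reading above can be sketched as a filter that keeps only tokens whose observed spam ratio sits at an extreme and drops everything in between. This is an illustration of that description, not anyone's actual filter. The function name `extreme_tokens`, the `low`/`high` thresholds and the counts are made up.

```python
def extreme_tokens(spam_counts, ham_counts, low=0.0, high=1.0):
    """Keep tokens whose spam ratio is <= low or >= high; drop the rest."""
    kept = {}
    for token in set(spam_counts) | set(ham_counts):
        s = spam_counts.get(token, 0)
        h = ham_counts.get(token, 0)
        if s + h == 0:
            continue  # never observed, nothing to estimate
        p_spam = s / (s + h)
        if p_spam <= low or p_spam >= high:
            kept[token] = p_spam
    return kept

# Hypothetical per-token message counts.
spam_counts = {"pills": 40, "get": 25, "cheap": 30}
ham_counts = {"lunch": 50, "get": 60, "notes": 12}

kept = extreme_tokens(spam_counts, ham_counts)
```

With the default thresholds only tokens seen exclusively in one corpus survive, so "get" is discarded while "lunch" and "pills" are kept. Loosening `low` and `high` moves the result back toward an ordinary Bayes token table.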
On Wed, 20 Jan 2016 12:19:10 -0800
Marc Perkel wrote:
> The way I know what spammers never use is I store what spammers do
> use and see if it doesn't match. I've processed more than 100 million
> spams and it's amazing how many common words and phrases that
>
Is Marc's approach "novel" and "non-obvious"? (Patents must be novel,
non-obvious, and useful.)
Would SpamAssassin be infringing, if Marc cashed in and sold his patent to
some less open-minded investor? (Patent trolls are a real thing.)
Wrolf
On Wed, 20 Jan 2016 12:11:02 -0800
Marc Perkel wrote:
> Again - it's not about matching as Bayes does. It's about not
> matching.
>
> In the subject line of the message the phrase "method for blocking
> spam" makes the message ham. Spammers never use the phrase "method
> for blocking spam". No
On 20.01.2016 at 21:11, Marc Perkel wrote:
On 01/20/16 12:05, RW wrote:
On 01/20/16 10:26, Shawn Bakhtiar wrote:
Sorry.. how is this different than Naive Bayes filtering??
On Wed, 20 Jan 2016 10:52:58 -0800
Marc Perkel wrote:
Yes - you missed something. It is about intersecting one corpus
On 1/20/2016 3:20 PM, Dianne Skoll wrote:
On Wed, 20 Jan 2016 12:11:02 -0800
Marc Perkel wrote:
Again - it's not about matching as Bayes does. It's about not
matching.
It's not about not matching. It's about a preprocessing step that
discards tokens that don't
On 01/20/2016 10:28 PM, Quanah Gibson-Mount wrote:
--On Wednesday, January 20, 2016 4:26 PM -0500 Wrolf
wrote:
Is Marc's approach "novel" and "non-obvious"? (Patents must be novel,
non-obvious, and useful.)
I think plenty of people have supplied prior art, and that the
On Wed, 20 Jan 2016 14:48:19 -0800
Marc Perkel wrote:
> To be a little clearer. This new system isn't perfect. And its main
> strength is identifying good email. It does catch a lot more spam for
> sure but when people scream at me it's because I blocked something
I wonder how this differs from some of the classifiers within CRM114. Several of
them seem to work on phrases (with high costs) or single words.
{^_^}
On 2016-01-20 11:05, Dianne Skoll wrote:
On Wed, 20 Jan 2016 08:52:05 -0800
Marc Perkel wrote:
Suppose I get
It could be challenging if someone impersonated a bank and they did it
right. I'm looking at more aspects than just the content of the message
but that's an area where there is some possible weakness. There are
other tricks to address that specifically. And I am looking at behavior
and headers
And just how well does this work against spear phishing? And would the same magic
list work for Ma and Pa Kettle, well into their 80s and only receiving emails from
their children, and for Freddie Burfle with his head buried in a corporate accounts
payable office?
{^_^}
On 2016-01-20 08:52, Marc Perkel
Asserting SA as prior art would require some pretty hefty legal fees. From
what I understand the US Patent Office pretty much grants all patents, and
lets the courts work it out. Open source projects do not have deep pockets.
Maybe intervention is needed before a patent is granted.
Wrolf
This observation invites a heretical question. Is nearly perfect spam
classification dangerous compared to merely 99.9%/0.1% accurate classification?
If people get used to no spam do they become more vulnerable to really well
crafted spam?
{o.o}
On 2016-01-20 14:48, Marc Perkel wrote:
It
40 matches