>From: RW <rwmailli...@googlemail.com>
>Sent: Tuesday, May 31, 2016 5:20 PM
>To: users@spamassassin.apache.org
>Subject: Re: SA Concepts - plugin for email semantics
>On Tue, 31 May 2016 15:20:56 -0400
>Bill Cole wrote:
>> On 29 May 2016, at 11:07, RW wrote
On Tue, 31 May 2016 15:20:56 -0400
Bill Cole wrote:
> On 29 May 2016, at 11:07, RW wrote:
>
> > Statistical filters are based on some statistical theory combined
> > with pragmatic kludges and assumptions. Practical filters have been
> > developed based on what's been found to work, not on
On Tue, 31 May 2016 21:23:11 +0100
Paul Stead wrote:
> The implementation was undertaken from a personal interest - I asked
> the question of what people thought of the implementation and the
> impact to Bayes DB.
I think what the "concepts" concept ends up doing
On 31/05/16 20:20, Bill Cole wrote:
It is no shock that while this implementation has Paul Stead's name on
it, it is apparently mostly the product of the anti-spam community's
most spectacular case of Dunning-Kruger Syndrome, who has apparently
figured out that his personal 'brand' has
On 29 May 2016, at 11:07, RW wrote:
On Sat, 28 May 2016 15:37:21 -0400
Bill Cole wrote:
More importantly (IMHO) they aren't designed to collide with existing
common tokens and be added back into messages that may contain those
tokens already in order to influence Bayesian classification.
On Tue, 31 May 2016 12:05:39 -0400
Bill Cole wrote:
> On 31 May 2016, at 2:21, Henrik K wrote:
>
> > On Mon, May 30, 2016 at 06:25:08PM -0400, Dianne Skoll wrote:
> >> On Mon, 30 May 2016 17:45:52 -0400
> >> "Bill Cole" wrote:
> >>
> >>> So you could
On Mon, 30 May 2016 17:45:52 -0400
Bill Cole wrote:
> The "Naive Bayes" classification approach is theoretically moored to
> Bayes' Theorem
FWIW Bayes hasn't been "Naive Bayes" for a long time.
On 31 May 2016, at 2:21, Henrik K wrote:
On Mon, May 30, 2016 at 06:25:08PM -0400, Dianne Skoll wrote:
On Mon, 30 May 2016 17:45:52 -0400
"Bill Cole" wrote:
So you could have 'sex' and 'meds' and 'watches' tallied up into
frequency counts that sum
On 31.05.2016 at 02:30, Bill Cole wrote:
On 30 May 2016, at 18:25, Dianne Skoll wrote:
On Mon, 30 May 2016 17:45:52 -0400
"Bill Cole" wrote:
So you could have 'sex' and 'meds' and 'watches' tallied up into
frequency counts that sum up natural
On Mon, May 30, 2016 at 06:25:08PM -0400, Dianne Skoll wrote:
> On Mon, 30 May 2016 17:45:52 -0400
> "Bill Cole" wrote:
>
> > So you could have 'sex' and 'meds' and 'watches' tallied up into
> > frequency counts that sum up natural (word) and synthetic
On 30 May 2016, at 18:25, Dianne Skoll wrote:
On Mon, 30 May 2016 17:45:52 -0400
"Bill Cole" wrote:
So you could have 'sex' and 'meds' and 'watches' tallied up into
frequency counts that sum up natural (word) and synthetic (concept)
occurrences,
On Mon, 30 May 2016 17:45:52 -0400
"Bill Cole" wrote:
> So you could have 'sex' and 'meds' and 'watches' tallied up into
> frequency counts that sum up natural (word) and synthetic (concept)
> occurrences, not just as incompatible types of input
On 28 May 2016, at 17:53, John Hardin wrote:
Based on that, do you have an opinion on the proposal to add two-word
(or configurable-length) combinations to Bayes?
CAVEAT: it has literally been decades since I've worked deep in
statistics on a routine basis rather than just using blindly
On 29.05.2016 at 02:46, Dianne Skoll wrote:
And also, two-word phrases can be stronger indicators than the
individual words; "hot" and "sex" in isolation may not be strong spam
indicators, but "hot sex" probably is stronger.
Going from one-word tokens to one+two-word tokens will have a
On Sat, 28 May 2016 15:37:21 -0400
Bill Cole wrote:
> More importantly (IMHO) they aren't designed to collide with existing
> common tokens and be added back into messages that may contain those
> tokens already in order to influence Bayesian classification.
>
> There is sound statistical
On Sat, 28 May 2016 14:53:15 -0700 (PDT)
John Hardin wrote:
> Based on that, do you have an opinion on the proposal to add two-word
> (or configurable-length) combinations to Bayes?
I have an opinion. :)
Extending Bayes to look at multiple tokens is a *very* good idea.
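For anyone who wants to see what "one+two-word tokens" means in practice, here is a minimal sketch in Python. It is not how SA's Bayes tokenizer works; the tokenizer regex and the function names are assumptions made up for illustration only.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def one_and_two_word_tokens(text):
    """Return the single-word tokens followed by adjacent two-word tokens."""
    words = tokenize(text)
    # Pair each word with its successor to form the two-word tokens.
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs

print(one_and_two_word_tokens("Hot sex tonight"))
# ['hot', 'sex', 'tonight', 'hot sex', 'sex tonight']
```

Note that a message of n words yields n - 1 extra tokens, so the token database grows accordingly.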
On Sat, 28 May 2016, Bill Cole wrote:
There is sound statistical theory consistent with empirical evidence
underpinning the Bayes classifier implementation in SA. While there can be
legitimate critiques of the SA implementation specifically and in general how
well email word frequency fits
On 25 May 2016, at 13:15, Dianne Skoll wrote:
On Wed, 25 May 2016 18:10:57 +0100
Paul Stead wrote:
[quoting Dianne]
"Concepts" is a lossy process. You are throwing away information.
That is by design, similar to fingerprinting emails with iXhash or
Razor.
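To make the "lossy" point concrete, here is a rough sketch in Python. It is not taken from the Concepts plugin; the concept name, the regex and both function names are hypothetical. It only shows that plain word tokens keep "Viagra" and "V1agra" apart, while a concept mapping collapses them into one token.

```python
import re

# Hypothetical concept table (an assumption, not the plugin's real rules):
# each concept name maps to a pattern covering its spelling variants.
CONCEPTS = {
    "concept_viagra": re.compile(r"\bv[i1]agra\b", re.IGNORECASE),
}

def word_tokens(text):
    """Plain word tokens: spelling variants stay distinct."""
    return re.findall(r"[a-z0-9]+", text.lower())

def concept_tokens(text):
    """Concept tokens: every variant of a concept yields the same name."""
    return sorted(name for name, pattern in CONCEPTS.items()
                  if pattern.search(text))

for mail in ("Viagra", "V1agra"):
    print(mail, word_tokens(mail), concept_tokens(mail))
# Viagra ['viagra'] ['concept_viagra']
# V1agra ['v1agra'] ['concept_viagra']
```

Once both mails carry the same concept token, that token by itself can no longer tell the two spellings apart, which is the information being thrown away.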
On Thu, 26 May 2016 12:20:35 +0200
Matus UHLAR - fantomas wrote:
You apparently mistook Razor for DCC; DCC is there to measure
bulkiness, but not (necessarily) spamminess.
On 26.05.16 09:46, Dianne Skoll wrote:
Yes, you are correct. Thanks for the clarification!
And
On Thu, 26 May 2016 12:20:35 +0200
Matus UHLAR - fantomas wrote:
> You apparently mistook Razor for DCC; DCC is there to measure
> bulkiness, but not (necessarily) spamminess.
Yes, you are correct. Thanks for the clarification!
And also, just to clarify another thing:
On Wed, 25 May 2016 18:10:57 +0100
Paul Stead wrote:
> > Yes, except here's the problem. A drug company might legitimately
> > talk about Viagra, so that wouldn't be a spam token. V1agra almost
> > certainly would be a spam token. Bayes can distinguish between
On 25/05/16 15:21, Dianne Skoll wrote:
On Wed, 25 May 2016 15:07:37 +0100
Paul Stead wrote:
Consider the following 2 basic emails:
Mail 1:
Viagra
Mail 2:
V1agra
Yes, except here's the problem. A drug company might legitimately
talk about Viagra, so that
On Wed, 25 May 2016 15:07:37 +0100
Paul Stead wrote:
> Consider the following 2 basic emails:
> Mail 1:
> Viagra
> Mail 2:
> V1agra
Yes, except here's the problem. A drug company might legitimately
talk about Viagra, so that wouldn't be a spam token. V1agra
It may come down to my understanding of Bayes and its tokens. I am also
having a bit of a problem explaining this concept on paper...
I see this as adding an extra layer to the Bayes:
Consider the following 2 basic emails:
Mail 1:
Viagra
Mail 2:
V1agra
With Bayes:
Mail 1:
Mail 2:
With
>
> With David's help I have tracked down the problem(s). Version 0.02 is
> up. I would be interested to hear your thoughts - even if just theoretical -
> about the effect on the Bayes DB.
Just in theory, I am curious what part of the Bayes filter you hope to
improve? I think you are not adding any
On 24/05/16 17:09, David Jones wrote:
Good idea. I would like to test this out so I put this on my CentOS 6 servers
(perl v5.10.1) and got this:
May 24 10:59:51.850 [30158] warn: plugin: failed to parse plugin
/etc/mail/spamassassin/Concepts.pm: Type of arg 1 to push must be array (not
>From: Paul Stead
>Sent: Tuesday, May 24, 2016 9:55 AM
>To: users@spamassassin.apache.org
>Subject: SA Concepts - plugin for email semantics
>Hi guys,
>Based upon some information from others on the list I have put together
>a plugin for SA which canonicalises an