Re: Bayes + DCC / Bayes as a false-positive killer

Andrew Talbot Wed, 29 May 2013 12:18:46 -0700

Hi there, RW-

Thank you for your response. A lot of interesting points in there. The
issue with something like Bogofilter or its ilk is that it:
1- Requires manual intervention from users (we don't have access to the
content of their messages)
2- Apparently doesn't scale well to huge client bases with all kinds of
diverse businesses. Our clients range from banking institutions to
employment agencies to ... ehh... purveyors of adult objects. So it's tough
to find commonalities, and since we're so large, we can't exactly have
different user accounts for each.


Go figure.

Bayes performs beautifully in my test environment... I just need to find
that extra "WOW" factor. I thought that saving the cost on DCC would be it,
but that didn't seem to make a difference. Go figure.


On Wed, May 29, 2013 at 8:02 AM, RW <rwmailli...@googlemail.com> wrote:

> On Tue, 28 May 2013 16:43:20 -0400
> Andrew Talbot wrote:
>
> > Hey all -
> >
> > I've got two questions:
> >
> > 1-
> >
> >...
> > That said, I'm wondering if it's redundant to run DCC and Bayes at
> > the same time? From what I understand, DCC is a subscription-based
> > service, so it would be nice to be able to cut that cost out!
>
> It depends what you mean by DCC. The basic version is free, but it is
> actually only a way of identifying *bulk* mail, which is why it
> doesn't score all that much. The paid version is a reputation system; it
> doesn't get discussed much here.
>
> SpamAssassin is score-based; it doesn't rely on poison-pill rules. It
> doesn't matter that all DCC hits are also Bayes hits, provided that
> the FPs and FNs don't also overlap and some spam that hits Bayes is
> pushed over the 5-point threshold by DCC.
>
>
> > As some of you may have known from talking with me over the past few
> > weeks, I've been having a difficult time 'selling' my bosses on the
> > idea of Bayes; it simply doesn't seem to do anything new to them. But
> > looking at the data today, I came up with an idea: use Bayes to
> > reduce false positives.
> >
> > That would mean we'd completely nerf the rules that add points to the
> > score, but we'd trust Bayes to subtract points from messages it is
> > confident are ham.
> >
> > I am aware of how silly that sounds. But would it work? We don't have
> > another way to filter out false positives - we've got tons of ways to
> > add points!
>
> Reducing FPs is already one of the main benefits of Bayes. The trouble
> is that if you rescore it, you will still be using the Bayes scoreset
> that's optimized around Bayes doing a lot of the spam catching.
>
> I think you'd be better off scoring Bogofilter, or a similar filter with
> 3-way clustering, into SpamAssassin. You still have the problem of
> learning representative ham if you want accurate ham identification.
>
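
To make the additive-scoring argument quoted above concrete, here is a minimal Python sketch. It is not SpamAssassin code. The rule names BAYES_99, BAYES_00, DCC_CHECK and HTML_MESSAGE mirror real SpamAssassin rule names, but every score value is made up for illustration, SOME_CONTENT_RULE is a hypothetical rule, and the helper functions are assumptions of this sketch. Only the 5-point threshold comes from the thread.

```python
# Illustrative only: the scores below are invented, not SpamAssassin's shipped values.
THRESHOLD = 5.0  # the 5-point threshold mentioned in the thread

RULE_SCORES = {
    "BAYES_99": 3.5,           # Bayes is very confident the message is spam
    "BAYES_00": -1.9,          # Bayes is very confident the message is ham
    "DCC_CHECK": 1.1,          # DCC reports the message as bulk
    "HTML_MESSAGE": 0.5,
    "SOME_CONTENT_RULE": 3.6,  # hypothetical content rule
}


def total_score(hits, scores=RULE_SCORES):
    """Sum the scores of every rule that hit; unknown rules count as 0."""
    return sum(scores.get(name, 0.0) for name in hits)


def is_spam(hits, scores=RULE_SCORES, threshold=THRESHOLD):
    """A message is flagged when its summed score reaches the threshold."""
    return total_score(hits, scores) >= threshold


def ham_only_scores(scores):
    """Variant from the thread: zero every positive BAYES_* score so that
    Bayes can only subtract points, never add them."""
    return {
        name: 0.0 if name.startswith("BAYES_") and value > 0 else value
        for name, value in scores.items()
    }


if __name__ == "__main__":
    spam = ["BAYES_99", "HTML_MESSAGE"]
    print(is_spam(spam))                   # Bayes alone stays under 5
    print(is_spam(spam + ["DCC_CHECK"]))   # DCC pushes it over the threshold

    bulk_ham = ["SOME_CONTENT_RULE", "DCC_CHECK", "HTML_MESSAGE"]
    print(is_spam(bulk_ham))                 # would be a false positive
    print(is_spam(bulk_ham + ["BAYES_00"]))  # Bayes pulls it back under 5

    ham_only = ham_only_scores(RULE_SCORES)
    print(is_spam(spam + ["DCC_CHECK"], ham_only))  # spam now slips through
```

With these made-up numbers, the last case shows the caveat RW raises: once the positive Bayes scores are zeroed, the remaining scores were still chosen on the assumption that Bayes does much of the spam catching, so spam that used to cross the threshold no longer does.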
