Re: Bayes autolearn questions

Alex Sat, 06 Sep 2014 14:23:33 -0700

Hi,

On Thu, Sep 4, 2014 at 1:44 PM, Karsten Bräckelmann <guent...@rudersport.de>
wrote:


> On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
>
> > > > I looked in the quarantined message, and according to the _TOKEN_
> > > > header I've added:
> > > >
> > > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > > >
> > > > Isn't that sufficient for auto-learning this message as spam?
>             ^^^^
> That's clearly referring to the _TOKEN_ data in the custom header, is it
> not?
>

Yes. Burning the candle at both ends. Really overworked.


> > > That has absolutely nothing to do with auto-learning. Where did you get
> > > the impression it might?
> >
> > If the conditions for autolearning had been met, I understood that it
> > would be those new tokens that would be learned.
>
> Learning is not limited to new tokens. All tokens are learned,
> regardless of their current (h|sp)ammyness.
>
> Still, the number of (new) tokens is not a condition for auto-learning.
> That header shows some more or less nice information, but in this
> context absolutely irrelevant information.
>

I understood "new" to mean the tokens that have not been seen before, and
would be learned if the other conditions were met.


> Auto-learning in a nutshell: Take all tests hit. Drop some of them with
> certain tflags, like the BAYES_xx rules. For the remaining rules, look
> up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to
> a total, and compare with the auto-learn threshold values. For spam,
> also check there are at least 3 points each by header and body rules.
> Finally, if all that matches, learn.
>

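The auto-learning procedure described above can be sketched roughly in Python. This is a simplified model under stated assumptions, not SpamAssassin's own code (the real logic is Perl, in the AutoLearnThreshold plugin). The RuleHit class and autolearn_decision function are hypothetical. The excluded tflags are assumed to be "learn" (carried by the BAYES_xx rules) and "noautolearn". The header/body split is reduced to a single location field. The thresholds assume the documented defaults of bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam (12.0).

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed defaults for bayes_auto_learn_threshold_nonspam / _spam.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0
# Minimum points required from header rules and from body rules (spam only).
MIN_HEADER_POINTS = 3.0
MIN_BODY_POINTS = 3.0
# Assumption: rules carrying any of these tflags are dropped from the sum.
IGNORED_TFLAGS = {"learn", "noautolearn"}


@dataclass
class RuleHit:
    """Hypothetical record of one rule the message hit."""
    name: str
    nonbayes_score: float   # the rule's score in non-Bayes scoreset 0 or 1
    location: str           # simplified: "header", "body", or "other"
    tflags: set = field(default_factory=set)


def autolearn_decision(hits: list) -> Optional[str]:
    """Return "ham", "spam", or None (do not auto-learn)."""
    total = header = body = 0.0
    for hit in hits:
        if hit.tflags & IGNORED_TFLAGS:
            continue  # e.g. BAYES_xx never counts toward the decision
        total += hit.nonbayes_score
        if hit.location == "header":
            header += hit.nonbayes_score
        elif hit.location == "body":
            body += hit.nonbayes_score

    if total < NONSPAM_THRESHOLD:
        return "ham"
    if (total >= SPAM_THRESHOLD
            and header >= MIN_HEADER_POINTS
            and body >= MIN_BODY_POINTS):
        return "spam"
    return None  # between thresholds, or spam total without 3+3 split


if __name__ == "__main__":
    lopsided = [RuleHit("HDR_A", 11.0, "header"), RuleHit("BODY_A", 2.0, "body")]
    print(autolearn_decision(lopsided))
```

In this model, a message with 13 non-Bayes points of which only 2 come from body rules is not learned. That is one way a message can score high and still escape auto-learning.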
Is it important to understand how those three points are achieved or
calculated?


> > Okay, of course I understood the difference between points and tokens.
> > Since the points were over the specified threshold, I thought those
> > new tokens would have been added.
>
> As I have mentioned before in this thread: It is NOT the message's
> reported total score that must exceed the threshold. The auto-learning
> discriminator uses an internally calculated score using the respective
> non-Bayes scoreset.
>

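The difference between the two scores can be written as formulas. Let H be the set of rules the message hit, L the subset excluded by tflags (such as the BAYES_xx rules), and s_r^{(k)} the score of rule r in scoreset k. SpamAssassin numbers its scoresets 0 to 3: Bayes is enabled in sets 2 and 3, and network tests in sets 1 and 3. The non-Bayes counterpart of set k is therefore k mod 2.

$$
S_{\text{report}} = \sum_{r \in H} s_r^{(k)}, \qquad
S_{\text{learn}} = \sum_{r \in H \setminus L} s_r^{(k \bmod 2)}
$$

Only S_learn is compared against the auto-learn thresholds. It drops the BAYES_xx contribution and uses different per-rule scores, so a reported total well above the spam threshold does not by itself imply that S_learn clears it.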
Very helpful, thanks. Is there a way to see more about how it makes that
decision on a particular message?

Thanks,
Alex
