On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:

> > > I looked in the quarantined message, and according to the _TOKEN_
> > > header I've added:
> > > 
> > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > > 
> > > Isn't that sufficient for auto-learning this message as spam?
            ^^^^
That's clearly referring to the _TOKEN_ data in the custom header, is it
not?

> > That has absolutely nothing to do with auto-learning. Where did you get
> > the impression it might?
> 
> If the conditions for autolearning had been met, I understood that it
> would be those new tokens that would be learned.

Learning is not limited to new tokens. All tokens are learned,
regardless of their current (h|sp)ammyness.

Still, the number of (new) tokens is not a condition for auto-learning.
That header shows some more or less nice information, but in this
context it is absolutely irrelevant.


Auto-learning in a nutshell: Take all tests the message hit. Drop those
with certain tflags, like the BAYES_xx rules. For the remaining rules,
look up their scores in the non-Bayes scoreset 0 or 1. Sum up those
scores to a total, and compare it with the auto-learn threshold values.
For spam, also check there are at least 3 points each from header rules
and from body rules. Finally, if all that matches, learn.
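
The steps above can be sketched roughly as follows. This is only an
illustration in Python, not SpamAssassin's actual (Perl) code. The Rule
class, the autolearn_decision() function and the HYPO_* rule names are
made up for this example. The excluded tflags and the threshold values
(0.1 for ham, 12.0 for spam) are what I believe the defaults to be, so
treat them as assumptions and check your own configuration.

```python
from dataclasses import dataclass

# Assumption: rules carrying any of these tflags are left out of the
# internally calculated auto-learn score (the BAYES_xx rules among them).
EXCLUDED_TFLAGS = {"learn", "noautolearn", "userconf"}


@dataclass
class Rule:
    """Hypothetical stand-in for one rule the message hit."""
    name: str
    kind: str                      # "header" or "body"
    score: float                   # score from the non-Bayes scoreset (0 or 1)
    tflags: frozenset = frozenset()


def autolearn_decision(hits, ham_threshold=0.1, spam_threshold=12.0,
                       min_part=3.0):
    """Return "spam", "ham" or None (do not learn) for the rules hit."""
    # Drop rules with excluded tflags.
    kept = [r for r in hits if not (r.tflags & EXCLUDED_TFLAGS)]

    # Sum the non-Bayes scores, in total and per rule type.
    total = sum(r.score for r in kept)
    header = sum(r.score for r in kept if r.kind == "header")
    body = sum(r.score for r in kept if r.kind == "body")

    if total >= spam_threshold:
        # Spam additionally needs at least 3 points each from header
        # rules and from body rules.
        if header >= min_part and body >= min_part:
            return "spam"
        return None
    if total < ham_threshold:
        return "ham"
    return None


hits = [
    Rule("BAYES_99", "body", 0.0, frozenset({"learn"})),  # dropped
    Rule("HYPO_HEADER_A", "header", 8.0),
    Rule("HYPO_HEADER_B", "header", 2.0),
    Rule("HYPO_BODY_A", "body", 2.5),
]
print(autolearn_decision(hits))  # total 12.5, but body points < 3
```

Here the internal total is above the spam threshold, yet nothing is
learned because the body rules contribute fewer than 3 points. The
message's reported score never enters the calculation.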


> Okay, of course I understood the difference between points and tokens.
> Since the points were over the specified threshold, I thought those
> new tokens would have been added.

As I have mentioned before in this thread: It is NOT the message's
reported total score that must exceed the threshold. The auto-learning
discriminator uses an internally calculated score using the respective
non-Bayes scoreset.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
