On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
> > > I looked in the quarantined message, and according to the _TOKEN_
> > > header I've added:
> > >
> > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > >
> > > Isn't that sufficient for auto-learning this message as spam?
            ^^^^
That's clearly referring to the _TOKEN_ data in the custom header, is it
not?

> > That has absolutely nothing to do with auto-learning. Where did you get
> > the impression it might?
>
> If the conditions for autolearning had been met, I understood that it
> would be those new tokens that would be learned.

Learning is not limited to new tokens. All tokens are learned,
regardless of their current (h|sp)ammyness.

Still, the number of (new) tokens is not a condition for auto-learning.
That header shows some more or less nice information, but in this
context it is absolutely irrelevant.

Auto-learning in a nutshell: Take all tests hit. Drop those with
certain tflags, like the BAYES_xx rules. For the remaining rules, look
up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to
a total, and compare it with the auto-learn threshold values. For spam,
also check that header rules and body rules each contribute at least
3 points. Finally, if all that matches, learn.

> Okay, of course I understood the difference between points and tokens.
> Since the points were over the specified threshold, I thought those
> new tokens would have been added.

As I have mentioned before in this thread: It is NOT the message's
reported total score that must exceed the threshold. The auto-learning
discriminator uses an internally calculated score, based on the
respective non-Bayes scoreset.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
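
For illustration, the "auto-learning in a nutshell" steps above can be
sketched in a few lines of Python. This is not SpamAssassin's actual
code (which is Perl); it is a rough model under stated assumptions. The
names RuleHit, EXCLUDED_TFLAGS and autolearn_decision are hypothetical,
the set of excluded tflags is an assumption beyond the BAYES_xx example
given above, and the threshold values and the exact comparison operators
are assumptions chosen for the example, not settings taken from this
thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleHit:
    """One rule that hit on the message (hypothetical model)."""
    name: str
    score: float                    # score taken from the non-Bayes scoreset
    is_header: bool                 # True for header rules, False for body rules
    tflags: frozenset = frozenset()


# Assumption: rules carrying any of these tflags are dropped before summing.
EXCLUDED_TFLAGS = frozenset({"learn", "noautolearn", "userconf"})


def autolearn_decision(hits, spam_threshold=12.0, ham_threshold=0.1,
                       min_part_points=3.0):
    """Return "spam", "ham" or None (do not learn).

    Thresholds and comparison operators are illustrative assumptions.
    """
    # Drop rules with excluded tflags, e.g. the BAYES_xx rules.
    counted = [h for h in hits if not (h.tflags & EXCLUDED_TFLAGS)]

    # Internally calculated score: NOT the total reported in the headers.
    total = sum(h.score for h in counted)
    header_points = sum(h.score for h in counted if h.is_header)
    body_points = sum(h.score for h in counted if not h.is_header)

    if (total >= spam_threshold
            and header_points >= min_part_points
            and body_points >= min_part_points):
        return "spam"
    if total < ham_threshold:
        return "ham"
    return None


# Made-up hits: the reported total would include BAYES_99, the
# auto-learn total does not.
hits = [
    RuleHit("BAYES_99", 3.5, is_header=False, tflags=frozenset({"learn"})),
    RuleHit("HEADER_RULE_A", 7.0, is_header=True),
    RuleHit("BODY_RULE_A", 4.0, is_header=False),
]
print(autolearn_decision(hits))
```

With these made-up hits, a reported total would be 14.5 including
BAYES_99, yet the internally calculated score is only 11.0, so the
sketch declines to learn. Token counts never enter the decision at all.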