On Tue, 2014-09-02 at 21:11 -0400, Alex wrote:
> I have a spamassassin-3.4 system with the following bayes config:
> required_hits 5.0
> rbl_timeout 8
> use_bayes 1
> bayes_auto_learn 1
> bayes_auto_learn_on_error 1
> bayes_auto_learn_threshold_spam 9.0
> bayes_expiry_max_db_size 9500000
> bayes_auto_expire 0
> However, spam with scores greater than 9.0 isn't being autolearned:


> Sep  2 21:01:51 mail01 amavis[25938]: (25938-10)
> header_edits_for_quar: <bmu011...@bmu-011.hichina.com> ->
> <bestd...@example.com>, Yes, score=16.519 tag=-200 tag2=5 kill=5
> autolearn_force=no
> I've re-read the autolearn section of the docs,

The one I linked to above?

> and don't see any reason why this 16-point email wouldn't have any new
> tokens to be learned?

Rules with certain tflags are ignored when determining whether a message
should be trained upon. Most notably here BAYES_xx.

Moreover, the auto-learning decision occurs using scores from either
scoreset 0 or 1, that is using scores of a non-Bayes scoreset. IOW the
message's score of 16 is irrelevant, since the auto-learn algorithm uses
different scores per rule.

Next safety net is requiring at least 3 points each from header and body
rules, unless autolearn_force is enabled. Which it is not in your

Either of those could have prevented auto-learning.
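
To make the two checks concrete, here is a rough Python sketch of the spam side of the decision as described above. It is not SpamAssassin's actual code: the rule names, scores, and the RuleHit type are made up for illustration, the set of ignored tflags is my reading of the autolearn docs, and the ham side is left out entirely.

```python
from dataclasses import dataclass

# tflags whose rules are skipped for the auto-learn decision
# (assumption based on the autolearn docs; BAYES_xx carry "learn").
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


@dataclass
class RuleHit:
    name: str                        # hypothetical rule name
    score: float                     # score from a non-Bayes scoreset (0 or 1)
    kind: str                        # "header", "body", or "other"
    tflags: frozenset = frozenset()


def autolearn_decision(hits, spam_threshold=9.0, autolearn_force=False,
                       min_each=3.0):
    """Return "spam" if the message would be auto-learned as spam, else "no"."""
    # Drop rules that must not influence training, most notably BAYES_xx.
    counted = [h for h in hits if not (h.tflags & IGNORED_TFLAGS)]

    total = sum(h.score for h in counted)
    header = sum(h.score for h in counted if h.kind == "header")
    body = sum(h.score for h in counted if h.kind == "body")

    # The reported message score is irrelevant; only this recomputed
    # non-Bayes total is compared against the threshold.
    if total < spam_threshold:
        return "no"
    # Safety net: at least min_each points each from header and body rules,
    # unless autolearn_force applies.
    if not autolearn_force and (header < min_each or body < min_each):
        return "no"
    return "spam"


hits = [
    RuleHit("BAYES_99", 3.5, "body", frozenset({"learn"})),
    RuleHit("HYPOTHETICAL_HEADER_RULE", 2.0, "header"),
    RuleHit("HYPOTHETICAL_BODY_RULE", 11.0, "body"),
]
print(autolearn_decision(hits))
print(autolearn_decision(hits, autolearn_force=True))
```

With these invented hits the recomputed total clears the 9.0 threshold, yet the header rules contribute less than 3 points, so only the autolearn_force call results in learning.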

Also, according to your wording, you seem to think in terms of (number
of) "new tokens to be learned". Which has nothing in common with

(Even worse, "new tokens" would strongly apply to random gibberish
strings, hapaxes in Bayes context. Which are commonly ignored in Bayes

> I looked in the quarantined message, and according to the _TOKEN_
> header I've added:
> X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> Isn't that sufficient for auto-learning this message as spam?

That has absolutely nothing to do with auto-learning. Where did you get
the impression it might?

> I just wanted to be sure this is just a case of not enough new points
> (tokens?) for the message to be learned, and that I wasn't doing
> something wrong.

Points: aka score, used in the context of per-rule (per-test) and
overall score classifying a message based on the required_score setting.

Token: think of it as "word" used by the Bayesian classifier sub-system.
In practice, it is more complicated than simply space separated words.
Context (e.g. headers) and case might be taken into account, too.
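
As a toy illustration of why a token is more than a space-separated word, the Python sketch below keeps case and tags header words with the header they came from. This is not SpamAssassin's tokenizer; the function name and the "hdr:" prefix format are invented for the example.

```python
import re


def toy_tokens(headers, body):
    """Split a message into case-sensitive tokens, tagging header context."""
    tokens = []
    for name, value in headers.items():
        for word in re.findall(r"\S+", value):
            # Same word, different context: a header word becomes its own token.
            tokens.append(f"hdr:{name.lower()}:{word}")
    # Body words are kept as-is, so "Cheap" and "cheap" stay distinct.
    tokens.extend(re.findall(r"\S+", body))
    return tokens


print(toy_tokens({"Subject": "Cheap Pills"}, "cheap pills Cheap"))
```

Here the word "Cheap" ends up as three different tokens, depending on case and on whether it appeared in the Subject header or the body.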

char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
