On Sun, 2013-11-10 at 01:59 -0200, Sergio Durigan Junior wrote:
> Nice, thanks both of you for the answers.
> 
> I am now feeding SA with ham from my INBOX, while I also feed it with
> false-negatives (interestingly, I am receiving now *much* more spam than
> I was a week ago...).

Given what you stated about your spam volume before, entirely possible.
However, you're not using a catch-all, are you?

> So, I now have yet another question.  I let auto_learn active for SA,
> and now for every false-negative SA will learn that it is not spam,

No. A false negative (not classified as spam, although it is) is NOT
what triggers auto-learning as ham.

> although it is.  I'm now thinking that maybe auto_learn is not a good
> idea, at least until I have a good enough Bayes database (strangely, SA
> did not catch *any* spam in the last 48 hours...).  Can you confirm
> this?
> 
> Thanks a lot, and sorry if I'm asking too much :-).

Just leave auto-learn enabled. And, yet again, do train both ham and
spam (all, not only mis-classified messages) for initial training.

Auto-learning in SA Bayes is much more than the pure feedback loop you
described. A message just being classified ham (< 5.0) is NOT learned as
ham. Neither are messages scored spam (>= 5.0) learned as spam.

(1) The thresholds for auto-learning are 0.1 and 12.0 by default. Not
    the required_score threshold, which defaults to 5.0.
(2) Certain rules are not considered for auto-learning, to prevent
    self-feeding.
(3) A minimum of header and body rule hits is required, to prevent
    biasing.

See the Mail::SpamAssassin::Plugin::AutoLearnThreshold docs for more
details.

The part of the X-Spam-Status header way down at the end tells you
whether SA auto-learned the message or not. Hardly surprising, that's

  autolearn=(ham|spam|no|unavailable)

In your case, I'd say just let SA do its job. Monitor the results, and
train both ham and spam, at the very least until BAYES_xx rules show up
in X-Spam-Status headers. Keep training Bayes after that, to improve
performance. Definitely do train on false positives and negatives.

Wait, observe, and learn how to read X-Spam headers. :)

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
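
For monitoring the autolearn= field mentioned above, here is a minimal
sketch in Python. It is not part of SpamAssassin. The sample message,
its header values, and the function name autolearn_status are made up
for illustration, and it only assumes that the X-Spam-Status header
carries an autolearn=<value> token as described in the message.

```python
import re
from email import message_from_string

# Hypothetical sample message; header values are invented for illustration.
SAMPLE = (
    "X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00\n"
    "\tautolearn=ham version=3.3.2\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

# Matches the autolearn=<value> token anywhere in the (possibly folded) header.
AUTOLEARN_RE = re.compile(r"\bautolearn=([a-z_]+)")


def autolearn_status(raw_message):
    """Return the autolearn= value from X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    match = AUTOLEARN_RE.search(header)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(autolearn_status(SAMPLE))
```

Running something like this over a mail folder and counting the values
gives a rough picture of how often SA auto-learns at all, which should
be far less often than it classifies.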