On Sun, 2013-11-10 at 01:59 -0200, Sergio Durigan Junior wrote:
> Nice, thanks both of you for the answers.
> 
> I am now feeding SA with ham from my INBOX, while I also feed it with
> false-negatives (interestingly, I am receiving now *much* more spam than
> I was a week ago...).

Given what you said about your spam volume before, that's entirely
possible. However, you're not using a catch-all address, are you?

> So, I now have yet another question.  I let auto_learn active for SA,
> and now for every false-negative SA will learn that it is not spam,

No. A false negative (spam that was not classified as spam) is NOT what
triggers auto-learning as ham.

> although it is.  I'm now thinking that maybe auto_learn is not a good
> idea, at least until I have a good enough Bayes database (strangely, SA
> did not catch *any* spam in the last 48 hours...).  Can you confirm
> this?
> 
> Thanks a lot, and sorry if I'm asking too much :-).

Just leave auto-learn enabled. And, yet again, do train both ham and
spam (all, not only mis-classified messages) for initial training.


Auto-learning in SA Bayes is much more than the pure feedback loop you
described. A message merely classified as ham (score < 5.0) is NOT
learned as ham. Neither is a message scored as spam (>= 5.0) learned as
spam.

(1) The thresholds for auto-learning are 0.1 (ham) and 12.0 (spam) by
    default, not the required_score default of 5.0. They are set with
    bayes_auto_learn_threshold_nonspam and
    bayes_auto_learn_threshold_spam.
(2) Certain rules are not considered for auto-learning, to prevent
    self-feeding.
(3) A minimum contribution from both header and body rules is required,
    to prevent biasing.

See the Mail::SpamAssassin::Plugin::AutoLearnThreshold docs for more
details.
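
To make the three points above concrete, here is a rough sketch of the
decision in Python. It is NOT SA's actual (Perl) code. The function
name, its parameters, and the exact boundary comparisons are my own
assumptions, and so is the MIN_POINTS value of 3.0, which is what I
recall from the plugin docs for learning as spam. The learn_score
argument stands for the score computed without the rules from (2).

```python
# Illustrative sketch only; names and boundary handling are assumptions,
# not SpamAssassin's real implementation.
NONSPAM_THRESHOLD = 0.1   # default bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0     # default bayes_auto_learn_threshold_spam
MIN_POINTS = 3.0          # assumed minimum header AND body points for spam


def autolearn_decision(learn_score, header_points, body_points):
    """Return 'ham', 'spam' or 'no' for a scored message.

    learn_score is a hypothetical input: the message score with the
    rules excluded from auto-learning (e.g. the BAYES_xx rules) left out.
    """
    if learn_score < NONSPAM_THRESHOLD:
        return "ham"
    if learn_score >= SPAM_THRESHOLD:
        # Require evidence from both headers and body, to prevent biasing.
        if header_points >= MIN_POINTS and body_points >= MIN_POINTS:
            return "spam"
        return "no"
    # Everything between the two thresholds is not learned at all.
    return "no"
```

With these defaults, a false negative scoring 4.2 gives "no": it is
delivered as ham, but it is not learned as ham.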

Near the end of the X-Spam-Status header, SA tells you whether or not
it auto-learned the message. Not surprisingly, that's
  autolearn=(ham|spam|no|unavailable)
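
If you want to check that field across a whole mailbox, a small Python
helper along these lines should do. The function name is hypothetical,
and the sample header in the usage note is made up for illustration.
The sketch only assumes that the header value contains an
"autolearn=<word>" token, possibly on a folded continuation line.

```python
import re


def autolearn_status(header_value):
    """Extract the autolearn value from an X-Spam-Status header value.

    Returns the word after 'autolearn=' (e.g. 'ham', 'spam', 'no',
    'unavailable'), or None if the token is not present.
    """
    # Unfold continuation lines (newline followed by whitespace) first.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", header_value)
    match = re.search(r"\bautolearn=(\w+)", unfolded)
    return match.group(1) if match else None
```

For a made-up value such as
"No, score=-0.3 required=5.0 tests=BAYES_00\n\tautolearn=ham version=3.3.2"
this returns "ham".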


In your case, I'd say just let SA do its job. Monitor the results, and
train both ham and spam, at the very least until BAYES_xx rules show up
in X-Spam-Status headers.

Keep training Bayes after that, to improve performance. Definitely do
train on false positives and negatives.

Wait, observe, and learn how to read X-Spam headers. :)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
