On Sun, 2013-07-28 at 23:12 -0600, Amir 'CG' Caspi wrote:
> So, some of my FNs get autolearned as ham, and because of the
> way my mail queue is set up, I typically only see this once the mail
> reaches my MUA and has already been deleted from the online inbox. I
> have one particular message that got autolearned as ham (but should
> be spam), and I'm trying to run it through sa-learn --forget... but
> it's not forgetting anything. (It tells me "Forgot tokens from 0
> message(s) (1 message(s) examined)".)
> It would appear that something got changed between when SA
> autolearned this message as ham, and when my MUA processed it.
>
> Is there any way I can get sa-learn to forget this message, by
> forcing the message-ID or something? Or, am I basically stuck and my
> Bayes DB has now been poisoned by this mis-learned email that I can't
> forget?

Poison is a little extreme. I'm sure your Bayes DB will be fine and
recover quickly.

You cannot force the ID -- well, not the correct one at least, because
that would require having the original message. You could, however,
modify the source code to try to forget each Bayes token
unconditionally, I guess. Probably too much work, though.

Your best bet is to just train what you have as spam, to counter the ham
rating. Bayes works on tokens (think of them as words), not on complete
messages, so this should work if the modification didn't severely harm
the message.

You do not have to --forget a mis-trained message anyway, unless simply
reverting the training is what you want. If you want to correct the
auto-learn and train as --spam, SA will automatically imply the forget
step, if the message has been seen (trained) before.

(Of course, that would not have surfaced the issue of something
rewriting your mail in between.)

> I'll note that this doesn't always happen... sometimes sa-learn can
> forget mail that I paste in from my MUA. This time, it's not working.
                     ^^^^^
What do you mean, "paste"!? SA always needs a full, raw message,
including all headers, alternative parts, and attachments if any. And in
particular regarding the sa-learn message ID, almost every bit counts.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
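
To illustrate why training the altered copy as spam still counters the ham rating, here is a rough toy model in Python. It is only a sketch and not SpamAssassin's actual code: the class `ToyBayesStore`, the whitespace tokenizer, and the use of a SHA-1 over the whole raw message as the "seen" key are all assumptions made for the example. SA derives its message ID and tokens differently.

```python
import hashlib
from collections import Counter


class ToyBayesStore:
    """Toy per-token spam/ham counts (hypothetical, not SA's real Bayes store)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.seen = {}  # message key -> "spam" or "ham"

    @staticmethod
    def tokens(raw):
        # Assumption: lower-cased whitespace-separated words stand in for tokens.
        return set(raw.lower().split())

    @staticmethod
    def key(raw):
        # Assumption: any change to the raw message changes the key.
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def forget(self, raw):
        """Undo earlier training; returns False if this exact message is unknown."""
        label = self.seen.pop(self.key(raw), None)
        if label is None:
            return False
        counts = self.spam if label == "spam" else self.ham
        counts.subtract(self.tokens(raw))
        for tok in [t for t, c in counts.items() if c <= 0]:
            del counts[tok]
        return True

    def learn(self, raw, label):
        """Train as 'spam' or 'ham'; forgets first if seen under the other label."""
        prev = self.seen.get(self.key(raw))
        if prev == label:
            return False  # already trained this way, nothing to do
        if prev is not None:
            self.forget(raw)  # the implied forget step
        counts = self.spam if label == "spam" else self.ham
        counts.update(self.tokens(raw))
        self.seen[self.key(raw)] = label
        return True


original = "Subject: cheap pills\n\nbuy cheap pills now"
altered = original + "\n-- \nadded footer"  # rewritten somewhere in transit

store = ToyBayesStore()
store.learn(original, "ham")           # the mistaken auto-learn
print(store.forget(altered))           # False: no record under the altered key
store.learn(altered, "spam")           # shared tokens now also count as spam
print(store.ham["pills"], store.spam["pills"])
```

In this toy model the altered copy cannot be forgotten, but training it as spam still raises the spam count of every token it shares with the original. Had the original message been available, `learn(original, "spam")` would have removed the ham counts first.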