On Sun, 2013-07-28 at 23:12 -0600, Amir 'CG' Caspi wrote:
>       So, some of my FNs get autolearned as ham, and because of the 
> way my mail queue is set up, I typically only see this once the mail 
> reaches my MUA and has already been deleted from the online inbox.  I 
> have one particular message that got autolearned as ham (but should 
> be spam), and I'm trying to run it through sa-learn --forget... but 
> it's not forgetting anything.  (It tells me "Forgot tokens from 0 
> message(s) (1 message(s) examined)".)
>       It would appear that something got changed between when SA 
> autolearned this message as ham, and when my MUA processed it.
> 
> Is there any way I can get sa-learn to forget this message, by 
> forcing the message-ID or something?  Or, am I basically stuck and my 
> Bayes DB has now been poisoned by this mis-learned email that I can't 
> forget?

Poison is a little extreme. I'm sure your Bayes DB will be fine and
recover quickly.

You cannot force the ID -- well, not the correct one at least, because
that would require having the original message. You could, however,
modify the source code to try to forget each Bayes token
unconditionally, I guess. Probably too much work, though.

Your best bet is to just train what you have as spam, to counter the
ham rating. Bayes works on tokens (think of them as words), not on
complete messages, so this should work as long as the modification
didn't severely alter the message.
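
To illustrate why that works, here is a toy sketch of per-token
counting. It is not SpamAssassin's Bayes code: the ToyBayes class, its
whitespace tokenizer, and the sample strings are all made up for the
example, and the real tokenizer and probability math are more involved.

```python
from collections import Counter


class ToyBayes:
    """Toy per-token counter; an illustration only, not SpamAssassin's Bayes."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    @staticmethod
    def tokens(text):
        # Hypothetical tokenizer: lowercase, split on whitespace, de-duplicate.
        return set(text.lower().split())

    def learn(self, text, is_spam):
        # Each distinct token in the message is counted once per training.
        target = self.spam if is_spam else self.ham
        target.update(self.tokens(text))

    def spamminess(self, token):
        # Fraction of trainings containing this token that were spam.
        s, h = self.spam[token], self.ham[token]
        return s / (s + h) if s + h else 0.5


db = ToyBayes()
original = "cheap pills buy now limited offer"
modified = "cheap pills buy now limited offer [scanned]"  # altered in transit

db.learn(original, is_spam=False)  # the wrong auto-learn as ham
db.learn(modified, is_spam=True)   # manual correction using the altered copy

print(db.spamminess("pills"))      # shared token: ham and spam counts balance
print(db.spamminess("[scanned]"))  # token that only exists in the altered copy
```

The tokens both copies share end up with one ham and one spam count
each, so the bad training is offset for them. Only tokens that differ
between the two copies stay lopsided.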

You do not have to --forget a mis-trained message anyway, unless simply
reverting the training is what you want. If you want to correct the
auto-learn and train as --spam, sa-learn will automatically do the
forget step first, if the message has been seen (trained) before. (Of
course, that would not have surfaced the issue of something rewriting
your mail in between.)


> I'll note that this doesn't always happen... sometimes sa-learn can 
> forget mail that I paste in from my MUA.  This time, it's not working.
                     ^^^^^
What do you mean, "paste"!?

SA always needs a full, raw message, including all headers, alternative
parts, and attachments if any. And in particular regarding the sa-learn
message ID, almost every bit counts.
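
A toy sketch of why small changes matter: here a digest over the raw
bytes stands in for the learn ID. The toy_id function and the sample
message are hypothetical, and SA's real ID derivation differs in
detail, but any ID derived from message content behaves the same way
once that content has been altered.

```python
import hashlib


def toy_id(raw_message: bytes) -> str:
    # Hypothetical stand-in for a learn ID: a digest over the raw bytes.
    return hashlib.sha1(raw_message).hexdigest()


raw = (
    b"Date: Sun, 28 Jul 2013 23:12:00 -0600\r\n"
    b"Subject: offer\r\n"
    b"\r\n"
    b"buy now\r\n"
)

# Something in between changed only the line endings.
pasted = raw.replace(b"\r\n", b"\n")

same = toy_id(raw) == toy_id(pasted)
print(same)  # False: the copies no longer map to the same ID
```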


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
