On Fri, 2013-07-26 at 12:46 +0100, RW wrote:
> On Thu, 25 Jul 2013 23:31:57 +0200 Karsten Bräckelmann wrote:
> > SPAM: The spammy tokens are highly suspicious, too. As you confirmed,
> > you are manually training these as spam. And all three samples feature
> > an address in "Miami, FL 33179" at the bottom.
> >
> > Yet, the declassification distance for "33179", "Miami" and
> > "miami" (lc version of the former, generated by SA Bayes) is a mere
> > 1. Which means, learning the token as the opposite just *once* makes
> > them lose the current classification.
>
> The threshold for classification is 0.846. It would be remarkable if
> these tokens didn't have a declassification distance of 1.

That's not the point, though. With correct manual training, the
declassification distance should be higher. And frankly, there should be
more than 4 spammy tokens in the first place.

Caveat: This assumes that, to gather these Bayes token headers, SA was run
as the same user that processes incoming mail.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
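
To illustrate the disagreement about declassification distance above, here
is a rough sketch of how such a distance could be counted. It is a
deliberately simplified model and not SpamAssassin's actual code: the
per-token probability is a plain ratio of spam and ham frequencies, the
function names and the corpus sizes are hypothetical, and only the 0.846
threshold is taken from the thread.

```python
# Simplified model, NOT SpamAssassin's implementation. All names and the
# example counts are hypothetical; only the 0.846 threshold comes from
# the thread.

def spam_probability(spam_hits, ham_hits, total_spam, total_ham):
    """Naive per-token spam probability from relative frequencies."""
    spam_ratio = spam_hits / total_spam
    ham_ratio = ham_hits / total_ham
    if spam_ratio + ham_ratio == 0:
        return 0.5  # token never seen: neutral
    return spam_ratio / (spam_ratio + ham_ratio)


def declassification_distance(spam_hits, ham_hits, total_spam, total_ham,
                              threshold=0.846, limit=1000):
    """Count how many ham learns it takes until a spammy token drops
    below the threshold. Returns None if the limit is reached first."""
    distance = 0
    while spam_probability(spam_hits, ham_hits,
                           total_spam, total_ham) >= threshold:
        if distance >= limit:
            return None
        # Learning one more ham message that contains the token.
        ham_hits += 1
        total_ham += 1
        distance += 1
    return distance


if __name__ == "__main__":
    for hits in (3, 6, 30):
        print(hits, declassification_distance(hits, 0, 1000, 1000))
```

Under these assumed counts (1000 learned spam, 1000 learned ham), a token
seen in only 3 spam messages has a distance of 1, while a token seen in 30
has a distance of 6. In this toy model both remarks hold: a distance of 1 is
unsurprising for a rarely seen token, and repeated correct training pushes
the distance up.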