Re: ADDRESS_IN_SUBJECT et al

Karsten Bräckelmann Fri, 26 Jul 2013 15:36:21 -0700

On Fri, 2013-07-26 at 12:46 +0100, RW wrote:
> On Thu, 25 Jul 2013 23:31:57 +0200 Karsten Bräckelmann wrote:


> > SPAM:  The spammy tokens are highly suspicious, too. As you confirmed,
> > you are manually training these as spam. And all three samples feature
> > an address in "Miami, FL 33179" at the bottom.
> > 
> > Yet, the declassification distance for "33179", "Miami" and
> > "miami" (lc version of the former, generated by SA Bayes) is a mere
> > 1. Which means, learning the token as the opposite just *once* makes
> > them lose the current classification.
>  
> The threshold for classification is 0.846.  It would be remarkable if
> these tokens didn't have a declassification distance of 1.

That's not the point, though. With correct manual training the
declassification distance should be higher. And frankly, there should be
more than 4 spammy tokens in the first place.

Caveat: This assumes that, to gather these Bayes Token headers, SA was
run as the same user that processes incoming mail.
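
To make both positions concrete, here is a minimal sketch in Python. It is
NOT SA's actual Bayes code: it assumes a naive token probability (relative
spam frequency over the sum of relative spam and ham frequencies) and simply
counts how many ham learns push a spammy token below the threshold. The
function names and the message counts are hypothetical; only the 0.846
threshold comes from this thread.

```python
def token_prob(ns, nn, total_spam, total_ham):
    """Naive spam probability of a token (simplified, not SA's formula).

    ns / nn: number of spam / ham messages containing the token.
    total_spam / total_ham: number of spam / ham messages learned.
    """
    rs = ns / total_spam
    rn = nn / total_ham
    return rs / (rs + rn)


def declassification_distance(ns, nn, total_spam, total_ham, threshold=0.846):
    """Count ham learns needed until a spammy token drops below threshold.

    Each ham learn adds one ham message that contains the token.
    Returns 0 if the token is not classified as spammy to begin with.
    """
    if ns == 0 or token_prob(ns, nn, total_spam, total_ham) < threshold:
        return 0
    k = 0
    while token_prob(ns, nn + k, total_spam, total_ham + k) >= threshold:
        k += 1
    return k


# Hypothetical corpus: 1000 spam and 1000 ham messages learned.
rarely_trained = declassification_distance(3, 0, 1000, 1000)
well_trained = declassification_distance(30, 0, 1000, 1000)
print(rarely_trained, well_trained)
```

Under these assumptions, a token seen in only 3 spam messages loses its
classification after a single ham learn, while a token seen in 30 spam
messages needs several. A rarely seen token having a distance of 1 is
therefore unsurprising, but a token that appears in every manually trained
sample should accumulate a higher distance.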


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
