Re: ADDRESS_IN_SUBJECT et al

RW Fri, 26 Jul 2013 04:48:08 -0700

On Thu, 25 Jul 2013 23:31:57 +0200
Karsten Bräckelmann wrote:


>   Spammy tokens:
>    0.903-+--Fast,
>    0.862-1--33179,
>    0.847-1--Miami,
>    0.847-1--miami

> SPAM:  The spammy tokens are highly suspicious, too. As you confirmed,
> you are manually training these as spam. And all three samples feature
> an address in "Miami, FL 33179" at the bottom.
> 
> Yet, the declassification distance for "33179", "Miami" and
> "miami" (lc version of the former, generated by SA Bayes) is a mere
> 1. Which means, learning the token as the opposite just *once* makes
> them lose the current classification.

 
The threshold for classification is 0.846.  At 0.862 and 0.847 these
tokens sit barely above it, so it would be remarkable if they didn't
have a declassification distance of 1.
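
A rough sketch of why a token just over the threshold flips after a
single opposite learn. This is not SpamAssassin's actual code: the
plain spam/ham frequency ratio, the function names and the message
counts are all assumptions made for illustration. Only the 0.846
threshold comes from the discussion above.

```python
THRESHOLD = 0.846  # classification threshold quoted above


def token_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified spam probability of a token (assumed formula, not SA's)."""
    s = spam_hits / n_spam  # fraction of spam messages containing the token
    h = ham_hits / n_ham    # fraction of ham messages containing the token
    return s / (s + h)


def declass_distance(spam_hits, ham_hits, n_spam, n_ham, limit=1000):
    """Count the ham learns needed before the token drops below THRESHOLD."""
    for extra in range(1, limit + 1):
        # Each ham learn adds one ham message that contains the token.
        p = token_prob(spam_hits, ham_hits + extra, n_spam, n_ham + extra)
        if p < THRESHOLD:
            return extra
    return None  # still spammy after `limit` ham learns


# Hypothetical counts giving a probability of about 0.847, just over the line.
print(declass_distance(50, 9, 1000, 1000))
# Hypothetical token never seen in ham, so its probability starts at 1.0.
print(declass_distance(100, 0, 1000, 1000))
```

With these made-up counts the borderline token loses its
classification after one ham learn, while the token never seen in ham
needs considerably more.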
