On Thu, 25 Jul 2013 23:31:57 +0200 Karsten Bräckelmann wrote:
> Spammy tokens:
> 0.903-+--Fast,
> 0.862-1--33179,
> 0.847-1--Miami,
> 0.847-1--miami
> SPAM: The spammy tokens are highly suspicious, too. As you confirmed,
> you are manually training these as spam. And all three samples feature
> an address in "Miami, FL 33179" at the bottom.
>
> Yet, the declassification distance for "33179", "Miami" and
> "miami" (lc version of the former, generated by SA Bayes) is a mere
> 1. Which means, learning the token as the opposite just *once* makes
> them lose the current classification.

The threshold for classification is 0.846, and those three tokens sit only
just above it (0.862, 0.847 and 0.847). It would be remarkable if they
didn't have a declassification distance of 1: learning them once as ham is
all it takes to pull them under the threshold.
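To make the arithmetic concrete, here is a rough sketch, not SpamAssassin's actual code. It assumes a Robinson-style token probability and finds the declassification distance by brute force. The prior constants S_CONST and X_CONST are assumptions chosen for illustration, the 0.846 threshold is written as 0.5 + 0.346 (also an assumption about how it is derived), and the helper names and counts are hypothetical.

```python
# Simplified model of a Bayes token probability and its
# "declassification distance". All constants are ASSUMPTIONS for
# illustration; they are not read from SpamAssassin's source.

MIN_PROB_STRENGTH = 0.346  # assumed: a token counts only if p >= 0.5 + 0.346 = 0.846
S_CONST = 0.160            # assumed strength of the prior (Robinson's s)
X_CONST = 0.538            # assumed prior probability (Robinson's x)


def token_prob(spam_count, ham_count, nspam, nham):
    """Robinson f(w): the raw spam ratio shrunk towards X_CONST."""
    b = spam_count / nspam if nspam else 0.0  # fraction of spam messages with the token
    g = ham_count / nham if nham else 0.0     # fraction of ham messages with the token
    if b + g == 0:
        return X_CONST
    p = b / (b + g)
    n = spam_count + ham_count
    return (S_CONST * X_CONST + n * p) / (S_CONST + n)


def is_spammy(p):
    """True if the token is strong enough to count as a spam sign."""
    return p >= 0.5 + MIN_PROB_STRENGTH


def declassification_distance(spam_count, ham_count, nspam, nham, limit=1000):
    """Smallest k such that learning k more ham messages containing the
    token drops it out of the spammy range (None if not within limit)."""
    for k in range(1, limit + 1):
        p = token_prob(spam_count, ham_count + k, nspam, nham + k)
        if not is_spammy(p):
            return k
    return None


if __name__ == "__main__":
    # Hypothetical database: 1000 spam and 1000 ham messages learned.
    print(declassification_distance(1, 0, 1000, 1000))   # seen in 1 spam   -> 1
    print(declassification_distance(50, 0, 1000, 1000))  # seen in 50 spams -> 10
```

Under these assumptions, a token backed by very little evidence sits close enough to the threshold that a single opposite learning declassifies it. A token seen in many spams needs several.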