[Tony Meyer, last week]
> The latter was prompted by a comment in JGC's latest
> newsletter (though I'm sure I've seen this somewhere before,
> too).  To avoid deliberate misspellings and the so-called
> 'cambridge effect' you replace each (or generate a new) token
> that is made up of the letters in the original token sorted
> into a constant order (e.g. alphabetical).  So "god" becomes
> "dgo", but so does "dog".

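For concreteness, here is a minimal sketch of that transformation. It is not the SpamBayes tokenizer or anyone else's actual code: the names sort_letters and tokenize are made up for illustration, and min_length is a hypothetical knob for leaving short tokens untouched.

```python
def sort_letters(token, min_length=0):
    """Return token with its characters sorted into alphabetical order.

    Hypothetical helper, for illustration only.  Tokens of min_length
    characters or fewer are returned unchanged.
    """
    if len(token) <= min_length:
        return token
    return "".join(sorted(token))


def tokenize(text, min_length=0):
    """Yield one sorted-letter token per whitespace-separated word."""
    # Naive lowercase + whitespace split; a real tokenizer does far more.
    for word in text.lower().split():
        yield sort_letters(word, min_length)
```

With the default min_length of 0, both sort_letters("god") and sort_letters("dog") give "dgo", which is exactly the collision described in the quoted text.
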
At the MIT Spam Conference John mentioned (offhand, regarding
something else) that POPFile does this only for words longer than
6 characters.  Since I already had the stuff at hand, I gave this a
go, in case the poor results were just from the short words.

Compared to all-defaults, fp and fn were unchanged and unsure rose
0.03%.  So the verdict is unchanged.  (I can post cmp.py or table.py
results if anyone is interested, but there's nothing really
interesting here.)

=Tony.Meyer

_______________________________________________
spambayes-dev mailing list
http://mail.python.org/mailman/listinfo/spambayes-dev
