One thing I've wondered about is using the Levenshtein distance between the words in an email and a list of spam words (ideally pulled from the Bayes db). In this case, all of the misspelled words in that sample have an L-distance of 1 from the real word -- in other words, they're *very* close.
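
As a rough sketch of what that matching might look like (in Python rather than as an actual SpamAssassin plug-in; the word list, the function names, and the max_distance threshold are all made up for illustration, and a real list would presumably come from the Bayes db):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical stand-in for a list of known spam words.
SPAM_WORDS = ("casino", "mortgage", "pills")


def near_spam_words(text, max_distance=1):
    """Return (token, spam_word) pairs that are close to, but not equal to, a spam word."""
    hits = []
    for token in text.lower().split():
        token = token.strip(".,!?\"'")
        for word in SPAM_WORDS:
            # The distance is at least the length difference, so skip hopeless pairs.
            if abs(len(token) - len(word)) > max_distance:
                continue
            if 0 < levenshtein(token, word) <= max_distance:
                hits.append((token, word))
    return hits


print(near_spam_words("Win big at the cassino with cheap pillls"))
```

Exact matches are skipped on purpose, on the assumption that the ordinary rules already catch correctly spelled words. Every token still gets compared against every word in the list, so the cost grows with both the message length and the list size.
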
I think the problem would be that this would consume tons of resources, since every word in a message would have to be compared against every word in the list. Anything else, though, would be susceptible to other typo attacks. For instance, if you took each email and replaced all doubled letters with single letters, it wouldn't be long before you were getting spam advertising "analr bictches" or the like.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

On Wed, 4 Oct 2006, Eric A. Hall wrote:

>
>On 10/4/2006 5:57 PM, Richard Doyle wrote:
>> I've been getting lots of porn site spam containing words with doubled
>> letters, like this one:
>
>> Can anybody suggest a rule or ruleset to catch these double-letter
>> obfuscations? I'm using Spamassassin 3.1.4.
>
>You'd probably need to write a plug-in that used some kind of
>typo-matching logic to find porno words.
>
>Would be a good plug-in actually. Get busy :)
>
>--
>Eric A. Hall                                        http://www.ehsco.com/
>Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
>