> At the risk of bloating this awesome piece of software, I submit
> that Grammar and Spell checking (at least localized English) would
> be a good way to easily identify illegitimate email.

Grammar checking is difficult, but various methods of generating tokens based on spell-checking have been evaluated in the past, and found to be ineffective. For example:

[ 817813 ] Consider bad spelling a sign of spam
http://sourceforge.net/tracker/index.php?func=detail&aid=817813&group_id=61702&atid=498106

I suspect that the problems with this include:

 * Many people 'misspell' words in legitimate email (abbreviations, slang, proper nouns, typos, and so on)
 * Spam that tries to hide behind misspelled words is generally already caught; it is other spam (e.g. image-based) that really causes problems these days.

This is perhaps a more corpus-dependent feature than others - for example, I suspect that on a primarily business-orientated email stream the results would be somewhat better (since work email tends to be better spelt, although there are certainly plenty of exceptions to that rule).

I haven't done any tests, but my expectation would be that grammar checking would be even worse, since few English-as-a-first-language speakers have any idea of what correct English grammar is. (I expect that, for example, comma splices and incomplete sentences would be just as common in ham as in spam.)

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies (reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
