Feature Requests item #1120926, was opened at 2005-02-12 06:21
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=1120926&group_id=61702
Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Pete Davis (pdavis68)
Assigned to: Nobody/Anonymous (nobody)
Summary: Character Replacements

Initial Comment:
Spambayes should test character replacements to see whether they would produce spam words. For example, take:

v1c0d|n

Replace the '1' and '|' with 'i' and the '0' with 'o', and you get "vicodin". Likewise, replace '3' with 'e', '@' with 'a', and so forth.

In addition, removing whitespace between individual letters or small letter groups to see whether they form filtered words would also help. For example:

v ! c 0 d1n

Anyway, just a thought.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-02-13 11:49

Message:
Logged In: YES
user_id=552329

This is a very time-consuming process. It's possible to calculate the 'edit distance' between words and use it as a clue (e.g. one recent paper at the 2005 MIT Spam Conference), but that is a lot of work.

More to the point - the fact that a word is disguised is itself a spam clue. The chance of getting a 'v1c0d|n' token in ham is much smaller than the chance of getting a 'vicodin' token. If the token hasn't been seen before, then it isn't used in the scoring, and the rest of the message determines the score.

Until use of this technique actually causes problems, it's not worth trying to work around it.

----------------------------------------------------------------------
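For illustration only, the replacements and whitespace removal suggested in the initial comment might look roughly like the Python sketch below. It is not SpamBayes code. The names `LEET_MAP`, `undo_replacements`, `collapse_spaced_letters` and `candidate_tokens` are hypothetical, the mapping of '!' to 'i' is an assumption inferred from the second example, and the three-character limit for a "small letter group" is an arbitrary choice.

```python
# Hypothetical substitution table. The thread names '1', '|', '0', '3' and '@';
# mapping '!' to 'i' is an assumption based on the "v ! c 0 d1n" example.
LEET_MAP = str.maketrans({
    "1": "i", "|": "i", "!": "i",
    "0": "o", "3": "e", "@": "a",
})


def undo_replacements(token):
    """Lowercase a token and map look-alike characters back to letters."""
    return token.lower().translate(LEET_MAP)


def collapse_spaced_letters(text, max_group=3):
    """Join consecutive fragments of at most max_group characters.

    Longer fragments are kept as separate words. The limit of 3 is an
    arbitrary assumption, not something stated in the thread.
    """
    words, run = [], []
    for piece in text.split():
        if len(piece) <= max_group:
            run.append(piece)      # short fragment: keep accumulating
            continue
        if run:                    # a long word ends the current run
            words.append("".join(run))
            run = []
        words.append(piece)
    if run:
        words.append("".join(run))
    return words


def candidate_tokens(text):
    """Collapse spaced-out letters, then undo character replacements."""
    return [undo_replacements(word) for word in collapse_spaced_letters(text)]


print(undo_replacements("v1c0d|n"))
print(candidate_tokens("cheap v ! c 0 d1n today"))
```

A sketch this simple also merges ordinary short words (for example "to be or") into one token and rewrites legitimate digits, so it would produce false candidates of its own. That is part of the cost the reply above refers to.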

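The 'edit distance' mentioned in the reply can be illustrated with the standard Levenshtein distance. This is a minimal sketch, assuming a plain dynamic-programming implementation. It is not the method from the conference paper mentioned above and not part of SpamBayes, and `edit_distance` and `closest_known_word` are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + cost,         # substitute, or match for free
            ))
        prev = curr
    return prev[-1]


def closest_known_word(token, vocabulary):
    """Return (word, distance) for the vocabulary word nearest to token.

    vocabulary stands in for a list of already-trained spam words; it is
    an assumption of this sketch and must not be empty.
    """
    return min(((word, edit_distance(token, word)) for word in vocabulary),
               key=lambda pair: pair[1])


print(edit_distance("v1c0d|n", "vicodin"))
print(closest_known_word("v1c0d|n", ["viagra", "vicodin", "valium"]))
```

Each comparison costs time proportional to the product of the two word lengths, and every unseen token would have to be compared against many known words, which gives a sense of why the reply calls this a lot of work.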