Feature Requests item #1120926, was opened at 2005-02-12 06:21
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=1120926&group_id=61702

Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Pete Davis (pdavis68)
Assigned to: Nobody/Anonymous (nobody)
Summary: Character Replacements

Initial Comment:
Spambayes should test character replacements to see if
doing so would produce spam words. For example:

v1c0d|n 

replace the '1' and '|' with 'i' and the '0' with 'o'
and you get vicodin

replace '3' with 'e' and '@' with 'a' and so forth.

In addition, removing whitespace between individual
letters or small letter groups to see if they form
filtered words would also help. For example:

v ! c 0 d1n
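
A minimal Python sketch of the two ideas above, purely for
illustration. The LOOKALIKES table and both function names are
hypothetical and not part of SpamBayes; mapping '!' to 'i' is an
assumption taken from the spaced-out example.

```python
# Hypothetical look-alike table (an assumption, not SpamBayes code).
LOOKALIKES = str.maketrans({
    "1": "i", "|": "i", "!": "i",
    "0": "o", "3": "e", "@": "a",
})

def undo_lookalikes(text):
    """Lowercase the text and replace look-alike characters with letters."""
    return text.lower().translate(LOOKALIKES)

def squeeze(text):
    """Undo look-alikes, then remove all whitespace between letter groups."""
    return "".join(undo_lookalikes(text).split())

print(undo_lookalikes("v1c0d|n"))
print(squeeze("v ! c 0 d1n"))
```

Both calls yield 'vicodin'. A blanket table like this also rewrites
legitimate digits and punctuation (a year such as '2005' becomes
'2oo5'), so the result would only be useful as an extra candidate
token, not as a replacement for the original.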

Anyway, just a thought.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-02-13 11:49

Message:
Logged In: YES 
user_id=552329

This is a very time-consuming process.  It's possible to
calculate the 'edit distance' between words and use that as
a clue (e.g. one recent paper at the 2005 MIT Spam
Conference did so), but that is a lot of work.
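
For reference, a sketch of the standard Levenshtein edit distance
in Python. This is the textbook dynamic-programming version, not
code from SpamBayes or from the paper mentioned above; the
function name is hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of single-character inserts, deletes, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]

print(edit_distance("v1c0d|n", "vicodin"))  # 3
```

Each comparison costs time proportional to len(a) * len(b), and
every token would have to be compared against every known word,
which is where the time goes.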

More to the point - the fact that there is a disguised word
is itself a spam clue.  The chances of getting a 'v1c0d|n'
token in ham are much smaller than of getting a 'vicodin' token.
If the token hasn't been seen before, then it isn't used in
the scoring, and the rest of the message is used for the
score.
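
A toy Python illustration of that last point. This is not the
actual SpamBayes classifier; the function name and the
probabilities are made up for the example.

```python
def scoring_clues(tokens, spam_prob):
    """Keep only tokens that have a trained spam probability.

    Tokens never seen in training have no entry in spam_prob,
    so they simply drop out of the set of clues.
    """
    return {tok: spam_prob[tok] for tok in tokens if tok in spam_prob}

# Hypothetical trained probabilities, for illustration only.
trained = {"vicodin": 0.99, "cheap": 0.90, "meeting": 0.05}

clues = scoring_clues(["cheap", "v1c0d|n", "meeting"], trained)
print(clues)
```

Here 'v1c0d|n' contributes nothing because it has never been
trained on.  Once a few messages containing it have been trained
as spam, it gets its own entry and acts as a clue directly.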

Until use of this technique actually causes any problems,
it's not worth trying to work around it.

----------------------------------------------------------------------

_______________________________________________
Spambayes-bugs mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-bugs
