Feature Requests item #1242708, was opened at 2005-07-21 17:11
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=1242708&group_id=61702
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Mark Storer (mstorer3772)
Assigned to: Nobody/Anonymous (nobody)
Summary: Counter-counter-spam filtering suggestions

Initial Comment:
My experience is that the majority of spam that gets around filtering involves lots of deliberate misspellings, either by add1ng or ins^ertin*g non-le++er [EMAIL PROTECTED], thro wing in sp aces wher e t hey do n't belon g, or ByUsingTitleCaseToSeperateWordsRatherThanSpaces.

Ditching spaces

There are several possible workarounds to this:

1) Drop all non-letters and spaces, evaluating the resulting monolithic string.
Downside: More computationally expensive, as the list of possible matches increases dramatically for each segment of the monolith, and you have to test each segment against multiple lengths. O(n^2) might be generous.

2) Attempt to merge adjacent tokens to see if they qualify as spam (or ham, I suppose). This sounds more like an O(n) operation, but would only stamp out the "additional spaces" method.
Downside: Again, more CPU time, but to a lesser extent than #1. Defeated by not using the "add spaces" technique.

3) Treat all new words as having a low positive spam rating of some sort. Each newly encountered misspelling would be initially biased towards spam.

4) Add a spelling checker. New misspelled words have a slightly-spam rating (outside training).
Downside: Big data file tacked onto your otherwise light-weight plugin/app/thingy.

One concern with #3 and #4 is how they would react to an email containing source code in whatever language. Variable and function names are infrequently found in a dictionary (as you're no doubt aware).
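As a rough illustration of the cheaper ideas above (stripping inserted non-letters, splitting TitleCase runs, and workaround #2's merging of adjacent tokens), here is a minimal Python sketch. It is not SpamBayes code: the function names, the `known_words` set, and the `max_span` limit are all hypothetical, and a real filter would consult its trained token database rather than a plain set.

```python
import re


def strip_non_letters(token):
    """Drop every character that is not an ASCII letter,
    e.g. 'ins^ertin*g' becomes 'inserting'."""
    return re.sub(r"[^A-Za-z]", "", token)


def split_title_case(token):
    """Split a run like 'ByUsingTitleCase' into its capitalised words."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+", token)


def merge_adjacent(tokens, known_words, max_span=3):
    """Workaround #2: greedily merge up to max_span adjacent tokens
    whenever the joined string is a known word.

    known_words is a hypothetical stand-in for whatever vocabulary the
    classifier already has. Cost is O(n * max_span) set lookups.
    """
    out = []
    i = 0
    while i < len(tokens):
        merged = None
        # Try the longest merge first, down to a pair of tokens.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span]).lower()
            if candidate in known_words:
                merged = candidate
                i += span
                break
        if merged is None:
            # No merge matched; keep the single token unchanged.
            merged = tokens[i].lower()
            i += 1
        out.append(merged)
    return out


if __name__ == "__main__":
    text = "thro wing in sp aces wher e t hey do n't belon g"
    # Clean each token and discard any that were only punctuation.
    cleaned = [t for t in map(strip_non_letters, text.split()) if t]
    vocab = {"throwing", "in", "spaces", "where", "they", "dont", "belong"}
    print(merge_adjacent(cleaned, vocab))
    print(split_title_case("ByUsingTitleCaseToSeperateWordsRatherThanSpaces"))
```

Note that stripping non-letters does not undo character substitution: "add1ng" becomes "addng", not "adding", so that trick would need a separate digit-to-letter mapping. The greedy merge can also join tokens that were legitimately separate if their concatenation happens to be a known word.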
----------------------------------------------------------------------
_______________________________________________
Spambayes-bugs mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-bugs
