Patches item #1532862, was opened at 2006-08-01 21:14
Message generated for change (Tracker Item Submitted) made by Item Submitter

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=1532862&group_id=61702
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Nobody/Anonymous (nobody)
Summary: Count runs of short 'words'

Initial Comment:
I don't believe I submitted this before. A common spam technique of relatively recent vintage is to spell spam words with embedded spaces (e.g. "c h e a p"). Each letter then becomes a one-character "word", and in the case of SpamBayes at least, such words are skipped as too short to become tokens, so the spam word never reaches the classifier. This patch generates a token based on the longest run of such short words seen in a message. At the moment it does not seem to help much:

    token     nspam  nham  spam prob
    short:5       0     1  0.155172413793
    short:4       1     2  0.158641753503
    short:3       6     2  0.5
    short:2      16     6  0.393162750975
    short:1     138    31  0.5
    short:0      52     9  0.5

but I seem to recall that it helped when I first tried it. I'm including it here for completeness in case someone wants to test it out.

----------------------------------------------------------------------

_______________________________________________
Spambayes-bugs mailing list
http://mail.python.org/mailman/listinfo/spambayes-bugs
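The patch itself is not reproduced in this message, so the following is only a minimal sketch of the idea under stated assumptions: a "short word" is taken to be a single alphabetic character, a run is a sequence of consecutive whitespace-separated short words, and one `short:N` token (matching the token names in the results quoted above) is emitted for the longest run. The function names `longest_short_run` and `short_run_tokens` and the `max_len` parameter are hypothetical, not SpamBayes APIs.

```python
from typing import Iterator


def longest_short_run(text: str, max_len: int = 1) -> int:
    """Return the length of the longest run of consecutive whitespace-separated
    alphabetic words no longer than max_len characters.

    Assumption (hypothetical): "short" means a purely alphabetic word of at most
    max_len characters; the original patch may define it differently.
    """
    longest = current = 0
    for word in text.split():
        if len(word) <= max_len and word.isalpha():
            # Extend the current run of short words and track the maximum.
            current += 1
            longest = max(longest, current)
        else:
            # Any longer (or non-alphabetic) word ends the run.
            current = 0
    return longest


def short_run_tokens(text: str) -> Iterator[str]:
    """Yield a single hypothetical 'short:N' token for the whole message."""
    yield "short:%d" % longest_short_run(text)


if __name__ == "__main__":
    msg = "Buy c h e a p meds now"
    print(list(short_run_tokens(msg)))  # ['short:5']
```

Emitting only the longest run keeps the feature to one token per message, so the classifier sees a single signal about spaced-out words rather than one token per run; whether that signal is useful is exactly what the figures above leave open.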
