Feature Requests item #1482685, was opened at 2006-05-06 06:45
Message generated for change (Comment added) made by anadelonbrin

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=1482685&group_id=61702
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Alan Arndt (a_arndt)
Assigned to: Nobody/Anonymous (nobody)
Summary: Importance of junk words

Initial Comment:
I have done some reading through the forum, but I haven't seen a discussion of the extra or junk words that so many spam messages come with these days. They are, of course, a blatant attempt to bypass spam filters.

To that end, I think there should be a change, or perhaps a user-configurable bias, in how "new" tokens are ranked in comparison to spam or ham. I certainly haven't done any studies, but I know my database contains most of the words I've sent or expect to receive in an e-mail. Any e-mail that contains a large number of new/unknown words is almost certainly spam. Some way to place a higher weighting on new/unknown words would seem to be welcome.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2006-05-06 08:44

Message:
Logged In: YES
user_id=552329

The option you are referring to is unknown_word_prob. Studies have shown that 'word salad' is more likely to result in the message being recognised as spam than ham. The FAQ explains how to ask for help if you are not getting satisfactory results (the issue is most likely one of appropriate training).

----------------------------------------------------------------------

_______________________________________________
Spambayes-bugs mailing list
http://mail.python.org/mailman/listinfo/spambayes-bugs
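Tony's reply points to the unknown_word_prob option as the knob for how "new" tokens are ranked. As a rough illustration of why, here is a minimal sketch, assuming a Robinson-style smoothing formula in which each token's spam probability is pulled toward unknown_word_prob with a weight given by a strength parameter (named unknown_word_strength here). The function name token_spam_prob and the default values 0.5 and 0.45 are illustrative assumptions, not taken from this thread or from the SpamBayes source.

```python
# Hypothetical sketch of Robinson-style smoothing for a single token.
# The parameter names mirror SpamBayes-style options, but the function
# and its default values are assumptions made for illustration only.

def token_spam_prob(spamcount, hamcount, nspam, nham,
                    unknown_word_prob=0.5, unknown_word_strength=0.45):
    """Return a smoothed spam probability for one token.

    spamcount/hamcount: training messages of each class containing the token.
    nspam/nham: total spam/ham messages trained on.
    """
    # Raw estimate from the training counts, normalised by class size.
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    if spamratio + hamratio > 0:
        raw = spamratio / (spamratio + hamratio)
    else:
        raw = 0.5  # never seen; this value is multiplied by n == 0 below

    # n is how much evidence we have for this token.
    n = spamcount + hamcount
    s, x = unknown_word_strength, unknown_word_prob
    # Blend the prior x with the raw estimate; with n == 0 the result is x.
    return (s * x + n * raw) / (s + n)


if __name__ == "__main__":
    # An unseen token simply takes the value of unknown_word_prob.
    print(token_spam_prob(0, 0, 1000, 1000))
    print(token_spam_prob(0, 0, 1000, 1000, unknown_word_prob=0.6))
    # Tokens with real training evidence are dominated by their counts.
    print(token_spam_prob(0, 20, 1000, 1000))
    print(token_spam_prob(20, 0, 1000, 1000))
```

Under these assumptions, a token with no training history scores exactly unknown_word_prob, so raising that value above 0.5 is one way to give a message full of never-seen words a spammier score, while words the database already knows are barely affected once they have been seen a handful of times.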
