Feature Requests item #1206807, was opened at 2005-05-23 16:33
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=1206807&group_id=61702

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Matt (matthew_levine)
Assigned to: Nobody/Anonymous (nobody)
Summary: "Trojan text"

Initial Comment:
Some spam will have long sections of text from random sources, such as
excerpts of classic novels or books of quotes, so there will be lots of
normal, i.e. hammy, words to get the spam past filters. The spam content
will consist of urls and possibly images.

An obvious solution would be to search the urls for spam clues, and you
already have this as an experimental feature. However, that feature only
works for emails that are below a certain threshold of tokens, and the
phony text could easily put it over that threshold. So I suggest that
either the feature should be able to check urls in all messages, or it
could also kick in when some conditions are fulfilled that indicate the
likely presence of "Trojan text," such as a high number of ham words along
with linked images.

Additionally, I suggest that when this feature causes a message to be
registered as spam, SpamBayes should not be spam-trained on the "Trojan
text," because it was inserted specifically to throw off spam filters, so
the filter should work better if it's ignored.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-05-23 16:39

Message:
Logged In: YES
user_id=552329

The experimental (available with 1.0.4 or 1.1a1) URL slurping options do
more-or-less what you describe. Please feel free to try them out and
suggest any specific improvements to them, and let us know whether they do
improve your results or not.

Identifying text that doesn't fit with the message is fairly complicated -
DSPAM has a "noise" detection algorithm that does this. We may try this at
some point.
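
For illustration only, the trigger condition suggested in the initial comment (many hammy words together with URLs or linked images) might be sketched roughly as below. This is not SpamBayes code and does not use its API: the function names, the regular expressions, the `ham_words` set and both thresholds are hypothetical and would need tuning against real mail.

```python
import re

# All names and thresholds here are hypothetical, chosen for the sketch only.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z']+")


def extract_urls(body):
    """Return every http(s) URL found in the raw message body."""
    return URL_RE.findall(body)


def looks_like_trojan_text(body, ham_words, min_words=50, min_ham_ratio=0.8):
    """Guess whether a body is mostly hammy filler wrapped around a URL or image.

    ham_words is assumed to be a set of lowercase words that the caller
    already considers strongly hammy.
    """
    # Without a URL or linked image there is no payload to hide.
    if not (URL_RE.search(body) or IMG_RE.search(body)):
        return False
    # Drop URLs and markup so that only the visible filler text is counted.
    text = TAG_RE.sub(" ", URL_RE.sub(" ", body)).lower()
    words = WORD_RE.findall(text)
    if len(words) < min_words:
        return False
    ham_count = sum(1 for word in words if word in ham_words)
    return ham_count / len(words) >= min_ham_ratio
```

A message flagged this way could have the result of `extract_urls` checked for spam clues regardless of its token count, and the filler text could be skipped when training, as the initial comment proposes.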

----------------------------------------------------------------------

_______________________________________________
Spambayes-bugs mailing list
http://mail.python.org/mailman/listinfo/spambayes-bugs
