I've made some changes to a few of the unsub/remove URI rules. A diff for the changes is included as an attachment...
The UNSUB_PAGE regexp was:

    /^https?:\/\/.*(?!cgi).*unsubscribe/i

Using (?!cgi) to exclude URIs with a "cgi" in them doesn't work, because the first ".*" can match everything up to the "unsubscribe", including the "cgi", with the second ".*" matching zero chars. In that case (?!cgi) succeeds, since the next three chars are "uns", not "cgi".

Taken from the Perl regexp documentation:

======
If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will match. You would have to do something like /(?!foo)...bar/ for that. We say "like" because there's the case of your "bar" not having three characters before it. You could cover that this way: /(?:(?!foo)...|^.{0,2})bar/.
======

Fortunately, the way SA does URI tests, each URI is tested separately, so two different regexps can be applied to the same URI. Changing the regexp to this:

    /^https?:\/\/.*unsubscribe/i && !/cgi/i

will work.

=-=-=

The REMOVE_PAGE regexp was:

    /^https?:\/\/[^\/]+\/remove/

This only matches remove URIs if the path right after the host name starts with "remove", so it won't match these:

    http://www.xezor.com/removal/remove.htm
    http://www.chippynet.com/pharmacy/remove.html
    http://www.chippynet.com/debtfree/remove.html
    http://www.datawash.com/broadcast/emailremove.asp

I changed it so that "remove" can be anywhere after the host name, and also added "removal", "delete" and "optout" to the words to search for. I also added the "no cgi" fix:

    /^https?:\/\/[^\/]+\/.*(?:remove|removal|delete|opt-?out)/i && !/cgi/i

=-=-=

For UNSUB_SCRIPT, I also added "removal", "delete" and "optout" next to the existing "unsubscribe" and "remove":

    /^https?:\/\/.*cgi.*(?:unsubscribe|remove|removal|delete|opt-?out)/i

=-=-=

Finally, a new rule to detect URIs with a domain name containing "opt-out" or "optout"; I figure it's only a matter of time before these start popping up.

    uri OPTOUT_DOMAIN /^https?:\/\/[^\/]*opt-?out/i
    describe OPTOUT_DOMAIN Domain containing "optout" or "opt-out"
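
To make the lookahead problem and the fixes above easier to check, here is a minimal sketch in Python rather than Perl. It assumes Python's `re` treats these particular constructs (`.*`, `(?!...)`, alternation) the same way Perl does, which should hold for the patterns in this post. The example.com URIs and the helper names `new_unsub_page` and `new_remove_page` are made up for illustration. The four "remove" URIs are the ones listed above.

```python
import re

# Old UNSUB_PAGE pattern (Perl's /i becomes re.IGNORECASE).
OLD_UNSUB_PAGE = re.compile(r"^https?://.*(?!cgi).*unsubscribe", re.I)

# Replacement approach: one positive pattern plus one negative pattern,
# mirroring `/.../i && !/cgi/i`.
UNSUB = re.compile(r"^https?://.*unsubscribe", re.I)
CGI = re.compile(r"cgi", re.I)

def new_unsub_page(uri):
    """True if the URI contains "unsubscribe" and no "cgi" anywhere."""
    return bool(UNSUB.search(uri)) and not CGI.search(uri)

# Old REMOVE_PAGE pattern (case-sensitive, "remove" must follow the host).
OLD_REMOVE_PAGE = re.compile(r"^https?://[^/]+/remove")
REMOVE = re.compile(r"^https?://[^/]+/.*(?:remove|removal|delete|opt-?out)", re.I)

def new_remove_page(uri):
    """True if a removal word appears after the host and no "cgi" is present."""
    return bool(REMOVE.search(uri)) and not CGI.search(uri)

# New OPTOUT_DOMAIN pattern: [^/]* keeps the match inside the host part.
OPTOUT_DOMAIN = re.compile(r"^https?://[^/]*opt-?out", re.I)

# Hypothetical test URIs.
cgi_uri = "http://www.example.com/cgi-bin/unsubscribe.pl"
page_uri = "http://www.example.com/unsubscribe.html"

# URIs from the post that the old REMOVE_PAGE pattern misses.
MISSED = [
    "http://www.xezor.com/removal/remove.htm",
    "http://www.chippynet.com/pharmacy/remove.html",
    "http://www.chippynet.com/debtfree/remove.html",
    "http://www.datawash.com/broadcast/emailremove.asp",
]

if __name__ == "__main__":
    # The old pattern matches the CGI URI even though it was meant not to.
    print("old UNSUB_PAGE on cgi URI:", bool(OLD_UNSUB_PAGE.search(cgi_uri)))
    print("new UNSUB_PAGE on cgi URI:", new_unsub_page(cgi_uri))
    print("new UNSUB_PAGE on page URI:", new_unsub_page(page_uri))
    for uri in MISSED:
        print(uri, bool(OLD_REMOVE_PAGE.search(uri)), new_remove_page(uri))
    print("OPTOUT_DOMAIN, word in host:",
          bool(OPTOUT_DOMAIN.search("http://www.opt-out-list.example/x")))
    print("OPTOUT_DOMAIN, word in path:",
          bool(OPTOUT_DOMAIN.search("http://www.example.com/optout.html")))
```

Splitting the check into a positive and a negative pattern sidesteps the backtracking issue entirely, since the "cgi" test no longer depends on where the first ".*" stops.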
Index: 20_uri_tests.cf
===================================================================
RCS file: /cvsroot/spamassassin/spamassassin/rules/20_uri_tests.cf,v
retrieving revision 1.1
diff -u -3 -p -r1.1 20_uri_tests.cf
--- 20_uri_tests.cf 5 Mar 2002 17:44:51 -0000 1.1
+++ 20_uri_tests.cf 21 Mar 2002 06:10:51 -0000
@@ -19,10 +19,10 @@ uri HTTP_ESCAPED_HOST /^https?\:\/
 describe HTTP_ESCAPED_HOST Uses %-escapes inside a URL's hostname
 
 # note: do not match \r or \n
-uri HTTP_CTRL_CHARS_HOST /^https\:\/\/[^\/]*[\x00-\x08\x0b\x0c\x0e-\x1f]/
+uri HTTP_CTRL_CHARS_HOST /^https?\:\/\/[^\/]*[\x00-\x08\x0b\x0c\x0e-\x1f]/
 describe HTTP_CTRL_CHARS_HOST Uses control sequences inside a URL's hostname
 
-uri PORN_4 /^https:\/\/[\w\.]*(?:xxx|sex|anal|slut|pussy|cum|nympho|suck|porn|hardcore|taboo|whore|voyeur|lesbian|gurlpages|naughty|lolita|teen|schoolgirl|kooloffer|erotic)\w*\./
+uri PORN_4 /^https?:\/\/[\w\.]*(?:xxx|(?<!es)sex|anal|slut|pussy|cum|nympho|suck|porn|hardcore|taboo|whore|voyeur|lesbian|gurlpages|naughty|lolita|teen|schoolgirl|kooloffer|erotic|lust|panty|panties)\w*\./
 describe PORN_4 Uses words and phrases which indicate porn (4)
 
 # some frequently-advertised URLs
@@ -53,14 +53,18 @@ describe WWW_TRAFFICWOW_NET Freq
 uri WWW_NETSITESFORFREE_NET /netsitesforfree\.net/i
 describe WWW_NETSITESFORFREE_NET Frequent SPAM content
 
-uri UNSUB_SCRIPT /^https?:\/\/.*cgi.*(unsubscribe|remove)/i
-describe UNSUB_SCRIPT URL of CGI script called "unsubscribe" or "remove"
+uri OPTOUT_DOMAIN /^https?:\/\/[^\/]*opt-?out/i
+describe OPTOUT_DOMAIN Domain containing "optout" or "opt-out"
 
-uri UNSUB_PAGE /^https?:\/\/.*(?!cgi).*unsubscribe/i
+uri UNSUB_SCRIPT /^https?:\/\/.*cgi.*(unsubscribe|remove|removal|delete|opt-?out)/i
+
+describe UNSUB_SCRIPT URL of CGI script called "unsubscribe" or "remove"
+
+uri UNSUB_PAGE /^https?:\/\/.*unsubscribe/i && !/cgi/i
 describe UNSUB_PAGE URL of page called "unsubscribe"
 
-uri REMOVE_PAGE /^https?:\/\/[^\/]+\/remove/
-describe REMOVE_PAGE URL of page called "remove"
+uri REMOVE_PAGE /^https?:\/\/[^\/]+\/.*(?:remove|removal|delete|opt-?out)/i && !/cgi/i
+describe REMOVE_PAGE URI containing "remove", "delete" or "opt-out"
 
 uri MAILTO_WITH_SUBJ_REMOVE /^mailto:\S+\?subject=[3D=\s"']*remove/is
 describe MAILTO_WITH_SUBJ_REMOVE Includes a URL link to send an email with the subject 'remove'
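
The diff also touches two rules not discussed in the message: HTTP_CTRL_CHARS_HOST and PORN_4 change "https" to "https?" so that plain http URIs are covered, and PORN_4 gains a (?<!es) lookbehind in front of one word, presumably to stop host names such as "essex" from matching. The sketch below illustrates both effects in Python, as a stand-in for Perl. It uses a reduced alternation (three of the words from the rule) to keep the pattern short, and the example.com host names and the `hits` helper are invented for the test.

```python
import re

# Reduced versions of the old and new PORN_4 patterns; only three of the
# alternatives from the real rule are kept (an assumption for brevity).
OLD_PORN_4 = re.compile(r"^https://[\w.]*(?:xxx|sex|porn)\w*\.")
NEW_PORN_4 = re.compile(r"^https?://[\w.]*(?:xxx|(?<!es)sex|porn)\w*\.")

def hits(pattern, uri):
    """True if the pattern matches the URI."""
    return pattern.search(uri) is not None

# Hypothetical host names used only for this test.
plain_http = "http://www.sexsite.example.com/"
essex_https = "https://www.essex.example.com/"
essex_http = "http://www.essex.example.com/"

if __name__ == "__main__":
    # Old pattern requires "https", so an http URI never matches.
    print("old, http URI:", hits(OLD_PORN_4, plain_http))
    print("new, http URI:", hits(NEW_PORN_4, plain_http))
    # The lookbehind rejects the word when it is preceded by "es".
    print("old, essex over https:", hits(OLD_PORN_4, essex_https))
    print("new, essex over https:", hits(NEW_PORN_4, essex_https))
    print("new, essex over http:", hits(NEW_PORN_4, essex_http))
```

Unlike the (?!cgi) lookahead in the old UNSUB_PAGE rule, this lookbehind sits directly in front of the word it guards, so the regexp engine cannot move it somewhere else by backtracking.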