I've made some changes to a few of the unsub/remove URI rules.  A diff for
the changes is included as an attachment...

The UNSUB_PAGE regexp was:

    /^https?:\/\/.*(?!cgi).*unsubscribe/i

Using (?!cgi) to exclude URIs with "cgi" in them doesn't work, because the
first ".*" can match everything up to the "unsubscribe", including the
"cgi", while the second ".*" matches zero characters.  At that point (?!cgi)
is true, since the next three characters are "uns", not "cgi".  From the Perl
regexp documentation:

======

If you are looking for a "bar" that isn't preceded by a "foo",
/(?!foo)bar/ will not do what you want. That's because the (?!foo) is
just saying that the next thing cannot be "foo"--and it's not, it's a
"bar", so "foobar" will match. You would have to do something like
/(?!foo)...bar/ for that. We say "like" because there's the case of
your "bar" not having three characters before it. You could cover that
this way: /(?:(?!foo)...|^.{0,2})bar/.

=======
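The failure is easy to reproduce outside SA.  The sketch below uses Python's
re module as a stand-in for Perl; for these particular patterns the two
engines behave the same way.  The example URIs are made up for illustration.
It also checks the workaround pattern from the quoted documentation.

```python
import re

# The original UNSUB_PAGE pattern (Python needs no "\/" escaping).
OLD_UNSUB_PAGE = re.compile(r'^https?://.*(?!cgi).*unsubscribe', re.IGNORECASE)

# Hypothetical URIs: one served by a CGI script, one plain page.
cgi_uri = 'http://www.example.com/cgi-bin/unsubscribe.pl'
page_uri = 'http://www.example.com/unsubscribe.html'

# Both match: the first .* swallows "cgi", the second .* matches nothing,
# and (?!cgi) succeeds because the next characters are "uns".
print(bool(OLD_UNSUB_PAGE.search(cgi_uri)))   # True
print(bool(OLD_UNSUB_PAGE.search(page_uri)))  # True

# The documented workaround: a "bar" that is not preceded by "foo".
NOT_FOO_BAR = re.compile(r'(?:(?!foo)...|^.{0,2})bar')
print(bool(NOT_FOO_BAR.search('foobar')))  # False
print(bool(NOT_FOO_BAR.search('xyzbar')))  # True
print(bool(NOT_FOO_BAR.search('bar')))     # True (fewer than 3 chars before)
```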

Fortunately, SA tests each URI separately, so two different regexps can be
applied to the same URI.  Changing the regexp to this:

    /^https?:\/\/.*unsubscribe/i && !/cgi/i

will work.
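A minimal Python sketch of what the combined test checks for a single URI.
The helper name unsub_page is hypothetical; it simply applies both regexps to
one URI, assuming (as described above) that SA evaluates each URI on its own.

```python
import re

UNSUB_RE = re.compile(r'^https?://.*unsubscribe', re.IGNORECASE)
CGI_RE = re.compile(r'cgi', re.IGNORECASE)

def unsub_page(uri: str) -> bool:
    # Hypothetical helper mirroring: /^https?:\/\/.*unsubscribe/i && !/cgi/i
    return bool(UNSUB_RE.search(uri)) and not CGI_RE.search(uri)

print(unsub_page('http://www.example.com/unsubscribe.html'))        # True
print(unsub_page('http://www.example.com/cgi-bin/unsubscribe.pl'))  # False
print(unsub_page('http://www.example.com/CGI/Unsubscribe'))         # False
```

Because the "cgi" check is a separate regexp, it looks at the whole URI
rather than at one position, which is what the lookahead version got wrong.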

=-=-=

The REMOVE_PAGE regexp was:

    /^https?:\/\/[^\/]+\/remove/

This only matches remove URIs if the path portion of the URI starts with
"remove", so it won't match these:

http://www.xezor.com/removal/remove.htm
http://www.chippynet.com/pharmacy/remove.html
http://www.chippynet.com/debtfree/remove.html
http://www.datawash.com/broadcast/emailremove.asp
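A quick check of the four URIs above against the old pattern, again using
Python's re as a stand-in for Perl:

```python
import re

# The old REMOVE_PAGE pattern (case-sensitive, like the original).
OLD_REMOVE_PAGE = re.compile(r'^https?://[^/]+/remove')

uris = [
    'http://www.xezor.com/removal/remove.htm',
    'http://www.chippynet.com/pharmacy/remove.html',
    'http://www.chippynet.com/debtfree/remove.html',
    'http://www.datawash.com/broadcast/emailremove.asp',
]

# [^/]+ must stop at the first "/" after the host, so "remove" has to be
# the very start of the path; none of these URIs qualify.
for uri in uris:
    print(uri, bool(OLD_REMOVE_PAGE.search(uri)))  # each prints False
```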

I changed it so that "remove" can appear anywhere in the path, made the match
case-insensitive, and added "removal", "delete" and "optout" (or "opt-out") to
the words to search for.  I also added the "no cgi" fix:

    /^https?:\/\/[^\/]+\/.*(?:remove|removal|delete|opt-?out)/i && !/cgi/i
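A sketch of the new rule in Python, assuming the same per-URI evaluation as
above.  The helper name remove_page and the example.com URIs are hypothetical.

```python
import re

NEW_REMOVE_PAGE = re.compile(
    r'^https?://[^/]+/.*(?:remove|removal|delete|opt-?out)', re.IGNORECASE)
CGI_RE = re.compile(r'cgi', re.IGNORECASE)

def remove_page(uri: str) -> bool:
    # Hypothetical helper mirroring the new REMOVE_PAGE regexp && !/cgi/i
    return bool(NEW_REMOVE_PAGE.search(uri)) and not CGI_RE.search(uri)

uris = [
    'http://www.xezor.com/removal/remove.htm',
    'http://www.chippynet.com/pharmacy/remove.html',
    'http://www.chippynet.com/debtfree/remove.html',
    'http://www.datawash.com/broadcast/emailremove.asp',
]

for uri in uris:
    print(uri, remove_page(uri))  # each prints True

print(remove_page('http://www.example.com/cgi-bin/remove.pl'))  # False
print(remove_page('http://www.example.com/Opt-Out.html'))       # True
```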

=-=-=

For UNSUB_SCRIPT, which already matched "unsubscribe" and "remove", I added
"removal", "delete" and "optout":

    /^https?:\/\/.*cgi.*(?:unsubscribe|remove|removal|delete|opt-?out)/i
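The same pattern in Python, with a few hypothetical URIs.  Note that the
keyword has to appear somewhere after the "cgi":

```python
import re

UNSUB_SCRIPT = re.compile(
    r'^https?://.*cgi.*(?:unsubscribe|remove|removal|delete|opt-?out)',
    re.IGNORECASE)

print(bool(UNSUB_SCRIPT.search('http://www.example.com/cgi-bin/optout.pl')))   # True
print(bool(UNSUB_SCRIPT.search('http://www.example.com/unsubscribe.html')))    # False
print(bool(UNSUB_SCRIPT.search('http://www.example.com/remove/cgi-bin/x.pl'))) # False
```

Since UNSUB_PAGE and REMOVE_PAGE now reject any URI containing "cgi" while
UNSUB_SCRIPT requires it, a given URI can fire UNSUB_SCRIPT or one of the
page rules, but not both.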

=-=-=

Finally, a new rule to detect URIs with a domain name containing "opt-out" or
"optout"; I figure it's only a matter of time before these start popping up.

uri      OPTOUT_DOMAIN  /^https?:\/\/[^\/]*opt-?out/i
describe OPTOUT_DOMAIN  Domain containing "optout" or "opt-out"
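A short Python check of OPTOUT_DOMAIN.  Because [^/]* cannot cross a "/", only
the host part of the URI is searched; the domains below are made up.

```python
import re

OPTOUT_DOMAIN = re.compile(r'^https?://[^/]*opt-?out', re.IGNORECASE)

print(bool(OPTOUT_DOMAIN.search('http://www.opt-out-now.example/')))     # True
print(bool(OPTOUT_DOMAIN.search('http://OptOut.example.net/x')))         # True
print(bool(OPTOUT_DOMAIN.search('http://www.example.com/optout.html')))  # False
```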

Index: 20_uri_tests.cf
===================================================================
RCS file: /cvsroot/spamassassin/spamassassin/rules/20_uri_tests.cf,v
retrieving revision 1.1
diff -u -3 -p -r1.1 20_uri_tests.cf
--- 20_uri_tests.cf	5 Mar 2002 17:44:51 -0000	1.1
+++ 20_uri_tests.cf	21 Mar 2002 06:10:51 -0000
@@ -19,10 +19,10 @@ uri HTTP_ESCAPED_HOST       /^https?\:\/
 describe HTTP_ESCAPED_HOST      Uses %-escapes inside a URL's hostname
 
 # note: do not match \r or \n
-uri HTTP_CTRL_CHARS_HOST    /^https\:\/\/[^\/]*[\x00-\x08\x0b\x0c\x0e-\x1f]/
+uri HTTP_CTRL_CHARS_HOST    /^https?\:\/\/[^\/]*[\x00-\x08\x0b\x0c\x0e-\x1f]/
 describe HTTP_CTRL_CHARS_HOST   Uses control sequences inside a URL's hostname
 
-uri PORN_4  /^https:\/\/[\w\.]*(?:xxx|sex|anal|slut|pussy|cum|nympho|suck|porn|hardcore|taboo|whore|voyeur|lesbian|gurlpages|naughty|lolita|teen|schoolgirl|kooloffer|erotic)\w*\./
+uri PORN_4  /^https?:\/\/[\w\.]*(?:xxx|(?<!es)sex|anal|slut|pussy|cum|nympho|suck|porn|hardcore|taboo|whore|voyeur|lesbian|gurlpages|naughty|lolita|teen|schoolgirl|kooloffer|erotic|lust|panty|panties)\w*\./
 describe PORN_4         Uses words and phrases which indicate porn (4)
 
 # some frequently-advertised URLs
@@ -53,14 +53,18 @@ describe WWW_TRAFFICWOW_NET         Freq
 uri WWW_NETSITESFORFREE_NET     /netsitesforfree\.net/i
 describe WWW_NETSITESFORFREE_NET    Frequent SPAM content
 
-uri UNSUB_SCRIPT        /^https?:\/\/.*cgi.*(unsubscribe|remove)/i
-describe UNSUB_SCRIPT       URL of CGI script called "unsubscribe" or "remove"
+uri      OPTOUT_DOMAIN  /^https?:\/\/[^\/]*opt-?out/i
+describe OPTOUT_DOMAIN  Domain containing "optout" or "opt-out"
 
-uri UNSUB_PAGE      /^https?:\/\/.*(?!cgi).*unsubscribe/i
+uri      UNSUB_SCRIPT   /^https?:\/\/.*cgi.*(unsubscribe|remove|removal|delete|opt-?out)/i
+
+describe UNSUB_SCRIPT   URL of CGI script called "unsubscribe" or "remove"
+
+uri      UNSUB_PAGE     /^https?:\/\/.*unsubscribe/i && !/cgi/i
 describe UNSUB_PAGE     URL of page called "unsubscribe"
 
-uri REMOVE_PAGE     /^https?:\/\/[^\/]+\/remove/
-describe REMOVE_PAGE        URL of page called "remove"
+uri      REMOVE_PAGE    /^https?:\/\/[^\/]+\/.*(?:remove|removal|delete|opt-?out)/i && !/cgi/i
+describe REMOVE_PAGE    URI containing "remove", "delete" or "opt-out"
 
 uri MAILTO_WITH_SUBJ_REMOVE /^mailto:\S+\?subject=[3D=\s";']*remove/is
 describe MAILTO_WITH_SUBJ_REMOVE Includes a URL link to send an email with the subject 'remove'
