Thanks for the rules fodder! BTW, MSN also has an open redirector that is seeing a lot of use:
uri LWTEST_REDIRECT1 m'http://g\.msn\.com/0AD0000[A-Z]/\d{6}\.1[/\?]'i
describe LWTEST_REDIRECT1 Open MSN redirector found in URL

Loren

----- Original Message -----
From: "John Fawcett" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, April 17, 2004 3:22 AM
Subject: [long] summary of currently unparsed url types

> I'd just like to summarize the current position with regard to URL types
> which are not currently parsed correctly by SA, and ask for some help with
> tests using version 3.
>
> Yahoo offers a public redirection service. You can enter a URL like this:
> http://rds.yahoo.com/*http://www.google.com
> and you get sent to www.google.com. (By the way, I'm not sure what the
> point of this is, because unlike tinyurl.com the Yahoo URL is longer.
> However, it sure comes in handy to spammers who are trying to get past
> SA URI rulesets.)
>
> Spam which is not picked up correctly by SA URI filters often contains
> redirection URLs, even though the redirected domain is in sc.surbl.org.
> Jeff Chan has opened a bug against URIDNSBL.pm to ask for support for
> parsing out the spammer domain from redirected URLs.
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3261
>
> Things are getting more complicated, because spam coming through seems to
> contain features which keep it from being picked up even by an altered
> parser which strips off the http://rds.yahoo.com/* part.
>
> I wanted to summarize the current understanding of the URL types which
> break parsing. I've tested these with SpamCopURI and version 2.63. If
> someone offers to test (from case 2 onwards) with URIDNSBL and version 3,
> I'll post suitable test cases.
>
> 1. http://rds.yahoo.com/*http://spammer.domain.tld/aaaaaaaaaa (bug 3261)
> Workaround in PerMsgStatus.pm:
> $uri =~ s/^http:\/\/(?:rds|rd)\.yahoo\.com\/[^\*]*\*(.*)$/$1/;
>
> 2. http://rds.yahoo.com/*%68ttp://spammer.domain.tld/aaaaaaaa (follow-up
> to bug 3261, including a test case)
> Other possible variations, which I haven't seen yet, can use %NN in place
> of any or all of the 'http' characters in the redirected URL, e.g.
> http://rds.yahoo.com/*%68%74%74%70://spammer.domain.tld/aaaaaaaa
>
> Workaround in PerMsgStatus.pm:
> $uri =~ s/\%68/h/g;
> $uri =~ s/\%74/t/g;
> $uri =~ s/\%70/p/g;
>
> 3. http://rd.yahoo.com/winery/college/banbury/*http:/len=
> derserv.com?partid=3Darlenders
>
> The redirect URL is formally incorrect (there is a single slash after
> http:), but browsers have no problem with it. The parser cannot handle it.
>
> Workaround in PerMsgStatus.pm:
> $uri =~ s/http:\/([^\/])/http:\/\/$1/g;
>
> By the way, this URL contains quoted-printable sequences ('=' followed by
> a newline, and '=3D'), which are not causing problems for the parser. Nor
> is the absence of a trailing slash before the '?' causing problems in
> parsing.
>
> 4. URLs without http: in front of them. Viewed in a browser, the
> following reads:
> "Please copy and paste this link into your browser healthyexchange.biz"
>
> <p>
> P<advisory>l<aboveboard>e<compose>a<geochronology>s<moral>e<palfrey> <rada=
> r>c<symptomatic>o<yankee>p<conduit>y<souffle> <intake>a<arise>n<eocene>d <=
> thickish>paste <impact>this <broadloom>link <road>i<dichotomous>n<quinine>=
> t<scoreboard>o y<eager>o<impact>ur b<archenemy>r<band>o<wallop>wser <b> he=
> althyexchange.biz</b>
>
> Probably not much can be done with this.
>
> 5. http://http://www.eager-18.com/_7953f10b575a18d044cdec5a40bd4f22//?d=vision
> Here the double http prevents this from being parsed.
> (OK, it wasn't in sc.surbl.org, but even if it had been, it wouldn't
> have been picked up.)
>
> Workaround in PerMsgStatus.pm:
> $uri =~ s/http:\/\/http:\/\//http:\/\//g;
>
> John
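The PerMsgStatus.pm workarounds for cases 1, 2, 3 and 5 can be chained into one normalization pass. What follows is a minimal sketch, not SpamAssassin code: `normalize_uri` is a hypothetical helper, and it assumes the message body has already been quoted-printable decoded (so the case 3 URL arrives as `...*http:/lenderserv.com?partid=arlenders`). The %68/%74/%70 decoding runs first so that the case 3 and case 5 fixes see a literal `http:`. The Yahoo pattern here accepts `rds` or `rd` hosts and allows zero characters between the `/` and the `*`, which the bare `http://rds.yahoo.com/*http://...` form in cases 1 and 2 needs.

```perl
#!/usr/bin/perl
# Sketch only: chains the workarounds for cases 1, 2, 3 and 5.
# normalize_uri is a hypothetical helper, not part of SpamAssassin.
use strict;
use warnings;

sub normalize_uri {
    my ($uri) = @_;

    # Case 2: decode %68, %74 and %70 ('h', 't', 'p') first, so the
    # substitutions below see a literal "http:".
    $uri =~ s/%68/h/g;
    $uri =~ s/%74/t/g;
    $uri =~ s/%70/p/g;

    # Case 1: keep only what follows the '*' of an rds./rd.yahoo.com redirect.
    # [^*]* (rather than [^*]+) also accepts a '*' right after the first '/'.
    $uri =~ s{^http://(?:rds|rd)\.yahoo\.com/[^*]*\*(.*)$}{$1};

    # Case 5: collapse a doubled "http://http://" into one.
    $uri =~ s{http://http://}{http://}g;

    # Case 3: turn "http:/host" (single slash) into "http://host".
    $uri =~ s{http:/([^/])}{http://$1}g;

    return $uri;
}

# Sample URLs from the message (case 3 shown after quoted-printable decoding).
my @samples = (
    'http://rds.yahoo.com/*http://spammer.domain.tld/aaaaaaaaaa',
    'http://rds.yahoo.com/*%68ttp://spammer.domain.tld/aaaaaaaa',
    'http://rds.yahoo.com/*%68%74%74%70://spammer.domain.tld/aaaaaaaa',
    'http://rd.yahoo.com/winery/college/banbury/*http:/lenderserv.com?partid=arlenders',
    'http://http://www.eager-18.com/_7953f10b575a18d044cdec5a40bd4f22//?d=vision',
);
print normalize_uri($_), "\n" for @samples;
```

Because the %NN decoding is applied to the whole string, an encoded 'h', 't' or 'p' elsewhere in the URL gets decoded too. That seems harmless when the only goal is to pull out a domain for a blocklist lookup, but it is a trade-off rather than a general URL decoder.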

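To see what the LWTEST_REDIRECT1 rule at the top of this message accepts, its pattern can be tried directly in Perl. This is a sketch, not how SpamAssassin evaluates `uri` rules internally: `is_msn_redirect` is a hypothetical helper, the dots in `g.msn.com` are escaped, and the sample URLs are made up to fit the pattern rather than taken from real spam.

```perl
#!/usr/bin/perl
# Sketch: exercises a Perl transcription of the LWTEST_REDIRECT1 pattern.
# is_msn_redirect and all sample URLs are hypothetical.
use strict;
use warnings;

# Same pattern as the rule, with the host-name dots escaped. /i keeps the
# rule's case-insensitivity, so [A-Z] also accepts lowercase letters.
my $msn_redirect = qr{http://g\.msn\.com/0AD0000[A-Z]/\d{6}\.1[/?]}i;

sub is_msn_redirect {
    my ($uri) = @_;
    return $uri =~ $msn_redirect ? 1 : 0;
}

my @samples = (
    'http://g.msn.com/0AD0000Q/123456.1/',   # fits the pattern
    'http://g.msn.com/0ad0000q/654321.1?',   # fits via /i
    'http://g.msn.com/0AD0000Q/12345.1?',    # only five digits: no match
    'http://www.msn.com/',                   # different host: no match
);
printf "%d  %s\n", is_msn_redirect($_), $_ for @samples;
```

Since the pattern is not anchored, it also fires when such a URL is embedded inside another one, for example as the target of a Yahoo redirect.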