Warren Togami wrote:
http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_URL/detail
A very sizeable amount of spam (currently 50%) contains .cn domains that
were registered very recently. They keep registering new domains in
order to keep ahead of the URIBLs.
I have an account here that gets a lot of spam. Today, 263 unique .cn
domain names have appeared in URLs in the bodies of spam messages sent to
that account. All but 94 of them were listed in URIBL or SURBL.
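For anyone wanting to reproduce a count like this, here is a minimal Python sketch that pulls the unique .cn hostnames out of URLs in a set of message bodies. The URL regex is a rough assumption, and it collects full hostnames (so www.foo.cn and foo.cn count separately); reducing them to registered domains properly would need the Public Suffix List, which is left out here. The URIBL/SURBL lookups are not shown.

```python
import re
from urllib.parse import urlparse

# Rough URL pattern (an assumption): scheme, then anything up to whitespace,
# a quote, or an angle bracket.
URL_RE = re.compile(r'https?://[^\s"\'<>]+', re.IGNORECASE)


def cn_domains(bodies):
    """Return the set of unique .cn hostnames found in URLs across message bodies."""
    found = set()
    for body in bodies:
        for url in URL_RE.findall(body):
            # urlparse().hostname is lowercased; strip any trailing root dot.
            host = (urlparse(url).hostname or "").rstrip(".")
            if host.endswith(".cn"):
                found.add(host)
    return found
```

Calling len(cn_domains(bodies)) on a day's worth of spam bodies gives the kind of figure quoted above.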
If I make an HTTP request to http://thedomain/ for each of those
domains, every single page returned matches one of the following two
regexes:
<link [^>]*href="/themes/express/img/pharmacyexpress\.ico" [^>]*>
<title>Prestige Replicas : Luxury at affordable prices!</title>
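As a rough illustration of that check, the sketch below compiles the two regexes exactly as quoted and tests a page body against them. The fetch helper, its User-Agent string, and its 256 KB read limit are hypothetical choices for the example, not taken from any existing scanner.

```python
import re
import urllib.request

# The two patterns quoted above, compiled unchanged.
SPAM_PAGE_PATTERNS = [
    re.compile(r'<link [^>]*href="/themes/express/img/pharmacyexpress\.ico" [^>]*>'),
    re.compile(r'<title>Prestige Replicas : Luxury at affordable prices!</title>'),
]


def page_matches(html):
    """True if the page body matches either known spam-site pattern."""
    return any(p.search(html) for p in SPAM_PAGE_PATTERNS)


def fetch_homepage(domain, timeout=10):
    """Hypothetical helper: fetch http://<domain>/ and return at most 256 KB as text."""
    req = urllib.request.Request(
        f"http://{domain}/",
        headers={"User-Agent": "Mozilla/5.0 (compatible; site-check sketch)"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read(256 * 1024).decode("utf-8", errors="replace")
```

Usage would be page_matches(fetch_homepage("somedomain.cn")). In practice the fetch needs error handling, because many of these freshly registered domains will not resolve or will time out.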
A while ago, when the groups.yahoo.com spam was happening, I wrote a
module that pulled down those pages; every single one of them contained
HTML like this:
<font color="red" size="6"><b>CLICK HERE TO ENTER!</b></font></a>
I've now updated it to make HTTP requests to the .cn domains too. It
uses memcache to avoid repeatedly requesting the same websites.
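The caching idea can be sketched as a look-aside cache keyed on the domain. The sketch below assumes an in-process dictionary with a TTL as a stand-in for memcache so that it runs without a server; with a real memcached deployment the same get/set-with-expiry pattern would go through a client library instead. The key prefix "webscan:" and the one-hour default TTL are arbitrary assumptions.

```python
import time


class TTLCache:
    """In-process stand-in for memcache: maps key -> (expiry time, value)."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires, value = entry
        if self.clock() >= expires:
            # Entry has expired; drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def check_domain(domain, cache, fetch, classify):
    """Return the verdict for a domain, fetching and classifying only on a cache miss."""
    key = "webscan:" + domain.lower()
    verdict = cache.get(key)
    if verdict is None:
        verdict = classify(fetch(domain))
        cache.set(key, verdict)
    return verdict
```

Caching the verdict instead of the page body keeps entries small, and a negative verdict (False) is cached as well, so clean sites are not re-fetched on every message either.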
This is usually the point where someone asks for the source code, so
even though it's not ready for other people to use yet, I've temporarily
put it up at https://secure.grepular.com/WebsiteScanner/ in case anyone
wants to pick it apart and use bits of it.
--
Mike Cardwell - IT Consultant and LAMP developer
Cardwell IT Ltd. (UK Reg'd Company #06920226) http://cardwellit.com/