I'm still fiddling around with these spams that have a bunch of one-letter
words hiding drugs for sale:
V k I p A m G i R u A v
V j A v L s I t U w M g
X g A f N a A f X q
C x I e A a L g I c S l
followed by a url:
http://www.prouceteir.com
followed by some presumably benign text:
physiolog
resis
comminute
Phoeb
ideologis
not called for; local anesthetics were sufficient for the cleansing and
suturing, followed by generous injections of antibiotics. The foreign
objects had passed through their bodies, explained the chief doctor.
I presume you mean bullets when you speak so reverently of foreign
objects, said Krupkin in high dudgeon.
He means bullets, confirmed Alex hoarsely in Russian. The retired
I don't think there's much to grab onto in the benign text section. However,
the url tends to vary a lot, and the domain name generally seems very new.
For instance, according to whois, the above domain was created on April
28th. I received the spam containing it on April 30th. The others of this
ilk I've looked at were also new domains. That suggests a few
possibilities to me:
* look up the age of the domains via whois (preferably caching those
lookups for a reasonable period - 90 days, one year?)
* note whether or not you've seen the domain before
* look up (and cache) other information about the domain name -
registrar, registrant, etc.
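
Here's a rough sketch of how the first two ideas might hang together. It
is only an illustration, not anything that exists in SpamBayes today. The
names DomainAgeCache, lookup, and the "url-domain..." token strings are all
made up for the example, and the actual whois query is left as a callable
you supply (it should return a creation date, or None if it can't find
one). The domain passed in is assumed to already be reduced to the
registered name (prouceteir.com, not www.prouceteir.com).

```python
from datetime import date, timedelta


class DomainAgeCache:
    """Cache whois creation dates and turn them into classifier tokens.

    `lookup` is a hypothetical callable: domain -> datetime.date or None.
    """

    def __init__(self, lookup, ttl_days=90):
        self.lookup = lookup
        self.ttl = timedelta(days=ttl_days)
        self._cache = {}  # domain -> (creation date or None, date fetched)

    def creation_date(self, domain, today):
        """Return the creation date, hitting whois only when the entry is stale."""
        domain = domain.lower()
        entry = self._cache.get(domain)
        if entry is not None and today - entry[1] < self.ttl:
            return entry[0]
        created = self.lookup(domain)
        self._cache[domain] = (created, today)
        return created

    def tokens(self, domain, today):
        """Yield a seen/unseen token and a coarse age-bucket token."""
        domain = domain.lower()
        # "Seen before" here just means "already in the cache", stale or not.
        seen = domain in self._cache
        created = self.creation_date(domain, today)
        yield "url-domain:seen" if seen else "url-domain:unseen"
        if created is None:
            yield "url-domain-age:unknown"
            return
        age_days = (today - created).days
        for limit in (7, 30, 365):
            if age_days < limit:
                yield "url-domain-age:<%dd" % limit
                return
        yield "url-domain-age:>=365d"


if __name__ == "__main__":
    # Stand-in for a real whois query; the year is arbitrary.
    def fake_whois(domain):
        return date(2004, 4, 28)

    cache = DomainAgeCache(fake_whois)
    print(list(cache.tokens("prouceteir.com", date(2004, 4, 30))))
```

Bucketing the age rather than emitting the raw number of days keeps the
token count small, so the classifier has a chance of seeing each token
often enough to learn from it. The bucket boundaries are guesses.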
The creation date currently seems the hardest to fake, though it's expensive
to calculate and I suppose eventually the spammers will start creating their
own registrars (if they haven't already) and back-date the information they
provide.
I suppose you could start tokenizing these one-letter runs as well and see
if they contain embedded words:
C x I e A a L g I c S l ==> CIALIS
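
Something like the following would do for a first experiment. It assumes
the pattern in the samples above holds (single letters separated by
spaces, at least six in a row, with the real word in every other
position). The function name and the run-length threshold are my own
inventions. Since I can't be sure which position the word starts in, it
returns both interleavings and leaves it to the classifier to sort out
which ones matter.

```python
import re

# A run of at least six one-letter "words" on a single line.
_RUN_RE = re.compile(r"(?<!\S)[A-Za-z](?:[ \t]+[A-Za-z]){5,}(?!\S)")


def embedded_words(text):
    """Yield (even_letters, odd_letters) for each one-letter run in text."""
    for match in _RUN_RE.finditer(text):
        letters = match.group(0).split()
        yield "".join(letters[0::2]), "".join(letters[1::2])


if __name__ == "__main__":
    for evens, odds in embedded_words("C x I e A a L g I c S l"):
        print(evens, odds)
```

On the four sample lines above, the even-position words come out as
VIAGRA, VALIUM, XANAX and CIALIS.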
Thoughts? Anybody else seeing lots of this stuff sneak through as unsure?
Skip
_______________________________________________
spambayes-dev mailing list
http://mail.python.org/mailman/listinfo/spambayes-dev