I'm still fiddling around with these spams that use runs of one-letter words to hide the names of drugs for sale:
    V k I p A m G i R u A v V j A v L s I t U w M g X g A f N a A f X q C x I e A a L g I c S l

followed by a URL:

    http://www.prouceteir.com

followed by some presumably benign text:

    physiolog resis comminute Phoeb ideologis not called for; local anesthetics were sufficient for the cleansing and suturing, followed by generous injections of antibiotics. The foreign objects had passed through their bodies, explained the chief doctor. I presume you mean bullets when you speak so reverently of foreign objects, said Krupkin in high dudgeon. He means bullets, confirmed Alex hoarsely in Russian. The retired

I don't think there's much to grab onto in the benign text section. However, the URL tends to vary a lot, and the domain generally seems to be very new. For instance, according to whois, the above domain was created on April 28th, and I received the spam containing it on April 30th. The other spams of this ilk I've looked at also used newly registered domains. That suggests a couple of possibilities to me:

    * look up the age of the domain via whois (preferably caching those lookups for a reasonable period: 90 days? one year?)
    * note whether or not you've seen the domain before
    * look up (and cache) other information about the domain: registrar, registrant, etc.

The creation date currently seems the hardest to fake, though it's expensive to obtain since each new domain costs a whois query. I also suppose the spammers will eventually start setting up their own registrars (if they haven't already) and back-dating the information they provide.

I suppose you could also start tokenizing these one-letter runs and see if they contain embedded words:

    C x I e A a L g I c S l ==> CIALIS

Thoughts? Is anybody else seeing lots of this stuff sneak through as unsure?

Skip

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
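A minimal sketch of the first two suggestions (cached domain-age lookups plus a seen-before flag), in Python since that's what SpamBayes is written in. Everything here is hypothetical: `DomainInfoCache`, `domain_tokens`, the token names and the age buckets are made up for illustration, and `lookup` is a function the caller supplies that returns a domain's creation date as a `datetime.date` (for instance by running whois and parsing its output, which varies by registrar and is left out here). The 90-day TTL is just one of the periods floated in the post.

```python
from datetime import date, timedelta


class DomainInfoCache:
    """Cache per-domain creation dates for a fixed period (hypothetical helper).

    `lookup(domain)` is supplied by the caller and must return a datetime.date,
    e.g. the creation date parsed out of a whois query.
    """

    def __init__(self, lookup, ttl=timedelta(days=90), today=date.today):
        self._lookup = lookup
        self._ttl = ttl          # how long a cached answer stays valid
        self._today = today      # injectable clock, handy for testing
        self._cache = {}         # domain -> (date fetched, creation date)

    def seen_before(self, domain):
        """True if this domain was looked up before (even if the entry has expired)."""
        return domain.lower() in self._cache

    def creation_date(self, domain):
        """Return the cached creation date, refreshing it once the TTL has passed."""
        domain = domain.lower()
        today = self._today()
        entry = self._cache.get(domain)
        if entry is None or today - entry[0] > self._ttl:
            entry = (today, self._lookup(domain))
            self._cache[domain] = entry
        return entry[1]

    def age_days(self, domain):
        return (self._today() - self.creation_date(domain)).days


def domain_tokens(cache, domain):
    """Turn domain novelty and age into a couple of coarse tokens."""
    seen = cache.seen_before(domain)  # check before the lookup adds it to the cache
    age = cache.age_days(domain)
    if age < 7:
        bucket = "lt-week"
    elif age < 30:
        bucket = "lt-month"
    elif age < 365:
        bucket = "lt-year"
    else:
        bucket = "older"
    return ["url-domain-seen:" + ("yes" if seen else "no"),
            "url-domain-age:" + bucket]
```

Bucketing the age keeps the number of distinct tokens small, so the classifier can learn something like `url-domain-age:lt-week` from a handful of messages instead of needing a separate token for every day count.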
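A rough sketch of tokenizing the one-letter runs, assuming (as in the sample at the top) that the payload letters are uppercase and the padding letters are lowercase; spammers could obviously vary that. The names `tokenize_runs` and `decode_run`, the token prefixes, and `MIN_RUN` (a hypothetical threshold of 8, so ordinary words like "I" or "a" don't trigger it) are invented for illustration, and the vocabulary of drug names would have to come from somewhere else.

```python
MIN_RUN = 8  # hypothetical: shortest run of one-letter words worth decoding


def one_letter_runs(text, min_run=MIN_RUN):
    """Yield lists of consecutive one-letter alphabetic words at least min_run long."""
    run = []
    for word in text.split():
        if len(word) == 1 and word.isalpha():
            run.append(word)
            continue
        if len(run) >= min_run:
            yield run
        run = []
    if len(run) >= min_run:
        yield run


def decode_run(run):
    """Keep only the uppercase letters: the payload, minus the lowercase padding."""
    return "".join(ch for ch in run if ch.isupper())


def tokenize_runs(text, vocabulary=()):
    """Yield a token for each decoded run, plus one per vocabulary word inside it."""
    for run in one_letter_runs(text):
        hidden = decode_run(run)
        yield "one-letter-run:" + hidden
        for word in sorted(w.lower() for w in vocabulary if w.upper() in hidden):
            yield "one-letter-run-word:" + word
```

Applied to the sample at the top, the run decodes to VIAGRAVALIUMXANAXCIALIS, so even without a vocabulary the decoded string itself becomes a (presumably very spammy) token.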