[Skip]
> I'm still fiddling around with these spams that have a bunch of
> one-letter words hiding drugs for sale:
>
>     V k I p A m G i R u A v
>     V j A v L s I t U w M g
>     X g A f N a A f X q
>     C x I e A a L g I c S l

I will try your sf patch with newer mail soon, honest!  :)

[...]
> I don't think there's much to grab onto in the benign text section,
> however the url tends to vary a lot and the domain name generally
> seems very new.  For instance, according to whois, the above domain
> was created on April 28th.  I received the spam it contained on
> April 30th.  The others of this ilk I've looked at were also new
> domains.  That suggests to me a couple possibilities:
>
>   * look up the age of the domains via whois (preferably caching
>     those lookups for a reasonable period - 90 days, one year?)
>
>   * note whether or not you've seen the domain before
>
>   * lookup (and cache) other information about the domain name -
>     registrar, registrant, etc.
>
> The creation date currently seems the hardest to fake, though it's
> expensive to calculate and I suppose eventually the spammers will
> start creating their own registrars (if they haven't already) and
> back-date the information they provide.

One of the things on my to-do list is to store information like this
in the ham & spam I archive so that these sorts of things can be
tested with the 'traditional' tools.  I have a script that does a
bunch of DNS-based information gathering (SURBL lookups, DomainKey,
SenderID, DNS blacklists - not the things you list above, but that
wouldn't be that hard to add), and just need to figure out how to get
fetchmail working properly (on OS X) so that the mail is retrieved
and piped through it.

If you create a patch for any of the above, I'd be happy to use it
day-to-day and let you know what appears in the token database.

> I suppose you could start tokenizing these one-letter runs as well
> and see if they contain embedded words:
>
>     C x I e A a L g I c S l  ==> CIALIS

This seems a little too specific for me - there are lots of other
ways to hide the rubbish letters apart from putting them in lower
case.

> Thoughts?  Anybody else seeing lots of this stuff sneak through as
> unsure?

I see a few, although I have more problems with image spam (no
successful patches there yet).

=Tony.Meyer

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
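
For anyone who wants to experiment with the one-letter-run idea quoted
in this message, here is a minimal sketch.  It assumes the hidden word
is spelled by the upper-case letters and that each one is followed by a
single lower-case filler letter, which is only the variant shown in the
samples; as noted in the reply, there are other ways to hide the filler
that this would miss.  The function name decode_letter_runs and the
min_pairs threshold are hypothetical and not part of SpamBayes.

```python
import re


def decode_letter_runs(text, min_pairs=3):
    """Return words spelled by the upper-case letters of one-letter runs.

    A run is at least `min_pairs` repetitions of "X y": one upper-case
    letter, a space, one lower-case filler letter, all space-separated.
    """
    # Assumption: letters are separated by single spaces and a run does
    # not continue across a line break.
    pattern = re.compile(
        r"\b[A-Z] [a-z](?: [A-Z] [a-z]){%d,}\b" % (min_pairs - 1)
    )
    words = []
    for match in pattern.finditer(text):
        letters = match.group(0).split()
        # Keep every other letter, starting with the first (upper-case) one.
        words.append("".join(letters[0::2]))
    return words


if __name__ == "__main__":
    sample = "V k I p A m G i R u A v\nC x I e A a L g I c S l"
    print(decode_letter_runs(sample))
```

The decoded words could then be emitted as extra tokens alongside the
usual ones, so the classifier decides whether they are useful rather
than any hard-coded rule.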