SM wrote:
That was the first thought through my mind when I read the original post. No need for a full-blown fingerprint... just see if they look "server-ish" or not. Try connecting to 25... and then maybe telnet, ssh, http, and imap.
At 08:54 01-12-2004, John Hardin wrote:
However, this sounds like it might be useful in Spamassassin: attempt to contact the sender on port 25, and add a little to the spamminess score if the connection is refused or times out.
There'd be some overhead involved in this, initially, but this could be mitigated by keeping a cache of previous call-backs. I imagine this would act like a sieve, where the hosts who send you the most mail (and, hence, would cause the greatest call-back load) would appear in the cache the soonest, and that would cut down on the call-back load the most. After a week or so, I imagine that the call-back load would be tapering off to those few odd hosts which connect.
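A minimal sketch of the call-back cache described above, assuming a plain TCP connect to port 25 is a good enough probe. The names CallbackCache, port_open, and CACHE_TTL are made up for illustration, and the one-week expiry is an assumption, not a measured value.

```python
import socket
import time

CACHE_TTL = 7 * 24 * 3600  # assumption: remember a result for about a week


def port_open(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure
        return False


class CallbackCache:
    """Remember call-back results so frequent senders are probed only once per TTL."""

    def __init__(self, probe=port_open, ttl=CACHE_TTL, clock=time.time):
        self.probe = probe    # function(host) -> bool
        self.ttl = ttl        # seconds a cached result stays valid
        self.clock = clock    # injectable for testing
        self.entries = {}     # host -> (result, time of probe)
        self.probes = 0       # number of real call-backs made

    def accepts_smtp(self, host):
        now = self.clock()
        hit = self.entries.get(host)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]     # cache hit: no call-back needed
        self.probes += 1
        result = self.probe(host)
        self.entries[host] = (result, now)
        return result
```

The hosts that send the most mail land in the cache first, so the probes counter should grow quickly at the start and then flatten out. Whether it flattens enough is exactly the kind of thing that would need measuring.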
There are some well-known domains that have SMTP outgoing-only servers.
Good thing they're well-known. We can add them to a file of known outgoing-only servers and can further cut down on the call-back load.
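As a rough sketch, the known outgoing-only list could be consulted before any call-back is attempted. The file format (one domain per line, # for comments), the function names, and the 0.5 penalty are all assumptions for illustration. It also assumes the connecting host has already been resolved to a hostname, which is not a given.

```python
def load_outgoing_only(lines):
    """Parse one domain per line; skip blank lines and # comments."""
    domains = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip().lower().rstrip(".")
        if entry:
            domains.add(entry)
    return domains


def is_outgoing_only(hostname, domains):
    """True if hostname equals a listed domain or is a subdomain of one."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


def callback_penalty(hostname, domains, probe, penalty=0.5):
    """Score to add: 0.0 for listed hosts (never probed) or hosts that answer."""
    if is_outgoing_only(hostname, domains):
        return 0.0
    return 0.0 if probe(hostname) else penalty
```

Here probe is any function that takes a hostname and returns True when the call-back succeeds, so a cached probe can be passed in unchanged.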
Overall, I think that there's a good chance that this approach is going to prove unworkable (either there will be too much overhead in calling back hosts, or the fingerprinting won't prove to be a very good litmus test for spam zombies, or whatever). However, I *do* think that there's enough of a glimmer of hope in this that it is worthwhile for someone to, at the least, start compiling some preliminary data.
The data I'm talking about would be something like... take a bunch of various spam and ham messages and try connecting to a few choice ports on the remote host that delivered each one. Then, dump that data into a spreadsheet or SPSS or SAS and see which call-back ports, if any, show the highest correlation to spam/ham-iness. If it turns out that there's little correlation at all, then there's no point in even trying to solve the problem of the load it would put on the server to do it in realtime.
Of course, doing this kind of analysis on old messages can yield bad data, since the remote hosts (especially the spam zombies) might not be up all the time and might not have permanent IPs. Best thing would be to compile the data from messages as they're coming in.
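For anyone without SPSS or SAS handy, the same first look could be done in a few lines. This is only a sketch: it assumes each message has been reduced to a (is_spam, set of open ports) pair, and it uses the phi coefficient (Pearson correlation for two yes/no variables) as the measure, which is one reasonable choice among several. The function names are made up.

```python
import math


def phi(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table.

    n11 = spam and port open, n10 = spam and port closed,
    n01 = ham and port open,  n00 = ham and port closed.
    """
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def port_correlations(records, ports):
    """Map each port to its phi correlation with spam.

    records: iterable of (is_spam, open_ports) pairs, where open_ports
    is the set of ports that accepted a call-back for that message.
    """
    records = list(records)
    result = {}
    for port in ports:
        table = [[0, 0], [0, 0]]  # table[is_spam][is_open]
        for is_spam, open_ports in records:
            table[int(is_spam)][int(port in open_ports)] += 1
        result[port] = phi(table[1][1], table[1][0], table[0][1], table[0][0])
    return result
```

A value near -1 for port 25 would mean that hosts refusing the call-back are mostly spam senders; a value near 0 would mean the port says nothing useful and the realtime load question is moot.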
- Joe