My plan is to create another free reputation service, like a combination of a whitelist and a blacklist, except providing the actual data instead of just yes/no/maybe. To help SpamAssassin filtering, obviously.

The data I'm planning to provide is, for every IP address, the percentage of email from it which was ham (normalized like the S/O value in SpamAssassin ruleqa), and the total count of recent emails from that IP (a logarithm of it).

Output data based on my own email:
http://www.chaosreigns.com/iprep/iprep.txt

With my 2618 hams and 2956 spams, there were only *two* IP addresses that were not 100% spam or 100% ham (both belong to Google). This kind of thing is why blacklists and whitelists are useful for predicting if an email is spam or ham. The highest ranked test in SpamAssassin is RCVD_IN_XBL, a spamhaus.org blacklist. #7 is RCVD_IN_PSBL, and #11 is RCVD_IN_DNSWL_HI, which is also the highest ranking "nice" rule.

To do this, I need data from you. Create a folder containing only email you've confirmed is ham, and another containing what you've confirmed is spam. Then download the script:

http://www.chaosreigns.com/iprep/dl/iprep.pl

and run it against those folders:

  ./iprep.pl ham:dir:~/masscheckwork/ham spam:dir:~/masscheckwork/spam/

The arguments are the same as the "targets" used by SpamAssassin's mass-check (the script uses its Perl modules):

  <class>:<format>:<location>

  <class>     is "spam" or "ham"
  <format>    is "dir", "file", "mbx", "mbox", or "detect"
  <location>  is a file or directory name. Globbing of ~ and * is supported.

You can specify many targets at once. Please run it as a daily cron job.

The required ~/.ipreprc config file:

  $trusted_networks = '<space delimited list of trusted hosts>';
  $user = 'username';
  $pass = 'password';

$trusted_networks is very important, and needs to contain everything from both your trusted_networks and internal_networks values from SpamAssassin, which are documented here:

http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#network_test_options
http://wiki.apache.org/spamassassin/TrustPath

This is to prevent reporting the IP of your trusted relays instead of the actual IP sending the email.

Email me to get an account to upload the data.
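
In case it helps to see what the two per-IP numbers mean, here is a rough sketch in Python. It is not the code in iprep.pl, just an illustration. The function name ip_reputation is made up for this example, the normalization is my reading of how ruleqa's S/O corrects for unequal ham and spam corpus sizes, and base 10 for the logarithm is an assumption.

```python
import math
from collections import Counter


def ip_reputation(ham_ips, spam_ips):
    """Map each IP to (normalized ham percentage, log10 of message count).

    ham_ips and spam_ips are lists holding one sending IP per message.
    Hypothetical helper for illustration only, not part of iprep.pl.
    """
    ham_counts = Counter(ham_ips)
    spam_counts = Counter(spam_ips)
    total_ham = len(ham_ips)
    total_spam = len(spam_ips)

    result = {}
    for ip in set(ham_counts) | set(spam_counts):
        # Share of each corpus that came from this IP. Using shares
        # instead of raw counts keeps a larger spam corpus from
        # skewing the result (assumed to mirror ruleqa's S/O).
        ham_rate = ham_counts[ip] / total_ham if total_ham else 0.0
        spam_rate = spam_counts[ip] / total_spam if total_spam else 0.0

        # Every IP here appears in at least one corpus, so the
        # denominator is never zero.
        ham_percent = 100.0 * ham_rate / (ham_rate + spam_rate)

        # Base 10 is an assumption; the post only says "a logarithm".
        count = ham_counts[ip] + spam_counts[ip]
        result[ip] = (ham_percent, math.log10(count))
    return result
```

Calling ip_reputation() with the list of sending IPs from your ham folder and the list from your spam folder returns one (percentage, log count) pair per IP. An IP seen only in ham comes out at 100, one seen only in spam at 0.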

Please email me from a non-freemail account, one not listed in
http://svn.apache.org/repos/asf/spamassassin/trunk/rules/20_freemail_domains.cf

Major examples of freemail accounts, which I don't want you to email me from, are: gmail.com, yahoo.com, and hotmail.com. This is just to make it slightly harder for spammers to send me bad data. And if you're on this list, I know you have a non-freemail account.

I won't tell anybody your email address, and I consider the uploaded data confidential.

I'm thinking about providing the data only via rsync, instead of via DNS, because I think that should reduce network load. I'd create a plugin that would grab the data directly.

Just as a disclosure, I have been involved with dnswl.org since November 2006. I have no plan to use any of their data, other than to look for problems in my data.

-- 
"Let's just say that if complete and utter chaos was lightning, then he'd be the sort to stand on a hilltop in a thunderstorm wearing wet copper armour and shouting 'All gods are bastards'." - The Color of Magic
http://www.ChaosReigns.com