On Saturday, April 3, 2004, 12:34:24 AM, Daniel Quinlan wrote:
> Jeff Chan <[EMAIL PROTECTED]> writes:
>
>> Can you cite some examples of FP-prevention strategies?
>
> 1. Automated testing.  We're testing URLs (web sites).  That allows a
>    large number of strategies which could be used from each aspect of
>    the URL.
>
>    A record
>      check other blacklists
>      check IP owner against SBL
>    domain name
>      check name servers in other blacklists
>      check registrar
>      check age of domain (SenderBase information)
>      check ISP / IP block owner (SenderBase, SBL, etc.)
>    web content
>      check web site for common spam web site content (porn, drugs,
>      credit card forms, empty top-level page, etc.)
>
>    Any of those can also be used in concert with threshold tuning.  For
>    example, lower thresholds if a good blacklist hits and somewhat
>    higher thresholds for older domains.
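To make the suggestion concrete, here is roughly what the A record and
name server checks above, combined with threshold tuning, might look
like.  This is only a sketch in Python; it assumes the dnspython
package, and the blacklist zone and threshold numbers are made-up
placeholders, not anything we actually run:

import dns.exception
import dns.resolver

DNSBL_ZONES = ["sbl.spamhaus.org"]   # example IP-based blacklist zone
BASE_THRESHOLD = 10                  # example report-count threshold

def ip_listed(ip, zone):
    """True if the IP has an A record under the DNSBL zone, i.e. is listed."""
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        dns.resolver.resolve(query, "A")
        return True
    except dns.exception.DNSException:
        # NXDOMAIN (not listed), timeouts, etc. all count as "not listed"
        return False

def threshold_for(domain):
    """Start from the base threshold and lower it when other evidence agrees."""
    threshold = BASE_THRESHOLD
    try:
        a_ips = [r.address for r in dns.resolver.resolve(domain, "A")]
        ns_names = [str(r.target).rstrip(".")
                    for r in dns.resolver.resolve(domain, "NS")]
    except dns.exception.DNSException:
        return threshold
    # A record check: is the web server's IP on another blacklist?
    if any(ip_listed(ip, z) for ip in a_ips for z in DNSBL_ZONES):
        threshold -= 5
    # Name server check: do the domain's name servers sit on listed IPs?
    for ns in ns_names:
        try:
            ns_ips = [r.address for r in dns.resolver.resolve(ns, "A")]
        except dns.exception.DNSException:
            continue
        if any(ip_listed(ip, z) for ip in ns_ips for z in DNSBL_ZONES):
            threshold -= 3
            break
    return max(threshold, 1)

Something like threshold_for() could be consulted before a reported
domain gets published, so the bar drops only when independent evidence
agrees with the reports.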
I agree with the content check, but will step on many toes here by
proclaiming that other blacklists (other than SBL), name servers,
registrars, ISP address blocks, and similar approaches are overly broad
and have too much potential for collateral damage *for my
sensibilities*.  I really, really hate blacklisting innocent victims; I
consider that a false accusation or even false punishment.  Policies
which allow blacklisting an entire ISP or even an entire web server IP
address have the potential to harm too many innocent bystanders, IMO.
Your mileage may and probably does vary.  ;)

Our approach is to start with some likely good data in the SpamCop
URIs.  See comments below.

> 2. Building up a long and accurate whitelist of good URLs over time
>    would also help.  Maybe work with places that vouch for domains'
>    anti-spam policies (Habeas, BondedSender, IADB) to develop longer
>    whitelists.

I agree in principle; however, I feel that the SpamCop-reported URIs
tend to have relatively few FPs.  They are domains that people took the
time to report; in essence they are *voting with their time that these
are spam domains*.  That's one of the reasons our whitelist is quite
small now (around 35 entries), yet it catches the few legitimate
domains and subdomains that survive the reporting and thresholding and
are (mis- or over-)reported enough to get onto the list before I can
notice and whitelist them.  That need has been small so far.

http://spamcheck.freeapp.net/whitelist-domains

> 3. Using a corpus to tune results and thresholds (also whitelist
>    seeding).

Agreed.  We currently lack spam and ham corpora of our own and have not
had a chance to set any up yet.  That may come later, though.

I hope I'm not taking too confrontational a tone here.  I'm just trying
to defend our approach, which I think can be valid.  I also realize
people have a lot of work invested in other approaches, but I hope they
will eventually give ours a try.  I feel it has value, even if I can't
prove it conclusively myself yet.  LOL!  :-)

Jeff C.
--
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
