Hello Daniel,

Monday, May 3, 2004, 7:30:15 PM, you wrote:
>> Yes, new rules used to make their way into CVS quickly, but those
>> rules (at least so far) take months to get into the field, because of
>> the overhead and other challenges associated with the GA run. SARE
>> provides a method whereby rules can be tested and then adopted by
>> systems very quickly.

DQ> I think you're overestimating how much users want to download
DQ> unofficial rule sets. ...

Actually, I think I've been /underestimating/ how many systems actually download our unofficial rule sets. SARE started as a small number of people who were developing rules for their own use, and who were sharing them through private web pages and/or the exit0.us wiki, just as William Stearns shares the blacklist compilation he manages. After a short while we decided to share them with the community in a slightly more formal manner, but "more formal" still doesn't actually reach "formal." Even with the exit0.us wiki, I expected just a dozen or two people/systems outside SARE itself to use these fly-by-the-seat-of-the-pants rules. Instead, as you explore below, so many people are now using the new rule sets that our informality has been causing problems for some of them.

DQ> I've seen some user complaints that this now seems to be
DQ> required (at least by some people answering questions on the mailing
DQ> list) and that there's too much confusion about which sets to use,
DQ> questions about FP rates, etc. Making users do extra work is uncool.
DQ> Making it a prerequisite for running SpamAssassin is even worse, but I'm
DQ> concerned that's exactly where we could be headed.

I don't see that as our destination (intended or not), but I can understand why some people feel the SARE sets are becoming required. SpamAssassin 2.4x through 2.6x with network checks and a decently trained Bayes database caught 80% to 95% of all spam when they first came out. That was and is quite satisfactory -- it means that only a few spam messages managed to reach user inboxes unflagged.
A major problem, however, is that spammers adapt to SA faster than SA can adapt to spammers, because of the long cycle between releases. As a result, an unmodified vanilla installation of SA loses accuracy over time. I think Chris Santerre was the first to push a production SA above the 99% accuracy mark, and he did that with 2.4x while most of us were working with 2.5x. I was able to push SA 2.5x as high as 95% by adjusting scores in the distribution rule set and by adopting William Stearns' blacklists, but I didn't hit 99% until I started developing some very powerful rules of my own -- many of which are domain specific, many of which have been submitted to SA and/or SARE, and many of which are not domain specific but also not viable SARE candidates.

I've been using SA on three domains for a year now. In that year, the amount of spam reaching my domains has doubled. Even at an accuracy rate of 99.8%, I get almost a dozen FNs each week, and mine are very low volume systems. If 80% to 90% spam reduction is sufficient, then 2.6x with network and Bayes tests is sufficient as it stands (especially with the SURBL enhancement). If a system needs 95% or better spam reduction, it currently needs SARE's help.

Agreed, there's too much confusion about which rule sets to use. There's confusion about which rules are "safer" or more conservative, and which are "riskier" or more aggressive. We're making some progress on that, but the progress is admittedly experimental.

DQ> Why are we here? I'm not sure. The benchmark to get SVN access is not
DQ> really all that high (ask Michael and Sidney), but perhaps SARE is so
DQ> easy to get rules into that it *seems* high. ...

Actually, it's not getting SVN access that concerns me; it's integrating that SVN service with production email systems that must necessarily run stable, officially released versions of SA.
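To put those false-negative counts in perspective, the implied volumes can be sketched quickly (the weekly spam total below is inferred from the 99.8% figure and "almost a dozen FNs each week"; it is not stated directly in the thread):

```python
# Back-of-the-envelope check of the false-negative numbers above.
# The weekly spam volume is an inference, not a figure from the thread.

def weekly_false_negatives(spam_per_week: float, catch_rate: float) -> float:
    """Spam messages that slip through per week at a given catch rate."""
    return spam_per_week * (1.0 - catch_rate)

# At 99.8% accuracy, roughly 12 FNs per week implies about
# 12 / 0.002 = 6000 spam messages per week across the three domains.
implied_volume = 12 / (1 - 0.998)

# If spam volume doubles while the catch rate stays fixed,
# the number of FNs doubles with it.
fn_now = weekly_false_negatives(6000, 0.998)     # roughly 12 per week
fn_later = weekly_false_negatives(12000, 0.998)  # roughly 24 per week
```

The point of the arithmetic: a fixed accuracy percentage does not mean a fixed number of misses; FNs scale linearly with incoming volume.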
To keep my systems at 99.8% I need to add new rules regularly, and I need to test/verify those rules against the same production version my production systems run. And I agree that it's not difficult to get rules into SVN, through Bugzilla if nothing else. It's not the alleged difficulty of SVN that is significant, but rather the delay between submission, rule verification, and the final release of the next SA. My "longwords" rules were submitted to SVN over a month ago. I'm sure they've been improved on, and they will be a great help to everyone when released with 3.0. For that same month, however, they've been a) unavailable for production use via SA, and b) flagging thousands of spam here, and maybe dozens or hundreds of thousands of spam through SARE distribution.

Monday, May 3, 2004, 7:24:51 PM, Justin stated:

JM> I think that the SARE ruleset is probably the best "first deployment"
JM> area for rules -- but I also think that getting some of those rules
JM> into the SpamAssassin distro would be nice ;)

and I believe SARE (or an equivalent) will be the path of "first deployment" as long as the SVN path delays formal distribution. If under 3.0 there will be a method by which the SVN path leads to reasonably speedy distribution of qualified rules, as you suggest (and I'm hoping you'll be able to beat even SARE's speed of distribution), then SARE will fall further into the background.

DQ> ... The reality is that submitting rules into SVN is much easier than
DQ> maintaining my own set could ever be. Testing happens automatically,
DQ> I get peer review and fixes, I don't have to worry about scoring (and
DQ> I hope to automate scoring of new rules for automatic updates after
DQ> 3.0 is out), etc.

If you can use SVN for your email systems, that works great. I can't afford to use SVN in production, and I can't afford to wait months between releases.
If the release cycle speeds up, or distribution rules can be released more frequently under 3.0, then yes, that will be a big benefit, and it will lessen the dependence so many systems have developed on SARE.

>> A/The major benefit of the GA run is that the rules get properly and
>> reliably scored across the comprehensive corpora. Outside of the dev
>> cycle, that can't yet happen. However, with SARE running its
>> mass-checks against multiple corpora, we're able to generate reasonable
>> rule scores which aren't as good as the GA scores, but are good enough
>> for most systems.

DQ> Score optimization works better when you score everything at once, not
DQ> just new rules. Scoring only new rules means your FP rate is going to
DQ> be significantly higher, especially if you have many of them.
DQ> If someone tells me which SARE rules I should use, I could prove this.
DQ> :-)

Very definitely. I agree with all of this. If it's easy for you to test SARE rulesets through the SVN process, then I'd be VERY interested in such tests against these two ruleset files:

http://www.rulesemporium.com/rules/70_sare_genlsubj0.cf
http://www.rulesemporium.com/rules/70_sare_genlsubj2.cf

They hit NO ham during any SARE testing, against several corpora. If they hit any ham during SVN testing, then I definitely want to revise their scores accordingly.

Note that some of the rules in http://www.rulesemporium.com/rules/70_sare_genlsubj0.cf may be suitable for inclusion in the distribution rule set. According to our tests, 35 of them each hit 0.1% or more of all spam, and no ham. The rest drop off quickly, from those that hit a few dozen spam down to those that hit just one or two. Those are obviously NOT appropriate for the distribution set, but they are rules that the more aggressive anti-spam systems find useful.
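The promotion criterion described above (a rule hits at least 0.1% of all corpus spam and zero ham) is mechanical enough to script. A minimal sketch, assuming per-rule mass-check tallies are available; the rule names and counts here are invented for illustration:

```python
# Sketch of the promotion filter described above: a rule is a candidate
# for the distribution set if it hits >= 0.1% of all corpus spam and
# hits zero ham. All names and numbers are hypothetical.

SPAM_TOTAL = 100_000  # total spam messages across the test corpora

# (spam hits, ham hits) per rule, from hypothetical mass-check runs
results = {
    "SARE_SUB_EXAMPLE_A": (450, 0),  # 0.45% of spam, no ham: candidate
    "SARE_SUB_EXAMPLE_B": (30, 0),   # too rare for the distribution set
    "SARE_SUB_EXAMPLE_C": (900, 2),  # hits ham, so it needs rescoring
}

def distro_candidates(results, spam_total, min_fraction=0.001):
    """Rules that meet the 0.1%-of-spam, zero-ham bar."""
    return sorted(
        name
        for name, (spam_hits, ham_hits) in results.items()
        if ham_hits == 0 and spam_hits / spam_total >= min_fraction
    )

print(distro_candidates(results, SPAM_TOTAL))  # ['SARE_SUB_EXAMPLE_A']
```

Rules that fail the volume bar but still hit no ham are the ones the aggressive sets keep; rules with any ham hits go back for rescoring.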
>>>> - there's less QA and only manual scoring of SARE rules

>> Agree with the first, and quibble with the second (we actually
>> generate most scores automatically now, based on the results of our
>> mass-check runs; they aren't as high quality as the scores provided by
>> the GA, but they aren't "manual" nor "arbitrary").

DQ> Most of the scores seem much higher than they should be. Tell me which
DQ> SARE rules I should be using and I can do a corpus test to prove it. :-)

I agree. Maybe not with the sets above, but you should be able to demonstrate this with http://www.rulesemporium.com/rules/70_sare_genlsubj3.cf (if you add this ruleset, as scored, to the full SVN rule set and don't get some FP somewhere, I'd be surprised, even though I think the scores calculated for that rule set are conservative).

The point for everyone to remember (and we probably don't say this enough on the lists, the website, and the rule set documentation): SARE is composed of people who are aggressive anti-spam fighters. Scores that are conservative to us may be overly aggressive for others. We also suffer from the problems you mentioned about "local" testing -- even though we use multiple corpora, they're still *our* corpora. Just because I have not seen any FP in over a month does not mean that another site will avoid FPs using the same rules and the same scores.

(For that matter, I run SA with a required_hits of 9. This allows me to be simultaneously more conservative and more aggressive than many sites. To maintain SA's power I've had to increase the scores of many distribution rules, but /only/ some of them. And I've had to lower the scores of only a few distribution rules, and by less than I'd have needed at 5.0. Finally, with a required_hits of 9, I have a lot more room for error without creating FPs.
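To make that headroom argument concrete, a quick sketch; the ham score used here is invented for illustration:

```python
# Illustration of the required_hits margin argument above. A message's
# "margin for error" is the gap between the spam threshold and the
# score a ham message has already innocently accumulated.

def margin(required_hits: float, ham_score: float) -> float:
    return required_hits - ham_score

# A lengthy ham message that happens to trip rules worth 2.5 points
# (a hypothetical figure):
at_nine = margin(9.0, 2.5)  # 6.5 points of headroom before an FP
at_five = margin(5.0, 2.5)  # only 2.5 points of headroom

# With no ham hits at all, the headroom ratio is simply 5/9:
ratio = 5.0 / 9.0  # about 0.56
```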
Even though my SARE contributions are scored down to a 5.0 standard, applying those rules within a 5.0 system gives less room for error, and increases the chance that multiple rule matches in lengthy ham will generate FPs. A 5.0 system has only 5/9 as much margin for error as I have.)

>> Actually, part of the reason SARE is growing and strengthening is
>> because during the development of version 3.0 the core developers
>> needed to concentrate on code changes and not so much on rules. There
>> was even a comment to that effect on one or both lists a few months
>> ago.

DQ> Rule development has never really stopped (if you look at SVN, we have a
DQ> lot of new rules that never saw Bugzilla or SARE), even if some of the
DQ> developers have been focusing on code. I've been mostly working on new
DQ> rules for most of this year.

I'm very glad to hear that, and I look forward to benefiting from your work (and others').

>> If I understand what you're saying here, this would improve the
>> quality of the rules, and would also slow down the release of rule
>> updates.

DQ> Um, I think you're misunderestanding me. Daily automated updates is
DQ> definitely not slower. Sure, maybe only an average of an additional
DQ> rule or two per day might be pushed out, but it adds up.

This makes it sound like under 3.0 there will be the ability to provide daily, automated updates of SVN-validated and SVN-scored rules. If so, that will be FANTASTIC! It doesn't solve my problem of needing to stay in sync with my production systems -- when work begins on SA 3.1, the daily automated updates provided to the world will need to be limited to those that work on 3.0, or I won't be able to use them. Is that planned to be part of the system?

>> What we really need is someone who can work through the current SVN
>> rules, compare them to our better SARE rules, and submit those that
>> are worthwhile but not yet in the SVN queue. Again, I don't have the
>> time for this.
>> Hopefully someone else will.

DQ> I'm looking more for people to work directly on SVN. If it's someone
DQ> just adding stuff to the bugzilla queue, it's just as efficient for one
DQ> of the existing developers to poke at SARE on their own (which is how
DQ> backhair ended up in SVN), and this is why we're looking for more help.

I understand, and I hope you'll get people to work on this.

Bob Menschel
