Aaargh. I replied directly to Matt, not to the list... take two! (reply inline... and I gotta set up procmail to flip those headers)

Matt Kettler said:
> At 01:19 PM 3/24/04 +0100, Jesse Houwing wrote:
>>ftp://ftp.oblivion-software.de/pub/oblivion-software.de/SA/drugads.cf
>>It is an anti drug rule generated by Norbert Schlia
>>([EMAIL PROTECTED])
>>I'm running some tests here and it seems very promising.
>
> Interesting.. I wonder why Norbert chose to duplicate all his rules as both
> body and header-subject rules... body rules match subject lines too, so it
> seems an extravagant waste to me. I guess the scoring differences are

I can speak to this: Norbert used my CMOScript generation script to create
these rules. When I first wrote CMOScript, I didn't realize that body rules
also match the subject line.

> handy, but that's a lot of extra regex just for a subject vs body score
> split.

Correct. I wanted to give a small bonus to obfu words found in the subject.
I didn't realize I was actually almost tripling the score by scoring
body/subject in this manner :) I've simply never gone back and revised the
default scoring... I've been running my own set of obfu words for months now;
since I haven't had any nasty FPs based on these rules, I haven't had any
motivation to make the change.

>
> I'd also love to see a comparison against antidrug... antidrug is a lot
> more conservative in its default scores, but it's a lot of the same
> concepts. drugads seems to cover more kinds of drugs than I do however..

He's got a nice list of drugs there. While I haven't run a mass-check on his
wordlist, here's a hit-freqs run I did (on antidrug and the wordlist I use) a
week or two back:
http://sandgnat.com/cmos/temp/hit-freqs.2004-03-18.txt
There are also some edit-distance eval results in there (near matching); just
ignore those rules (SPLEL). FWIW, my corpus has little to no legitimate drug
correspondence.

In general, it seems antidrug is highly tuned where CMOScript is simply
brute-force. I use both.
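
To make the score stacking concrete, here's a rough Python sketch of what happens when the same pattern is defined as both a body rule and a subject rule. It only models the one fact mentioned above (body rules see the subject line too). The rule names, the pattern, and the scores are all made up for illustration; they are not taken from drugads.cf or from CMOScript's real output, and real SpamAssassin body rendering is more involved than this.

```python
import re

# Hypothetical rules: (name, kind, pattern, score). Illustrative values only.
OBFU = re.compile(r"v[i1!|]agr[a@]", re.IGNORECASE)
RULES = [
    ("OBFU_BODY", "body", OBFU, 1.0),
    ("OBFU_SUBJ", "subject", OBFU, 1.5),  # intended as "body score plus a small bonus"
]


def score(subject, body):
    """Return (total_score, names_of_rules_hit) for one message."""
    # Assumption from the thread: body rules are also run against the subject,
    # so the text a body rule sees is the subject followed by the body.
    body_text = subject + "\n" + body
    total = 0.0
    hits = []
    for name, kind, pattern, points in RULES:
        text = subject if kind == "subject" else body_text
        if pattern.search(text):
            total += points
            hits.append(name)
    return total, hits


if __name__ == "__main__":
    print(score("Cheap v1agra", "hello"))   # word only in the subject
    print(score("hello", "cheap v1agra"))   # word only in the body
```

With these made-up numbers, a word that appears only in the subject trips both rules, so the two scores add up instead of the subject rule acting as a small bonus on its own.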

--
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam phrases
(0BFU$C/\TED SPA/\/\ P|-|RA$ES): http://www.sandgnat.com/cmos/

Keep up to date with the latest third party SpamAssassin Rulesets:
http://www.exit0.us/index.php/RulesDuJour
