Aaargh.  I replied directly to Matt, not to the list... take two! (reply
inline... and I gotta set up procmail to flip those headers)

Matt Kettler said:
> At 01:19 PM 3/24/04 +0100, Jesse Houwing wrote:
>>ftp://ftp.oblivion-software.de/pub/oblivion-software.de/SA/drugads.cf
>>It is an anti-drug rule generated by Norbert Schlia
>>([EMAIL PROTECTED])
>>I'm running some tests here and it seems very promising.
>
> Interesting.. I wonder why Norbert chose to duplicate all his rules as both
> body and header-subject rules... body rules match subject lines too, so it
> seems an extravagant waste to me. I guess the scoring differences are

I can speak to this:  Norbert used my CMOScript rule-generation script to
create these rules.  When I first wrote CMOScript, I didn't realize that
body rules also match the subject line.

> handy, but that's a lot of extra regex just for a subject vs body score
> split.

Correct.  I wanted to give a small bonus to obfu words found in the
subject.  I didn't realize I was actually almost tripling the score by
scoring body/subject in this manner :)  I've simply never gone back and
revised the default scoring... I've been running my own set of obfu words
for months now; since I haven't had any nasty FPs based on these rules, I
haven't had any motivation to make the change.
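
To make the overlap concrete, here is a minimal sketch of the kind of rule
pair involved.  The rule names, regex, and scores are made up for
illustration; they are not taken from drugads.cf or from CMOScript's
defaults:

```
# Hypothetical rule pair; names, pattern, and scores are illustrative only.
body     EXAMPLE_OBFU_BODY   /\bv[i1]agra\b/i
describe EXAMPLE_OBFU_BODY   Obfuscated word in the message text
score    EXAMPLE_OBFU_BODY   1.0

header   EXAMPLE_OBFU_SUBJ   Subject =~ /\bv[i1]agra\b/i
describe EXAMPLE_OBFU_SUBJ   Same obfuscated word in the Subject header
score    EXAMPLE_OBFU_SUBJ   1.5
```

Because body rules also see the subject line, a hit in the subject fires
both rules in this sketch, for 2.5 points rather than the 1.5 the header
rule was meant to give on its own.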

>
> I'd also love to see a comparison against antidrug... antidrug is a lot
> more conservative in its default scores, but it's a lot of the same
> concepts. drugads seems to cover more kinds of drugs than I do however..

He's got a nice list of drugs there.  While I haven't run a mass-check on
his wordlist, here's a hit-freqs run I did (on antidrug and the wordlist I
use) a week or two back:
http://sandgnat.com/cmos/temp/hit-freqs.2004-03-18.txt .  There's also
some edit distance eval results in there (near matching), just ignore
those rules (SPLEL).  FWIW, my corpus has little to no legitimate drug
correspondence.


In general, it seems antidrug is highly tuned where CMOScript is simply
brute-force.  I use both.

--
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam
phrases(0BFU$C/\TED SPA/\/\ P|-|RA$ES):  http://www.sandgnat.com/cmos/
Keep up to date with the latest third party SpamAssassin Rulesets:
http://www.exit0.us/index.php/RulesDuJour

