Adam Katz wrote:
> Bowie Bailey wrote:
>   
>> Since the SOUGHT rules have been updating for a while now, I took a
>> look at my stats to see how they were doing.  They used to be among
>> my most useful rules, but recently, they don't seem to be doing so
>> well.
>>
>> Here are the stats for the last month:
>>     
>
> That looks like the SARE stats script (modified to show all rules, as
> evidenced by rank 261).  It doesn't account for FPs or FNs.  I
> reformatted your output so it wraps well for email.
>   

Exactly.  I told it to show me the top 400 so I could see the stats for
all the SOUGHT rules.

Thanks for the reformat.  Call me lazy...  :)

>> TOP SPAM RULES FIRED
>> ------------------------------------------------------------
>> RANK    RULE NAME           COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>> ------------------------------------------------------------
>>  111    JM_SOUGHT_FRAUD_3     112    0.06    0.36   0.97    0.01
>>  154    JM_SOUGHT_2            53    0.03    0.17   0.46    0.16
>>  214    JM_SOUGHT_3            31    0.02    0.10   0.27    0.51
>>  253    JM_SOUGHT_1            21    0.01    0.07   0.18    0.01
>>  261    JM_SOUGHT_FRAUD_2      19    0.01    0.06   0.17    0.01
>> ------------------------------------------------------------
>>
>> TOP HAM RULES FIRED
>> ------------------------------------------------------------
>> RANK    RULE NAME           COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>> ------------------------------------------------------------
>>   85    JM_SOUGHT_3           99     0.08    0.32   0.27    0.51
>>  161    JM_SOUGHT_2           30     0.03    0.10   0.46    0.16
>>  351    JM_SOUGHT_FRAUD_3      2     0.00    0.01   0.97    0.01
>>  365    JM_SOUGHT_FRAUD_2      2     0.00    0.01   0.17    0.01
>>  378    JM_SOUGHT_1            1     0.00    0.00   0.18    0.01
>> ------------------------------------------------------------
>>     
>
> That is quite different from our masscheck stats.  Today's results at
> http://ruleqa.spamassassin.org/20100201/%2FJM_SOUGHT look like this:
>
>    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>   9.8564   0.0042   1.000    0.94    0.01  T_JM_SOUGHT_3
>   8.1587   0.0068   0.999    0.93    0.01  T_JM_SOUGHT_2
>  11.6464   0.0289   0.998    0.89    0.01  T_JM_SOUGHT_1
>        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_1
>        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_2
>        0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_3
>
>
> Here are my own numbers, as observed by a custom script that
> recalculates results by re-scoring specific rules.  "Rejected"
> requires a score of 8.0 and "flagged" requires 5.0.  (It only
> examines three rules at a time, and 33 more messages arrived between
> my two runs, hence the slightly different totals.)
>
> JM_SOUGHT_1 ( 0.3% of 34831 total) with score-bump of -4:
>   124 rejected
>   1 flagged, with 0 (0%) that would have been rejected
>   1 passed, with -1 (-0.0%) that would have been flagged
> JM_SOUGHT_2 ( 0.2% of 34831 total) with score-bump of -4:
>   47 rejected
>   8 flagged, with -2 (-0.1%) that would have been rejected
>   24 passed, with -8 (-0.0%) that would have been flagged
> JM_SOUGHT_3 ( 0.5% of 34831 total) with score-bump of -4:
>   121 rejected
>   10 flagged, with -3 (-0.1%) that would have been rejected
>   60 passed, with -10 (-0.0%) that would have been flagged
> JM_SOUGHT_FRAUD_1 ( 0.0% of 34864 total) with score-bump of -3:
>   34 rejected
>   0 flagged, with 0 (0%) that would have been rejected
>   0 passed, with 0 (0%) that would have been flagged
> JM_SOUGHT_FRAUD_2 ( 0.5% of 34864 total) with score-bump of -3:
>   203 rejected
>   0 flagged, with 0 (0%) that would have been rejected
>   0 passed, with 0 (0%) that would have been flagged
> JM_SOUGHT_FRAUD_3 ( 1.3% of 34864 total) with score-bump of -3:
>   486 rejected
>   0 flagged, with -4 (-0.2%) that would have been rejected
>   1 passed, with 0 (0%) that would have been flagged
>
> My script was mostly written for adding points rather than
> subtracting them, so the notation is a little less intuitive.  For
> example, JM_SOUGHT_2 moved two mails from flag to reject and caused
> eight mails to get flagged.
>
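
(Purely for illustration: the general shape of such a re-scoring pass
is easy to sketch.  This is not Adam's script -- the input format,
thresholds, rule name, and bump below are all assumptions.)

#!/usr/bin/env python3
# Sketch of a re-scoring pass over SpamAssassin results.  The input
# format is assumed: one message per line, "<final score> <rule,rule,...>".

import sys

REJECT = 8.0   # assumed SMTP-time reject threshold
FLAG = 5.0     # assumed tag-and-deliver threshold

def bucket(score):
    """Classify a final score as rejected, flagged, or passed."""
    if score >= REJECT:
        return "rejected"
    if score >= FLAG:
        return "flagged"
    return "passed"

def rescore(lines, rule, bump):
    """For messages that hit `rule`, count how many would land in a
    different bucket if the rule's contribution were shifted by `bump`
    (e.g. bump=-4 asks what happens without the rule's 4 points)."""
    counts = {"rejected": 0, "flagged": 0, "passed": 0}
    moved = {"rejected": 0, "flagged": 0, "passed": 0}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        score, hits = float(parts[0]), parts[1].split(",")
        if rule not in (h.strip() for h in hits):
            continue
        before = bucket(score)
        if bucket(score + bump) != before:
            moved[before] += 1    # would leave its current bucket
        counts[before] += 1
    return counts, moved

if __name__ == "__main__":
    counts, moved = rescore(sys.stdin, "JM_SOUGHT_2", -4.0)
    for b in ("rejected", "flagged", "passed"):
        print("%d %s, %d would change bucket" % (counts[b], b, moved[b]))

Fed a dump of final scores plus rule hits, that produces roughly the
kind of before/after counts shown above.
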
> Recall that unlike the masscheck (which is hand-verified), log
> parsers like the SARE script and my own script have no knowledge of
> FPs or FNs.  I'd bet that most, if not all, of the 86 messages the
> SOUGHT rules noticed but didn't push up to the 5.0 mark were FNs.
>
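
(Presumably that 86 is the sum of the "passed" rows above: 1 + 24 +
60 + 0 + 0 + 1 = 86.  By the same reading, the 9 mentioned in the next
paragraph would be the mails these rules pushed over the reject line:
2 + 3 + 4 = 9.)
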
> Of course, the reason I have both a flag threshold and a reject
> threshold is so that I can still deliver low-scoring FPs.  My users
> get them flagged as spam, with SA's spaminess score in the subject.
> That means instead of risking the loss of 86 messages, I only risked
> losing 9, and thanks to the SMTP-time reject nature of my
> implementation, the senders got notices of the delivery failure.  I
> have not yet had a complaint about rejections based on the SOUGHT
> rules.  (Complaints are rare anyway, and usually related to massive
> misconfigurations on the sending relay.)
>   
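
(Again purely as illustration, the flag/reject policy described above
boils down to something like the following -- thresholds and the
subject-tag format are assumptions, not Adam's actual configuration.)

REJECT = 8.0   # refuse at SMTP time, so the sender gets a failure notice
FLAG = 5.0     # accept, but mark as spam in the subject

def disposition(score, subject):
    """Decide what to do with a message given its SpamAssassin score."""
    if score >= REJECT:
        return ("reject", subject)                      # never delivered
    if score >= FLAG:
        return ("deliver", "[SPAM %.1f] %s" % (score, subject))
    return ("deliver", subject)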

I understand the problem with the stats program and FPs/FNs, but the
last time I looked at the stats for SOUGHT (which was admittedly quite
a while ago), a couple of the rules were showing in my top 20 spam
rules.  Now I have to go all the way down to rank 111 to find the
first one.

-- 
Bowie
