Re: 72_scores.cf compared to the one from march 15

2017-11-16 Thread Merijn van den Kroonenberg

On 2017-11-16 10:06, Merijn van den Kroonenberg wrote:

On 11/15/2017 07:10 AM, Dave Jones wrote:

I got my SVN authentication issue figured out on my laptop and committed
these.  Fingers crossed for the run in about 5 hours.


I have been comparing last night's 72_scores.cf against the one from March
and it looks *really* good now. That last commit pushed the number of
lines right up to what we had in March.

I also ran the compare-rulefiles script just like yesterday.

./compare-rulefiles -d 72_scores_20170315.cf 72_scores-1815405.cf > deleted_rules.txt
./compare-rulefiles -r 0 -d deleted_rules.txt active-1815421.list > deactivated_rules.txt


Small mistake: I used an active.list that was too new. The corrected command is:

./compare-rulefiles -r 0 -d deleted_rules.txt active-1815296.list > deactivated_rules.txt



./compare-rulefiles -r 0 -a deactivated_rules.txt deleted_rules.txt > disappeared_rules.txt

cat disappeared_rules.txt
ADVANCE_FEE_4_NEW
ADVANCE_FEE_5_NEW
CN_B2B_SPAMMER
URI_GOOGLE_PROXY


cat disappeared_rules.txt
ADVANCE_FEE_4_NEW
CN_B2B_SPAMMER
URI_GOOGLE_PROXY

So even fewer (3 instead of 4) with the correct active.list.



So that's only 4 rules which are not in our new scores file but which were
in the March one (discounting deactivated rules).

When looking at the changes between now and then, I see nothing
suspicious. I am now pretty confident the score generation is running as
it did in March.

Anything which is not right probably wasn't right in March either ;)

I would say, let's get people testing!

Here are the full changes between now and March so you can see for yourself:


./compare-rulefiles 72_scores_20170315.cf 72_scores-1815405.cf
Only in 1 (removed in 2)
ADVANCE_FEE_4_NEW
ADVANCE_FEE_5_NEW
AXB_XMAILER_MIMEOLE_OL_1ECD5
AXB_XM_FORGED_OL2600
BODY_EMPTY
CN_B2B_SPAMMER
FREEMAIL_DOC_PDF_BCC
FSL_HELO_BARE_IP_2
HDRS_LCASE
HK_SCAM_N15
LOTTO_AGENT
LOTTO_DEPT
MONEY_LOTTERY
MSGID_NOFQDN1
RP_MATCHES_RCVD
SHARE_50_50
TO_NO_BRKTS_FROM_MSSP
URI_GOOGLE_PROXY

Only in 2 (added in 2)
ADVANCE_FEE_4_NEW_MONEY
ADVANCE_FEE_5_NEW_FRM_MNY
ADVANCE_FEE_5_NEW_MONEY
APOSTROPHE_TOCC
AXB_X_AOL_SEZ_S
DEAR_BENEFICIARY
FROM_MISSP_DYNIP
FSL_HELO_FAKE
FSL_MIME_NO_TEXT
FUZZY_UNSUBSCRIBE
HDRS_MISSP
MANY_PILL_PRICE
MILLION_USD
MONEY_ATM_CARD
MONEY_FORM
MONEY_FORM_SHORT
MONEY_FROM_41
MONEY_FROM_MISSP
SERGIO_SUBJECT_VIAGRA01
SHORTENED_URL_SRC
SINGLETS_LOW_CONTRAST
SPOOFED_FREEM_REPTO_RUS
TO_NO_BRKTS_DYNIP

Changed
AC_HTML_NONSENSE_TAGS
  1.000 0.001 1.000 0.001
  1.000 1.000 1.000 1.000
ADVANCE_FEE_2_NEW_MONEY
  1.997 0.001 1.997 0.001
  0.001 0.020 0.001 0.020
ADVANCE_FEE_3_NEW
  3.496 0.001 3.496 0.001
  3.001 3.467 3.001 3.467
ADVANCE_FEE_3_NEW_MONEY
  2.796 0.001 2.796 0.001
  3.099 2.696 3.099 2.696
AXB_XMAILER_MIMEOLE_OL_024C2
  0.367 0.001 0.367 0.001
  1.816 0.006 1.816 0.006
BODY_URI_ONLY
  0.998 0.001 0.998 0.001
  1.000 0.999 1.000 0.999
BOGUS_MSM_HDRS
  0.909 0.001 0.909 0.001
  0.795 1.377 0.795 1.377
CANT_SEE_AD
  2.996 0.500 2.996 0.500
  1.000 1.000 1.000 1.000
CK_HELO_DYNAMIC_SPLIT_IP
  1.350 0.001 1.350 0.001
  1.500 0.107 1.500 0.107
CK_HELO_GENERIC
  0.249 0.249 0.249 0.249
  0.250 0.248 0.250 0.248
COMMENT_GIBBERISH
  1.498 1.499 1.498 1.499
  1.000 1.000 1.000 1.000
DATE_IN_FUTURE_96_Q
  3.296 3.299 3.296 3.299
  2.899 2.696 2.899 2.696
FBI_MONEY
  0.696 0.001 0.696 0.001
  1.000 1.000 1.000 1.000
FBI_SPOOF
  1.999 1.999 1.999 1.999
  1.000 1.000 1.000 1.000
FILL_THIS_FORM
  2.748 0.001 2.748 0.001
  0.113 1.488 0.113 1.488
FORM_FRAUD
  0.998 0.001 0.998 0.001
  1.000 0.998 1.000 0.998
FORM_FRAUD_3
  2.696 0.001 2.696 0.001
  2.899 0.999 2.899 0.999
FORM_FRAUD_5
  0.209 0.001 0.209 0.001
  3.499 1.594 3.499 1.594
FOUND_YOU
  3.013 0.001 3.013 0.001
  1.000 1.000 1.000 1.000
FREEMAIL_FORGED_FROMDOMAIN
  0.001 0.199 0.001 0.199
  0.001 0.001 0.001 0.001
FROM_IN_TO_AND_SUBJ
  0.287 0.262 0.287 0.262
  0.001 0.001 0.001 0.001
FROM_MISSP_FREEMAIL
  3.595 0.001 3.595 0.001
  2.213 1.781 2.213 1.781
FROM_MISSP_MSFT
  0.001 0.001 0.001 0.001
  1.097 1.596 1.097 1.596
FROM_MISSP_REPLYTO
  0.001 0.001 0.001 0.001
  2.443 0.001 2.443 0.001
FROM_MISSP_SPF_FAIL
  0.001 1.000 0.001 1.000
  0.001 0.001 0.001 0.001
FROM_MISSP_TO_UNDISC
  1.438 0.001 1.438 0.001
  1.472 0.448 1.472 0.448
FROM_MISSP_USER
  0.001 0.001 0.001 0.001
  3.316 1.188 3.316 1.188
FROM_MISSP_XPRIO
  0.001 0.001 0.001 0.001
  1.785 2.497 1.785 2.497
FROM_WORDY
  2.497 0.001 2.497 0.001
  2.500 2.498 2.500 2.498
FSL_CTYPE_WIN1251
  0.001 0.001 0.001 0.001
  3.515 3.080 3.515 3.080
FSL_NEW_HELO_USER
  0.083 0.001 0.083 0.001
  1.719 0.750 1.719 0.750
HELO_MISC_IP
  0.248 0.250 0.248 0.250
  0.250 0.249 0.250 0.249
HK_RANDOM_FROM
  0.998 0.001 0.998 0.001
  0.999 0.999 0.999 0.999
HK_SCAM_N2
  3.249 0.001 3.249 0.001
  1.498 2.696 1.498 2.696
IMG_DIRECT_TO_MX
  2.397 2.400 2.397 2.400
  3.599 1.744 3.599 1.744
LIST_PRTL_SAME_USER
  0.001 0.286 0.001 0.286
  1.000 1.000 1.000 1.000
LONG_HEX_URI
  2.194 2.290 2.194 2.290
  1.102 0.853 1.102 0.853
LONG_IMG_URI
  0.553 0.100 

Re: 72_scores.cf compared to the one from march 15

2017-11-16 Thread Merijn van den Kroonenberg
> On 11/15/2017 07:10 AM, Dave Jones wrote:
>
> I got my SVN authentication issue figured out on my laptop and committed
> these.  Fingers crossed for the run in about 5 hours.

I have been comparing last night's 72_scores.cf against the one from March
and it looks *really* good now. That last commit pushed the number of
lines right up to what we had in March.

I also ran the compare-rulefiles script just like yesterday.

./compare-rulefiles -d 72_scores_20170315.cf 72_scores-1815405.cf > deleted_rules.txt
./compare-rulefiles -r 0 -d deleted_rules.txt active-1815421.list > deactivated_rules.txt
./compare-rulefiles -r 0 -a deactivated_rules.txt deleted_rules.txt > disappeared_rules.txt

cat disappeared_rules.txt
ADVANCE_FEE_4_NEW
ADVANCE_FEE_5_NEW
CN_B2B_SPAMMER
URI_GOOGLE_PROXY

So that's only 4 rules which are not in our new scores file but which were
in the March one (discounting deactivated rules).
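
A minimal sketch of the set arithmetic behind these three steps, assuming
that -d lists names found only in the first file and -a lists names found
only in the second. This is a reading of the commands above, not the
compare-rulefiles implementation; the function name and the toy rule names
are hypothetical.

```python
def disappeared_rules(march_rules, current_rules, active_rules):
    """Return rules that left the scores file although they are still active."""
    # Step 1: rules present in March but absent from the current scores file.
    deleted = set(march_rules) - set(current_rules)
    # Step 2: deleted rules that are no longer listed in active.list.
    deactivated = deleted - set(active_rules)
    # Step 3: deleted rules that deactivation does not explain.
    return sorted(deleted - deactivated)


# Toy input with made-up rule names, not data from the real files.
march = ["RULE_A", "RULE_B", "RULE_C", "RULE_D"]
current = ["RULE_A", "RULE_E"]
active = ["RULE_A", "RULE_C", "RULE_E"]
print(disappeared_rules(march, current, active))  # ['RULE_C']
```

Only RULE_C is reported: it was deleted from the scores file while still
being listed as active, which is the case worth investigating.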

When looking at the changes between now and then, I see nothing
suspicious. I am now pretty confident the score generation is running as
it did in March.

Anything which is not right probably wasn't right in March either ;)

I would say, let's get people testing!

Here are the full changes between now and March so you can see for yourself:

./compare-rulefiles 72_scores_20170315.cf 72_scores-1815405.cf
Only in 1 (removed in 2)
ADVANCE_FEE_4_NEW
ADVANCE_FEE_5_NEW
AXB_XMAILER_MIMEOLE_OL_1ECD5
AXB_XM_FORGED_OL2600
BODY_EMPTY
CN_B2B_SPAMMER
FREEMAIL_DOC_PDF_BCC
FSL_HELO_BARE_IP_2
HDRS_LCASE
HK_SCAM_N15
LOTTO_AGENT
LOTTO_DEPT
MONEY_LOTTERY
MSGID_NOFQDN1
RP_MATCHES_RCVD
SHARE_50_50
TO_NO_BRKTS_FROM_MSSP
URI_GOOGLE_PROXY

Only in 2 (added in 2)
ADVANCE_FEE_4_NEW_MONEY
ADVANCE_FEE_5_NEW_FRM_MNY
ADVANCE_FEE_5_NEW_MONEY
APOSTROPHE_TOCC
AXB_X_AOL_SEZ_S
DEAR_BENEFICIARY
FROM_MISSP_DYNIP
FSL_HELO_FAKE
FSL_MIME_NO_TEXT
FUZZY_UNSUBSCRIBE
HDRS_MISSP
MANY_PILL_PRICE
MILLION_USD
MONEY_ATM_CARD
MONEY_FORM
MONEY_FORM_SHORT
MONEY_FROM_41
MONEY_FROM_MISSP
SERGIO_SUBJECT_VIAGRA01
SHORTENED_URL_SRC
SINGLETS_LOW_CONTRAST
SPOOFED_FREEM_REPTO_RUS
TO_NO_BRKTS_DYNIP

Changed
AC_HTML_NONSENSE_TAGS
  1.000 0.001 1.000 0.001
  1.000 1.000 1.000 1.000
ADVANCE_FEE_2_NEW_MONEY
  1.997 0.001 1.997 0.001
  0.001 0.020 0.001 0.020
ADVANCE_FEE_3_NEW
  3.496 0.001 3.496 0.001
  3.001 3.467 3.001 3.467
ADVANCE_FEE_3_NEW_MONEY
  2.796 0.001 2.796 0.001
  3.099 2.696 3.099 2.696
AXB_XMAILER_MIMEOLE_OL_024C2
  0.367 0.001 0.367 0.001
  1.816 0.006 1.816 0.006
BODY_URI_ONLY
  0.998 0.001 0.998 0.001
  1.000 0.999 1.000 0.999
BOGUS_MSM_HDRS
  0.909 0.001 0.909 0.001
  0.795 1.377 0.795 1.377
CANT_SEE_AD
  2.996 0.500 2.996 0.500
  1.000 1.000 1.000 1.000
CK_HELO_DYNAMIC_SPLIT_IP
  1.350 0.001 1.350 0.001
  1.500 0.107 1.500 0.107
CK_HELO_GENERIC
  0.249 0.249 0.249 0.249
  0.250 0.248 0.250 0.248
COMMENT_GIBBERISH
  1.498 1.499 1.498 1.499
  1.000 1.000 1.000 1.000
DATE_IN_FUTURE_96_Q
  3.296 3.299 3.296 3.299
  2.899 2.696 2.899 2.696
FBI_MONEY
  0.696 0.001 0.696 0.001
  1.000 1.000 1.000 1.000
FBI_SPOOF
  1.999 1.999 1.999 1.999
  1.000 1.000 1.000 1.000
FILL_THIS_FORM
  2.748 0.001 2.748 0.001
  0.113 1.488 0.113 1.488
FORM_FRAUD
  0.998 0.001 0.998 0.001
  1.000 0.998 1.000 0.998
FORM_FRAUD_3
  2.696 0.001 2.696 0.001
  2.899 0.999 2.899 0.999
FORM_FRAUD_5
  0.209 0.001 0.209 0.001
  3.499 1.594 3.499 1.594
FOUND_YOU
  3.013 0.001 3.013 0.001
  1.000 1.000 1.000 1.000
FREEMAIL_FORGED_FROMDOMAIN
  0.001 0.199 0.001 0.199
  0.001 0.001 0.001 0.001
FROM_IN_TO_AND_SUBJ
  0.287 0.262 0.287 0.262
  0.001 0.001 0.001 0.001
FROM_MISSP_FREEMAIL
  3.595 0.001 3.595 0.001
  2.213 1.781 2.213 1.781
FROM_MISSP_MSFT
  0.001 0.001 0.001 0.001
  1.097 1.596 1.097 1.596
FROM_MISSP_REPLYTO
  0.001 0.001 0.001 0.001
  2.443 0.001 2.443 0.001
FROM_MISSP_SPF_FAIL
  0.001 1.000 0.001 1.000
  0.001 0.001 0.001 0.001
FROM_MISSP_TO_UNDISC
  1.438 0.001 1.438 0.001
  1.472 0.448 1.472 0.448
FROM_MISSP_USER
  0.001 0.001 0.001 0.001
  3.316 1.188 3.316 1.188
FROM_MISSP_XPRIO
  0.001 0.001 0.001 0.001
  1.785 2.497 1.785 2.497
FROM_WORDY
  2.497 0.001 2.497 0.001
  2.500 2.498 2.500 2.498
FSL_CTYPE_WIN1251
  0.001 0.001 0.001 0.001
  3.515 3.080 3.515 3.080
FSL_NEW_HELO_USER
  0.083 0.001 0.083 0.001
  1.719 0.750 1.719 0.750
HELO_MISC_IP
  0.248 0.250 0.248 0.250
  0.250 0.249 0.250 0.249
HK_RANDOM_FROM
  0.998 0.001 0.998 0.001
  0.999 0.999 0.999 0.999
HK_SCAM_N2
  3.249 0.001 3.249 0.001
  1.498 2.696 1.498 2.696
IMG_DIRECT_TO_MX
  2.397 2.400 2.397 2.400
  3.599 1.744 3.599 1.744
LIST_PRTL_SAME_USER
  0.001 0.286 0.001 0.286
  1.000 1.000 1.000 1.000
LONG_HEX_URI
  2.194 2.290 2.194 2.290
  1.102 0.853 1.102 0.853
LONG_IMG_URI
  0.553 0.100 0.553 0.100
  0.554 1.000 0.554 1.000
LOTS_OF_MONEY
  0.001 0.001 0.001 0.001
  0.001 0.005 0.001 0.005
MIMEOLE_DIRECT_TO_MX
  1.445 0.381 1.445 0.381
  1.999 0.738 1.999 0.738
MIME_NO_TEXT
  1.000 1.000 1.000 1.000
  1.803 1.997 1.803 1.997
MONEY_FRAUD_3
  2.896 0.001 2.896 0.001
  3.099 0.263 3.099 0.263
MONEY_FRAUD_5
  

Re: 72_scores.cf compared to the one from march 15

2017-11-15 Thread Kevin A. McGrail

On 11/15/2017 4:43 PM, Dave Jones wrote:
I got my SVN authentication issue figured out on my laptop and 
committed these.  Fingers crossed for the run in about 5 hours. 

Excellent.  Sorry, today was an ASF board meeting so hectic!


Re: 72_scores.cf compared to the one from march 15

2017-11-15 Thread Dave Jones

On 11/15/2017 07:10 AM, Dave Jones wrote:

On 11/15/2017 06:40 AM, Kevin A. McGrail wrote:

On 11/15/2017 6:33 AM, Merijn van den Kroonenberg wrote:

That, or maybe Kevin can step in for now and do the commit for you?
Good to know you are on the road and thanks for still trying to help!


Happy to try and help!

Regards,

KAM


On the sa-vm1 server, I need to get these two files committed:

/usr/local/spamassassin/automc/svn/trunk/masses/rule-update-score-gen$ svn status

M   generate-new-scores.sh
M   lock-scores

I would normally copy these to /tmp then scp them down to my local 
desktop/laptop check out location to commit them.


The generate-new-scores.sh has the SVN $REVISION determined from the 
majority masscheck submissions and we think the lock-scores is the one 
that was running on the old server back in March but wasn't committed 
to the main dir like it should have been.


Dave


I got my SVN authentication issue figured out on my laptop and committed 
these.  Fingers crossed for the run in about 5 hours.


Dave



Re: 72_scores.cf compared to the one from march 15

2017-11-15 Thread Dave Jones

On 11/15/2017 06:40 AM, Kevin A. McGrail wrote:

On 11/15/2017 6:33 AM, Merijn van den Kroonenberg wrote:

That, or maybe Kevin can step in for now and do the commit for you?
Good to know you are on the road and thanks for still trying to help!


Happy to try and help!

Regards,

KAM


On the sa-vm1 server, I need to get these two files committed:

/usr/local/spamassassin/automc/svn/trunk/masses/rule-update-score-gen$ svn status

M   generate-new-scores.sh
M   lock-scores

I would normally copy these to /tmp then scp them down to my local 
desktop/laptop check out location to commit them.


The generate-new-scores.sh has the SVN $REVISION determined from the 
majority masscheck submissions and we think the lock-scores is the one 
that was running on the old server back in March but wasn't committed to 
the main dir like it should have been.


Dave


Re: 72_scores.cf compared to the one from march 15

2017-11-15 Thread Kevin A. McGrail

On 11/15/2017 6:33 AM, Merijn van den Kroonenberg wrote:

That, or maybe Kevin can step in for now and do the commit for you?
Good to know you are on the road and thanks for still trying to help!


Happy to try and help!

Regards,

KAM



Re: 72_scores.cf compared to the one from march 15

2017-11-15 Thread Merijn van den Kroonenberg
> On 11/15/2017 05:22 AM, Merijn van den Kroonenberg wrote:
>>> I updated the masses/rule-update-score-gen/lock-scores file from
>>> rulesrc/sandbox/dos/new-rule-score-gen/lock-scores on the
>>> sa-vm1.apache.org server so fingers crossed on the 72_scores.cf here in
>>> about 5 hours.
>> This script is always freshly checked out, so uncommitted changes can
>> never be tested. If you check automc/tmp you will still see the old
>> version of the script. I must admit, I fell for it too; I only found out
>> after checking the script in the temp dir, when I wondered
>> why there was no change in ranges.data.
>>
>>> Dave
>>>
>>>
>>
> Darn.  I am having problems with my SVN ID right now so I was hoping I
> didn't have to commit these changes to test them on the server.  I am
> travelling with my laptop, which doesn't have something set up quite right,
> so I will have to figure out the SVN authentication setup since I won't
> be back at my primary desktop PC for about 10 days.

That, or maybe Kevin can step in for now and do the commit for you?
Good to know you are on the road and thanks for still trying to help!

>
> Dave
>
>




Re: 72_scores.cf compared to the one from march 15

2017-11-15 Thread Dave Jones

On 11/15/2017 05:22 AM, Merijn van den Kroonenberg wrote:

I updated the masses/rule-update-score-gen/lock-scores file from
rulesrc/sandbox/dos/new-rule-score-gen/lock-scores on the
sa-vm1.apache.org server so fingers crossed on the 72_scores.cf here in
about 5 hours.

This script is always freshly checked out, so uncommitted changes can
never be tested. If you check automc/tmp you will still see the old
version of the script. I must admit, I fell for it too; I only found out
after checking the script in the temp dir, when I wondered
why there was no change in ranges.data.


Dave




Darn.  I am having problems with my SVN ID right now so I was hoping I
didn't have to commit these changes to test them on the server.  I am
travelling with my laptop, which doesn't have something set up quite right,
so I will have to figure out the SVN authentication setup since I won't
be back at my primary desktop PC for about 10 days.


Dave



Re: 72_scores.cf compared to the one from march 15

2017-11-15 Thread Merijn van den Kroonenberg

> I updated the masses/rule-update-score-gen/lock-scores file from
> rulesrc/sandbox/dos/new-rule-score-gen/lock-scores on the
> sa-vm1.apache.org server so fingers crossed on the 72_scores.cf here in
> about 5 hours.

This script is always freshly checked out, so uncommitted changes can
never be tested. If you check automc/tmp you will still see the old
version of the script. I must admit, I fell for it too; I only found out
after checking the script in the temp dir, when I wondered
why there was no change in ranges.data.

>
> Dave
>
>




Re: 72_scores.cf compared to the one from march 15

2017-11-14 Thread Merijn van den Kroonenberg
> On 11/14/2017 07:28 AM, Merijn van den Kroonenberg wrote:
>>> Hi,
>>>
>>> When I compare the current 72_scores.cf with the one from march 15 I can
>>> see we are getting closer and closer.
>>> The march one has 144 lines and the current one has 108.
>> Actually, personally I think the issue below should be addressed before
>> going live with the new score generation. Without it there is still too
>> big a difference and I would not feel confident that some major issue is
>> not lurking below this. I understand you want to go live sooner rather
>> than later, but well, these are my thoughts :)
>
> Last night's issue was my goof.  Yesterday's 72_scores.cf was much
> closer to March's size.

It has been hovering around 103 lines for some time now. But it used
to be 140-160 lines.

>
> Keep in mind that the size of the 72_scores.cf has fluctuated over the
> years so we aren't sure that it has to be the same number of lines that
> it was back in March to be correct now.  If you know something is still
> broken with the 72_scores.cf we can hold off and get it corrected.  Do
> you know of anything we need to address still?

Yes, that's why I do not only look at the number of lines, but also
check which lines are missing (or new) and why.
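
A minimal sketch of that kind of check, assuming both files consist of
lines of the form "score RULE_NAME v1 v2 v3 v4". It is not the
compare-rulefiles script; the function names and the toy rules and values
are hypothetical.

```python
def parse_scores(text):
    """Map rule name -> tuple of score strings from 'score NAME ...' lines."""
    scores = {}
    for line in text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "score":
            scores[fields[1]] = tuple(fields[2:])
    return scores


def compare_scores(old_text, new_text):
    """Return (removed, added, changed) rule names between two score files."""
    old, new = parse_scores(old_text), parse_scores(new_text)
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return removed, added, changed


# Toy input with made-up rules and values, not taken from the real files.
old_text = "score RULE_A 1.000 0.001 1.000 0.001\nscore RULE_B 2.000 2.000 2.000 2.000\n"
new_text = "score RULE_A 1.000 1.000 1.000 1.000\nscore RULE_C 0.500 0.500 0.500 0.500\n"
print(compare_scores(old_text, new_text))
# (['RULE_B'], ['RULE_C'], ['RULE_A'])
```

Comparing by rule name rather than by line count shows removals and
additions that cancel each other out in the total.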

In the part below (which you cut off in this mail, but was still in my
previous mail) I found another change in the "dos" lock-scores which is
not in the lock-scores that is actually used. It is similar to what
happened to the merge-scores script.

I think with this fixed the 72_scores.cf will look much more like the
March one, so it's much easier for me to check and explain any remaining
differences. Then I could see if suspicious rules are missing or added.

>
> I have installed yesterday's ruleset manually on my SA platforms and
> will check the scoring levels today and tomorrow.
>
> Dave
>

Because of the way I debug and solve problems, I need to reason about them
rather than just test. That's why I would really like to have lock-scores fixed.





Re: 72_scores.cf compared to the one from march 15

2017-11-08 Thread Merijn van den Kroonenberg
> Hi,
>
> When I compare the current 72_scores.cf with the one from March 15 I can
> see we are getting closer and closer.
> The March one has 144 lines and the current one has 108.

I have been looking at this, and by backtracking I found that the
lock-scores script has a definite impact on ranges.data and, through that,
on which rules are used by the garescorer.

Looking at the script, I remembered I already had a note about it.
There is also a copy in
rulesrc/sandbox/dos/new-rule-score-gen/lock-scores
which has been changed compared to
masses/rule-update-score-gen/lock-scores,
the version we use now.
The changes seem to be related to assigning ranges to rules if they have
scores defined in the sandboxes.

I think it's likely the sandbox version was also running in production in
March. So I would like to see what happens if these changes are ported to
masses/rule-update-score-gen/lock-scores (they must be committed to SVN
for testing).

When I have some time I want to make a write-up of which rules are
considered for score generation and what happens if scores are not
generated for rules. We probably need to have a good look at what the
intention should be, after we have updates running again.

>
> When looking at the rules which are missing, then one case stands out
> clearly:
> All rules in the March version with a score like this:
> 1.000 1.000 1.000 1.000
> Are missing from our current 72_scores.cf
> [edit: they all seem to be in active.list with a tflags publish]
>
> I will see if I can find where they get lost ;)
>
> One other rule which is still missing is RP_MATCHES_RCVD, which I could
> imagine being used in custom meta rules.
>
> So I compiled a list of all rules in the March 72_scores.cf which are not
> in our current one:
>
> AC_SPAMMY_URI_PATTERNS1
> AC_SPAMMY_URI_PATTERNS10
> AC_SPAMMY_URI_PATTERNS11
> AC_SPAMMY_URI_PATTERNS12
> AC_SPAMMY_URI_PATTERNS2
> AC_SPAMMY_URI_PATTERNS3
> AC_SPAMMY_URI_PATTERNS4
> AC_SPAMMY_URI_PATTERNS8
> AC_SPAMMY_URI_PATTERNS9
> AXB_XMAILER_MIMEOLE_OL_1ECD5
> AXB_XM_FORGED_OL2600
> BODY_EMPTY
> CANT_SEE_AD
> CN_B2B_SPAMMER
> COMMENT_GIBBERISH
> ENCRYPTED_MESSAGE
> FORM_LOW_CONTRAST
> FOUND_YOU
> FREEMAIL_DOC_PDF_BCC
> FROM_WORDY_SHORT
> FSL_HELO_BARE_IP_2
> GOOGLE_DOCS_PHISH
> GOOGLE_DOCS_PHISH_MANY
> GOOG_MALWARE_DNLD
> HDRS_LCASE
> HEXHASH_WORD
> HK_SCAM_N15
> HTML_OFF_PAGE
> LIST_PRTL_PUMPDUMP
> LIST_PRTL_SAME_USER
> LOTTO_AGENT
> LOTTO_DEPT
> LUCRATIVE
> MIME_NO_TEXT
> MONEY_LOTTERY
> MSGID_NOFQDN1
> MSM_PRIO_REPTO
> PHP_NOVER_MUA
> PHP_ORIG_SCRIPT
> PHP_SCRIPT_MUA
> PP_TOO_MUCH_UNICODE02
> PP_TOO_MUCH_UNICODE05
> PUMPDUMP
> PUMPDUMP_MULTI
> RAND_HEADER_MANY
> RP_MATCHES_RCVD
> SHARE_50_50
> SPOOFED_FREEM_REPTO_CHN
> STOCK_LOW_CONTRAST
> STOCK_TIP
> SYSADMIN
> TO_NO_BRKTS_PCNT
> TW_GIBBERISH_MANY
> UC_GIBBERISH_OBFU
> URI_DATA
> URI_OPTOUT_3LD
> XPRIO_SHORT_SUBJ
>
> That is 57 rules, more than the difference in rule count. This means there
> are also many rules in our current 72_scores.cf which are not in the March
> version.
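
The arithmetic behind that conclusion can be sketched as follows, assuming
every line in both files is exactly one rule (header or comment lines
would shift the result). The function name is hypothetical.

```python
def rules_added(old_count, new_count, removed_count):
    """Rules present only in the new file, given both totals and the removals."""
    # new_count = old_count - removed_count + added, solved for added.
    return new_count - old_count + removed_count


# 144 lines in March, 108 now, 57 March rules missing from the current file.
print(rules_added(144, 108, 57))  # 21
```
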
>
> Can someone explain to me why or in which cases rules are added or removed
> from the 72_scores.cf?
>
> What I already know:
> 1) during rule promotion rules are added/removed from active.list, which in
> turn will add/remove them from 72_scores.cf


2) When the hit rate in the corpus falls below 0.01% they are removed too, it
seems. So this also depends on absolute corpus size. In this case they get
the default score (which also sounds weird to me).
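
A minimal sketch of such a threshold check, only to illustrate why corpus
size matters. The 0.01% figure comes from the observation above; the
function name, the toy numbers, and the assumption that the rate is hits
divided by total messages are hypothetical and not taken from the score
generation scripts.

```python
def below_hit_threshold(hits, corpus_size, threshold_percent=0.01):
    """True if the rule's hit rate, in percent, is below the threshold."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return 100.0 * hits / corpus_size < threshold_percent


# The same 50 hits pass in a smaller corpus and fail in a larger one.
print(below_hit_threshold(50, 400_000))    # False (0.0125%)
print(below_hit_threshold(50, 1_000_000))  # True  (0.005%)
```
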

>
> A few from the above list of rules can be tracked to active.list changes
> (rule promotions) between then and now. But most are still in active.list.
>
> Cheers,
> Merijn
>
>