Re: 72_scores.cf compared to the one from march 15

Merijn van den Kroonenberg Wed, 08 Nov 2017 08:49:28 -0800

> Hi,
>
> When I compare the current 72_scores.cf with the one from March 15, I can
> see that we are getting closer and closer.
> The March one has 144 lines and the current one has 108.


I have been looking at this, and by backtracking I found the lock-scores
script, which has a definite impact on ranges.data and therefore on which
rules are used by the garescorer.

Looking at the script, I remembered that I already had a note about it.
There is also a copy in
rulesrc/sandbox/dos/new-rule-score-gen/lock-scores
which has been changed compared to the version we use now:
masses/rule-update-score-gen/lock-scores
The changes seem to be related to assigning ranges to rules if they have
scores defined in the sandboxes.

I think it's likely this was also running in production in March. So I
would like to see what happens if these changes are ported to
masses/rule-update-score-gen/lock-scores (this must be committed to svn
for testing).

When I have some time I want to write up which rules are considered for
score generation and what happens if scores are not generated for rules.
We probably need to have a good look at what the intention should be,
after we have updates running again.

>
> When looking at the rules which are missing, then one case stands out
> clearly:
> All rules in the march version with a score like this:
> 1.000 1.000 1.000 1.000
> Are missing from our current 72_scores.cf
> [edit: they all seem to be in active.list with a tflags publish]
>
> I will see if I can find where they get lost ;)
>
> One other rule which is still missing is RP_MATCHES_RCVD, which I could
> imagine being used in custom meta rules.
>
> So I compiled a list of all rules in the March 72_scores.cf which are not
> in our current one:
>
> AC_SPAMMY_URI_PATTERNS1
> AC_SPAMMY_URI_PATTERNS10
> AC_SPAMMY_URI_PATTERNS11
> AC_SPAMMY_URI_PATTERNS12
> AC_SPAMMY_URI_PATTERNS2
> AC_SPAMMY_URI_PATTERNS3
> AC_SPAMMY_URI_PATTERNS4
> AC_SPAMMY_URI_PATTERNS8
> AC_SPAMMY_URI_PATTERNS9
> AXB_XMAILER_MIMEOLE_OL_1ECD5
> AXB_XM_FORGED_OL2600
> BODY_EMPTY
> CANT_SEE_AD
> CN_B2B_SPAMMER
> COMMENT_GIBBERISH
> ENCRYPTED_MESSAGE
> FORM_LOW_CONTRAST
> FOUND_YOU
> FREEMAIL_DOC_PDF_BCC
> FROM_WORDY_SHORT
> FSL_HELO_BARE_IP_2
> GOOGLE_DOCS_PHISH
> GOOGLE_DOCS_PHISH_MANY
> GOOG_MALWARE_DNLD
> HDRS_LCASE
> HEXHASH_WORD
> HK_SCAM_N15
> HTML_OFF_PAGE
> LIST_PRTL_PUMPDUMP
> LIST_PRTL_SAME_USER
> LOTTO_AGENT
> LOTTO_DEPT
> LUCRATIVE
> MIME_NO_TEXT
> MONEY_LOTTERY
> MSGID_NOFQDN1
> MSM_PRIO_REPTO
> PHP_NOVER_MUA
> PHP_ORIG_SCRIPT
> PHP_SCRIPT_MUA
> PP_TOO_MUCH_UNICODE02
> PP_TOO_MUCH_UNICODE05
> PUMPDUMP
> PUMPDUMP_MULTI
> RAND_HEADER_MANY
> RP_MATCHES_RCVD
> SHARE_50_50
> SPOOFED_FREEM_REPTO_CHN
> STOCK_LOW_CONTRAST
> STOCK_TIP
> SYSADMIN
> TO_NO_BRKTS_PCNT
> TW_GIBBERISH_MANY
> UC_GIBBERISH_OBFU
> URI_DATA
> URI_OPTOUT_3LD
> XPRIO_SHORT_SUBJ
>
> That is 57 rules, more than the difference in rule count. This means
> there are also many rules in our current 72_scores.cf which are not in
> the March version.
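
As a rough way to reproduce this comparison, here is a minimal Python
sketch. It assumes each relevant line has the form "score RULE_NAME
<values>" and that the 144 and 108 line counts correspond to one rule per
line. The function names are made up for illustration and are not part of
the SpamAssassin tooling.

```python
def rule_names(text):
    """Collect rule names from 'score NAME ...' lines (assumed format)."""
    names = set()
    for line in text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "score":
            names.add(fields[1])
    return names


def compare(old_text, new_text):
    """Return (rules only in old, rules only in new), both sorted."""
    old, new = rule_names(old_text), rule_names(new_text)
    return sorted(old - new), sorted(new - old)


def added_rule_count(old_total, new_total, removed):
    """Rules present in the new file but not the old one.

    Assumes the totals count one rule per line:
    new_total = (old_total - removed) + added.
    """
    return new_total - (old_total - removed)
```

With the numbers above, added_rule_count(144, 108, 57) gives 21, so under
those assumptions 21 rules in the current file were not in the March
version.
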
>
> Can someone explain to me why or in which cases rules are added or removed
> from the 72_scores.cf?
>
> What I already know:
> 1) during rule promotion rules are added to or removed from active.list,
> which in turn will add them to or remove them from 72_scores.cf


2) When the hit rate in the corpus falls below 0.01%, they seem to be
removed too. So this also depends on absolute corpus size. In this case
they get the default score (which also sounds weird to me).
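
To illustrate the corpus-size dependence, here is a small Python sketch of
such a check. The 0.01% threshold is the figure I mention above. How the
real scripts count hits (spam only, or all messages) is an assumption here,
and the names are hypothetical.

```python
# 0.01% expressed as "1 hit per 10,000 messages" to stay in integers.
THRESHOLD_PER_10K = 1


def below_hit_rate_threshold(hits, corpus_size):
    """True if hits/corpus_size is strictly below 0.01% (assumed rule)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * 10_000 < corpus_size * THRESHOLD_PER_10K


# The same 10 hits are enough in a 100,000 message corpus,
# but fall below the threshold in a 1,000,000 message corpus.
print(below_hit_rate_threshold(10, 100_000))
print(below_hit_rate_threshold(10, 1_000_000))
```

So a rule with a stable number of hits can drop out simply because the
corpus grew.
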

>
> A few from the above list of rules can be tracked to active.list changes
> (rule promotions) between then and now. But most are still in active.list.
>
> Cheers,
> Merijn
>
>
