> Hi,
>
> When I compare the current 72_scores.cf with the one from March 15 I can
> see we are getting closer and closer.
> The March one has 144 lines and the current one has 108.
I have been looking at this, and by backtracking I found the lock-scores
script, which has a definite impact on ranges.data and, through that, on
which rules are used by the garescorer.

Looking at the script, I remembered I already had a note about it. It is
also in
  rulesrc/sandbox/dos/new-rule-score-gen/lock-scores
which has been changed compared to
  masses/rule-update-score-gen/lock-scores
the version we use now.

The changes seem to be related to assigning ranges to rules if they have
scores defined in the sandboxes. I think it is likely this was also running
in production in March. So I would like to see what happens if these changes
are ported to masses/rule-update-score-gen/lock-scores (this must be
committed to svn for testing).

When I have some time I want to write up which rules are considered for
score generation and what happens if scores are not generated for rules.
We probably need to have a good look at what the intention should be, after
we have updates running again.

>
> When looking at the rules which are missing, one case stands out
> clearly:
> All rules in the March version with a score like this:
> 1.000 1.000 1.000 1.000
> are missing from our current 72_scores.cf
> [edit: they all seem to be in active.list with a tflags publish]
>
> I will see if I can find where they get lost ;)
>
> One other rule which is still missing is RP_MATCHES_RCVD, which I could
> imagine being used in custom meta rules.
>
> So I compiled a list of all rules in the March 72_scores.cf which are not
> in our current one:
>
> AC_SPAMMY_URI_PATTERNS1
> AC_SPAMMY_URI_PATTERNS10
> AC_SPAMMY_URI_PATTERNS11
> AC_SPAMMY_URI_PATTERNS12
> AC_SPAMMY_URI_PATTERNS2
> AC_SPAMMY_URI_PATTERNS3
> AC_SPAMMY_URI_PATTERNS4
> AC_SPAMMY_URI_PATTERNS8
> AC_SPAMMY_URI_PATTERNS9
> AXB_XMAILER_MIMEOLE_OL_1ECD5
> AXB_XM_FORGED_OL2600
> BODY_EMPTY
> CANT_SEE_AD
> CN_B2B_SPAMMER
> COMMENT_GIBBERISH
> ENCRYPTED_MESSAGE
> FORM_LOW_CONTRAST
> FOUND_YOU
> FREEMAIL_DOC_PDF_BCC
> FROM_WORDY_SHORT
> FSL_HELO_BARE_IP_2
> GOOGLE_DOCS_PHISH
> GOOGLE_DOCS_PHISH_MANY
> GOOG_MALWARE_DNLD
> HDRS_LCASE
> HEXHASH_WORD
> HK_SCAM_N15
> HTML_OFF_PAGE
> LIST_PRTL_PUMPDUMP
> LIST_PRTL_SAME_USER
> LOTTO_AGENT
> LOTTO_DEPT
> LUCRATIVE
> MIME_NO_TEXT
> MONEY_LOTTERY
> MSGID_NOFQDN1
> MSM_PRIO_REPTO
> PHP_NOVER_MUA
> PHP_ORIG_SCRIPT
> PHP_SCRIPT_MUA
> PP_TOO_MUCH_UNICODE02
> PP_TOO_MUCH_UNICODE05
> PUMPDUMP
> PUMPDUMP_MULTI
> RAND_HEADER_MANY
> RP_MATCHES_RCVD
> SHARE_50_50
> SPOOFED_FREEM_REPTO_CHN
> STOCK_LOW_CONTRAST
> STOCK_TIP
> SYSADMIN
> TO_NO_BRKTS_PCNT
> TW_GIBBERISH_MANY
> UC_GIBBERISH_OBFU
> URI_DATA
> URI_OPTOUT_3LD
> XPRIO_SHORT_SUBJ
>
> That is 57 rules, more than the difference in rule count. This means
> there are also many rules in our current 72_scores.cf which are not in
> the March version.
>
> Can someone explain to me why, or in which cases, rules are added to or
> removed from 72_scores.cf?
>
> What I already know:
> 1) During rule promotion, rules are added to or removed from active.list,
>    which in turn adds them to or removes them from 72_scores.cf.
> 2) When the hit rate in the corpus falls below 0.01%, they are removed
>    too, it seems. So this also depends on absolute corpus size. In this
>    case they get the default score (which also sounds weird to me).
>
> A few from the above list of rules can be tracked to active.list changes
> (rule promotions) between then and now. But most are still in active.list.
>
> Cheers,
> Merijn
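
For anyone who wants to repeat the comparison above, here is a minimal
sketch of how the two lists could be produced. It assumes each scored rule
appears in 72_scores.cf as a line of the form "score RULE_NAME v1 [v2 v3 v4]";
the function names parse_scores and compare are hypothetical helpers, not
part of the masses tooling.

```python
import re

# Assumption: a scored rule is a line like
#   score RULE_NAME 1.000 1.000 1.000 1.000
SCORE_LINE = re.compile(r"^\s*score\s+(\S+)\s+(.*)$")


def parse_scores(text):
    """Return {rule name: list of score strings} for every score line."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = SCORE_LINE.match(line)
        if match:
            rules[match.group(1)] = match.group(2).split()
    return rules


def compare(old_text, new_text):
    """Return (rules only in old, rules only in new), both sorted."""
    old = parse_scores(old_text)
    new = parse_scores(new_text)
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    return only_old, only_new
```

Reading both files and passing their contents to compare() gives the rules
that disappeared and the rules that were added. If line counts tracked rule
counts exactly, 57 missing rules against a drop of 36 lines (144 to 108)
would imply about 21 rules present now that were not in the March file; the
second list shows which ones.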