On 2018-03-20 13:27, Dave Jones wrote:
On 03/20/2018 01:19 PM, Kevin A. McGrail wrote:
That is an interesting question because you are right, they are supposed to
be immutable.  Dave, is something happening in an 18 hour window as he
describes?


From what I learned trying to reconstruct everything about 10 months ago, there are 2 updates based on the SVN commit numbers that should generate 2 sets of files that are immutable.  They come from the rule promotions and the nightly masscheck jobs, which work together like a tick-tock.

The tick is the rule promotion, which has a dependency on 3 days of good masschecks.  Its 72_scores.cf will be behind, since it is based on what is currently in the SVN trunk when the cron job runs.  It used to not have a 72_scores.cf in it, but I added it into the script logic once I discovered it was missing.  The reason behind this is that rules are no longer distributed with SA, so a first sa-update needs to be run.  If the latest ruleset didn't have a 72_scores.cf on a fresh installation, then scoring could be really bad/off until the tock cycle of sa-update.

The tock is the nightly masscheck jobs, which have several dependencies, like enough contributors and enough ham/spam in the corpora.  This generates a new 72_scores.cf based on the garescorer C program.

This is why it takes at least 2 days (tick, tock) for any rule updates to roll out.  If we had more development activity with rules that needed to go out faster, I would like to get to maybe an 8-hour cycle (two 4-hour runs).  We just don't have the need at this point in time.

From a rule generation point of view, this makes some sense, thanks.

However, the result currently is that both tick and tock publish rulesets with the same number (different 72_scores.cf files), so the reality is that there is no immutable "ruleset 1827165", and which set you get depends on the time of day sa-update happens to run.
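One way to check this kind of claim is to fingerprint each downloaded ruleset and flag any number that shows up with more than one distinct content hash. The following is only a minimal sketch: the function names, the idea of collecting (number, content) pairs from repeated downloads, and the sample bytes are all assumptions for illustration, not anything sa-update itself does.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of the given ruleset content."""
    return hashlib.sha256(content).hexdigest()


def find_mutated(observations):
    """Return ruleset numbers that were observed with differing content.

    observations: iterable of (ruleset_number, content_bytes) pairs,
    e.g. collected by saving what was downloaded at different times of day
    (hypothetical collection step, not shown here).
    """
    seen = {}
    for number, content in observations:
        seen.setdefault(number, set()).add(fingerprint(content))
    # A truly immutable ruleset number maps to exactly one digest.
    return sorted(n for n, digests in seen.items() if len(digests) > 1)


# Made-up sample data: the same number seen twice with different bytes.
samples = [
    ("1827165", b"score EXAMPLE_RULE 1.0\n"),
    ("1827165", b"score EXAMPLE_RULE 1.5\n"),
    ("placeholder", b"unchanged\n"),
    ("placeholder", b"unchanged\n"),
]
mutated = find_mutated(samples)
```

With the sample data above, only the number whose content changed between observations is reported.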

A lot of the scores have changed a fair amount when comparing the two versions of 1827165: several differ by 0.5-1 point, and a few by more than that.
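For anyone who wants to repeat the comparison, here is a rough sketch of how two copies of 72_scores.cf could be diffed. It assumes the usual "score RULE_NAME value [value value value]" line format; the function names, the 0.5 threshold, and the sample rule names and scores are made up for illustration and are not taken from the real files.

```python
def parse_scores(text):
    """Map rule name -> tuple of floats from 'score NAME v1 [v2 v3 v4]' lines."""
    scores = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "score":
            try:
                scores[parts[1]] = tuple(float(v) for v in parts[2:])
            except ValueError:
                continue  # skip lines whose values are not plain numbers
    return scores


def score_changes(old_text, new_text, threshold=0.5):
    """Return {rule: largest absolute change} for rules that moved >= threshold."""
    old, new = parse_scores(old_text), parse_scores(new_text)
    changes = {}
    for rule in sorted(old.keys() & new.keys()):
        if len(old[rule]) != len(new[rule]):
            continue  # different number of score sets; not comparable here
        delta = max(abs(a - b) for a, b in zip(old[rule], new[rule]))
        if delta >= threshold:
            changes[rule] = round(delta, 3)
    return changes


# Made-up sample content standing in for the two published versions.
tick = "score EXAMPLE_RULE_A 1.0 1.0 1.0 1.0\nscore EXAMPLE_RULE_B 2.5 2.5 2.0 2.0\n"
tock = "score EXAMPLE_RULE_A 1.2 1.0 1.0 1.0\nscore EXAMPLE_RULE_B 1.5 2.5 2.0 2.0\n"
changes = score_changes(tick, tock)
```

In practice the two strings would be read from the two saved copies of the file, and the threshold can be lowered to list smaller changes as well.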

Are these intended to publish on alternating days?  Or, if they publish twice a day (which is the current situation), why isn't the number being incremented in some fashion?
