On Thu, 17 May 2018, Dave Jones wrote:

On 05/17/2018 04:09 PM, John Hardin wrote:

I notice from the RuleQA website that the masschecks from giovanni and llanga are consistently reported separately from everybody else's.

I wonder whether this is affecting the quality of masscheck - is this perhaps causing it to bounce back and forth between scores (or do something else that's suboptimal) based on what appears to it to be two separate and different masscheck corpora?

It looks like they are always behind on the SVN revision pulled down so they are actually not being counted/included in the masscheck processing.

The ena-week* corpora are the majority of the masscheck data.

Is this because of how those masschecks are being run or submitted, or is the masscheck infrastructure too strict (filename matching, submission cutoffs, etc.)?

Maybe they are running old versions of the automasscheck script that have some sort of delay between downloading the SVN staging area and the local masscheck processing. I understand that some may want to delay processing until electricity costs are lower. The download of the staging area needs to happen per the documentation to get the correct SVN revision of the rules; otherwise it's a waste of resources.

It *feels* like those result sets are not coming in by the cutoff.

It's the SVN revision that is making them show up behind in their own section.

The RuleQA UI reports the same DateRev (SVN revision) as everybody else. See below.

Unfortunately the RuleQA website doesn't expose a tool to report when the results were submitted, just which DateRev the masscheck was based on.

I'd say "perhaps we need to extend the cutoff a bit", but I have no idea ATM when those result sets are coming in so I have no idea how the cutoff would be adjusted.

I have a script that runs on sa-vm1 to track submissions so the sysadmins list gets notifications when we don't have enough for masscheck to run. Here's the output right now -- note the No's that don't match the SVN tagged rev:

Yes, but that doesn't make clear whether they are pulling the wrong rev or are just late.

SVN tagged rev in nightly_mass_check:  1831759

New masscheck submission listings in the past day:
  1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
  1831684  (No) - ham-giovanni.log (May 17 08:38)
  1831684  (No) - spam-llanga.log (May 17 08:45)
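
As an illustration only, the comparison in that listing can be sketched in a few lines of Python. This is not the script that runs on sa-vm1; the line format is inferred from the output above, and the names `Submission`, `parse_listing` and `stale` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Matches lines such as "  1831684  (No) - ham-giovanni.log (May 17 08:38)".
# The format is inferred from the listing above, not from the real script.
LINE_RE = re.compile(
    r"^\s*(?P<rev>\d+)\s+\((?P<flag>Yes|No)\)\s+-\s+"
    r"(?P<log>\S+)\s+\((?P<when>[^)]+)\)\s*$"
)


class Submission(NamedTuple):
    rev: int    # SVN revision the masscheck log was based on
    log: str    # submitted log file name
    when: str   # timestamp text exactly as shown in the listing


def parse_listing(text: str) -> List[Submission]:
    """Return one Submission per listing line that matches LINE_RE."""
    subs = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            subs.append(Submission(int(m["rev"]), m["log"], m["when"]))
    return subs


def stale(subs: List[Submission], tagged_rev: int) -> List[Submission]:
    """Return the submissions whose revision differs from the tagged one."""
    return [s for s in subs if s.rev != tagged_rev]


LISTING = """\
  1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
  1831684  (No) - ham-giovanni.log (May 17 08:38)
  1831684  (No) - spam-llanga.log (May 17 08:45)
"""

for s in stale(parse_listing(LISTING), tagged_rev=1831759):
    print(f"{s.log}: r{s.rev} submitted {s.when}")
```

Note that a mismatch found this way only says the revision differs from the tagged one; it cannot by itself distinguish a wrong checkout from a late submission.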

Based on the RuleQA daterev list (at the top of the page), 1831684 *does* appear to be a valid masscheck daterev (apologies for the textual "screenshot"):


1831684: 2018-05-16 08:34:16
spamassassin_role: promotions validated

20180516-r1831684-n
axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1 ena-week2 ena-week3 ena-week4 giovanni jarif jbrooks llanga mmiroslaw-mails-ham mmiroslaw-mails-spam sihde

1831684: 2018-05-16 08:34:16
spamassassin_role: promotions validated

20180517-r1831684-n
giovanni llanga

1831759: 2018-05-17 08:34:10
spamassassin_role: promotions validated

20180517-r1831759-n (Viewing)
axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1 ena-week3 ena-week4 grenier jarif jbrooks mmiroslaw-mails-ham mmiroslaw-mails-spam sihde thendrikx
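
A rough way to spot this pattern mechanically, sketched under the assumption that daterev labels always look like `YYYYMMDD-rNNNNNNN-x` as in the excerpt above (the function name `revs_with_multiple_dates` is hypothetical, and this is not part of RuleQA):

```python
import re
from collections import defaultdict

# Label format assumed from the excerpt above, e.g. "20180517-r1831684-n".
DATEREV_RE = re.compile(r"^(?P<date>\d{8})-r(?P<rev>\d+)-(?P<kind>\w+)$")


def revs_with_multiple_dates(labels):
    """Map each SVN revision that appears under more than one date
    to the sorted list of those dates."""
    dates_by_rev = defaultdict(set)
    for label in labels:
        m = DATEREV_RE.match(label)
        if m is None:
            continue  # skip anything that does not look like a daterev label
        dates_by_rev[int(m["rev"])].add(m["date"])
    return {rev: sorted(dates)
            for rev, dates in dates_by_rev.items()
            if len(dates) > 1}


labels = [
    "20180516-r1831684-n",
    "20180517-r1831684-n",
    "20180517-r1831759-n",
]
print(revs_with_multiple_dates(labels))
```

With the three labels from the excerpt, only r1831684 is reported, because it shows up under both 20180516 and 20180517.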


You don't see that normally because the default "last two" in the UI is usually the current submissions from everybody else, preceded by the apparently-late submission for the *prior* daterev from giovanni and llanga. You have to hit "all daterevs within 2 days" to see more history.


It looks to me like their submissions are for the correct (prior) daterev (SVN commit) but are coming in ~20H late... I don't think we can tweak the cutoff *that* much. :)

I would be surprised if their masschecks were taking that long to complete. Is it possible they have something like a TZ error causing that much of a discrepancy?


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  It's easy to be noble with other people's money.
                                   -- John McKay, _The Welfare State:
                                      No Mercy for the Middle Class_
-----------------------------------------------------------------------
 413 days since the first commercial re-flight of an orbital booster (SpaceX)
