On Thu, 17 May 2018, Dave Jones wrote:
On 05/17/2018 04:09 PM, John Hardin wrote:
I notice from the RuleQA website that the masschecks from giovanni and
llanga are consistently reported separately from everybody else's.
I wonder whether this is affecting the quality of masscheck - is this
perhaps causing it to bounce back and forth between scores (or do something
else that's suboptimal) based on what appears to it to be two separate and
different masscheck corpora?
It looks like they are always behind on the SVN revision pulled down so they
are actually not being counted/included in the masscheck processing.
The ena-week* corpora are the majority of the masscheck data.
Is this because of how those masschecks are being run or submitted, or is
the masscheck infrastructure too strict (filename matching, submission
cutoffs, etc.)?
Maybe they are running old versions of the automasscheck script that have some
sort of delay between the download of the SVN staging area and the
local masscheck processing. I understand that some may want to delay
processing until electricity costs are lower. Per the documentation, the
staging area needs to be downloaded to get the correct SVN version of the
rules; otherwise it's a waste of resources.
It *feels* like those result sets are not coming in by the cutoff.
It's the SVN revision that is making them show up behind in their own
section.
The RuleQA UI reports the same DateRev (SVN revision) as everybody else.
See below.
Unfortunately the ruleQA website doesn't expose a tool to report when the
results were submitted, just which DateRev the masscheck was based on.
I'd say "perhaps we need to extend the cutoff a bit", but I have no idea
ATM when those result sets are coming in so I have no idea how the cutoff
would be adjusted.
I have a script that runs on sa-vm1 to track submissions so the sysadmins
list gets notifications when we don't have enough for masscheck to run.
Here's the output right now -- note the No's that don't match the SVN tagged
rev:
Yes, but that's not clear as to whether they are pulling the wrong rev or
are just late.
SVN tagged rev in nightly_mass_check: 1831759
New masscheck submission listings in the past day:
1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
1831684 (No) - ham-giovanni.log (May 17 08:38)
1831684 (No) - spam-llanga.log (May 17 08:45)
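For anyone wanting to check a listing like the one above mechanically, here is a minimal Python sketch that picks out the submissions whose rev differs from the tagged rev. It is not the script running on sa-vm1; the function name and regular expressions are hypothetical and assume only the line layout shown in this message.

```python
import re

# Hypothetical patterns, based only on the line layout shown in this thread.
TAGGED_REV_RE = re.compile(r"SVN tagged rev in nightly_mass_check:\s*(\d+)")
SUBMISSION_RE = re.compile(r"^(\d+) \((Yes|No)\) - (\S+) \((.+)\)$")


def stale_submissions(report):
    """Return (tagged_rev, [(logfile, rev, revs_behind), ...]) for a report."""
    tagged = None
    stale = []
    for line in report.splitlines():
        line = line.strip()
        m = TAGGED_REV_RE.match(line)
        if m:
            tagged = int(m.group(1))
            continue
        m = SUBMISSION_RE.match(line)
        if m and tagged is not None:
            rev = int(m.group(1))
            if rev != tagged:
                # Difference in SVN revision numbers, not in time.
                stale.append((m.group(3), rev, tagged - rev))
    return tagged, stale


REPORT = """\
SVN tagged rev in nightly_mass_check: 1831759
New masscheck submission listings in the past day:
1831759 (Yes) - spam-axb-ham-misc.log (May 17 11:35)
1831684 (No) - ham-giovanni.log (May 17 08:38)
1831684 (No) - spam-llanga.log (May 17 08:45)
"""

tagged_rev, stale = stale_submissions(REPORT)
for logfile, rev, behind in stale:
    print(f"{logfile}: rev {rev}, {behind} revisions behind {tagged_rev}")
```

Note that this only shows that the rev differs. It cannot tell a submission built from the wrong rev apart from a late submission for the previous rev.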
Based on the RuleQA daterev list (at the top of the page), 1831684 *does*
appear to be a valid masscheck daterev (apologies for the textual
"screenshot"):
1831684: 2018-05-16 08:34:16
spamassassin_role: promotions validated
20180516-r1831684-n
axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1
ena-week2 ena-week3 ena-week4 giovanni jarif jbrooks llanga
mmiroslaw-mails-ham mmiroslaw-mails-spam sihde
1831684: 2018-05-16 08:34:16
spamassassin_role: promotions validated
20180517-r1831684-n
giovanni llanga
1831759: 2018-05-17 08:34:10
spamassassin_role: promotions validated
20180517-r1831759-n (Viewing)
axb-coi-bulk axb-generic axb-ham-misc axb-ninja darxus ena-week0 ena-week1
ena-week3 ena-week4 grenier jarif jbrooks mmiroslaw-mails-ham
mmiroslaw-mails-spam sihde thendrikx
You don't see that normally because the default "last two" in the UI is
usually the current submissions from everybody else, preceded by the
apparently-late submission for the *prior* daterev from giovanni and
llanga. You have to hit "all daterevs within 2 days" to see more history.
It looks to me like their submissions are for the correct (prior) daterev
(SVN commit) but are coming in ~20H late... I don't think we can tweak the
cutoff *that* much. :)
I would be surprised if their masschecks were taking that long to
complete. Is it possible they have something like a TZ error causing that
much of a discrepancy?
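As a rough way to put numbers on the timing, here is a small Python sketch that computes how long after a daterev's timestamp a submission was listed. The function name is hypothetical. It assumes the RuleQA timestamp and the listing time are in the same time zone, which nothing in this thread confirms, and that the listing is from 2018. It measures time since the rev was tagged, not time past the cutoff, since the cutoff itself isn't stated here.

```python
from datetime import datetime


def hours_after_tag(tagged_at, listed_at, year=2018):
    """Hours between a daterev timestamp and a submission listing time.

    tagged_at: e.g. "2018-05-16 08:34:16" (format shown in the daterev list)
    listed_at: e.g. "May 17 08:38" (format shown in the submission listing)
    Assumes both are in the same time zone; the year is supplied separately
    because the listing format does not include one.
    """
    tagged = datetime.strptime(tagged_at, "%Y-%m-%d %H:%M:%S")
    listed = datetime.strptime(f"{year} {listed_at}", "%Y %b %d %H:%M")
    return (listed - tagged).total_seconds() / 3600


for name, listed_at in [("ham-giovanni.log", "May 17 08:38"),
                        ("spam-llanga.log", "May 17 08:45")]:
    gap = hours_after_tag("2018-05-16 08:34:16", listed_at)
    print(f"{name}: listed {gap:.1f} hours after r1831684 was tagged")
```

If the two timestamps turn out to be in different zones, the result shifts by that offset, so the same comparison could help confirm or rule out a TZ problem.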
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79