On Tue, Dec 11, 2012 at 10:53:35AM +0100, Axb wrote:
> On 12/11/2012 09:43 AM, Marc Andre Selig wrote:

> >For example,
> ><http://ruleqa.spamassassin.org/20121210-r1419267-n/HK_RANDOM_FROM/detail?s_corpus=1>
> >shows 286388 spam messages in corpus axb-foo from month 2012-11.
> >This alone is much more than the minimum number required.  (I hope they
> >are messages collected from many different recipients so as not to bias
> >things, but that's a different matter.)
> >
> >So is the problem that axb's messages are reported too late?
> 
> I don't think so - I aborted all my mass-checks and the others weren't
> finished within the time frame.
> 
> >In that case, and if the premise holds that over-age messages are not
> >to be used, it might help for axb to simply delete messages that are
> >too old anyway, just so that mass-check can finish earlier.
> 
> Too old? My spam corpus isn't older than 90 days. I just have too
> much of the crap.

You are perfectly right, of course; I was looking at the dates instead of
the numbers.  You do have some very old ham (in sa-users and ham-misc),
but the amounts are too small to be relevant.


So why does it disregard your files?


> >A simpler option would be to modify the auto-mass-check.sh script to
> >use incremental uploads, instead of uploading all log files after all
> >corpora have been checked.  To that end, it should suffice to add
> >the -t flag to rsync (so that files are not transferred twice) and add
> >invocations of upload_results to ~/.auto-mass-check.cf.
> 
> Logs aren't cumulative/incremental. They're rewritten on every
> mass-check run.
> 
> Or do you mean something else?

Yes.  Several submitters have more than one corpus defined (but
you're currently the only one where it matters at all).  The way
auto-mass-check.sh is written, log files are uploaded only when all
corpora are done.  Where mass-check runs slowly, it could save some
time to upload each log file as soon as mass-check finishes its corpus.

For example, your last net run was submitted 24 hours after the
weekly-versions.txt file was created.  Judging by the file sizes,
the first log file was probably done much earlier than that.  The more
corpora you have defined, and the more your messages are spread out
among them, the more pronounced this effect becomes.
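A rough sketch of the per-corpus upload idea, just to show the ordering
(upload after every corpus instead of once at the very end).  Everything
in it is hypothetical: the corpus names, LOGDIR, DEST, run_masscheck and
upload_logs are placeholders, not the real auto-mass-check.sh variables
or the real upload_results function.  With DEST left empty it only
prints what it would upload.

```shell
#!/bin/sh
# Hypothetical sketch: upload corpus logs as soon as each mass-check run
# finishes, instead of waiting until every corpus is done.
# LOGDIR, DEST, run_masscheck and upload_logs are placeholders, not the
# actual names used by auto-mass-check.sh.

LOGDIR=${LOGDIR:-/tmp/masscheck-logs}
DEST=${DEST:-}   # rsync target; empty means "dry run" in this sketch

run_masscheck() {
  # Placeholder for the real mass-check invocation on corpus "$1";
  # each corpus writes (and on every run rewrites) its own log file.
  echo "checked $1" > "$LOGDIR/$1.log"
}

upload_logs() {
  if [ -n "$DEST" ]; then
    # -t preserves modification times, so on the next call rsync's quick
    # check (size + mtime) can skip logs unchanged since the last upload.
    rsync -t "$LOGDIR"/*.log "$DEST"
  else
    echo "would upload: $(cd "$LOGDIR" && ls *.log | tr '\n' ' ')"
  fi
}

mkdir -p "$LOGDIR"
for corpus in ham-misc spam-foo; do
  run_masscheck "$corpus"
  upload_logs   # incremental: runs after every corpus, not once at the end
done
```

Because the first corpus's log is pushed before the second corpus starts,
a slow run on a large corpus no longer holds back results that are
already finished.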

Regards
Marc
