On Wed, May 05, 2021 at 09:24:37AM +0200, [email protected] wrote:
> Hello
> 
> I'm new to masscheck, nothing uploaded yet, and have two questions
> 
> As my spam corpus comes from my traps and my ham "just" from my personal
> addresses there is quite an imbalance between my spam- and ham corpus
> (300 ham and several k's of spam). Is such an imbalance a problem for
> reliable masscheck?

Personal ham/spam counts are irrelevant, as masscheck processes all the
corpora together.  You can see at https://ruleqa.spamassassin.org/ that
there are many "spam-only" corpora etc.


> Second: I tested masscheck script with my config but I get a warning
> which I'm not sure it can be ignored or not:
> 
> archive-iterator: invalid (undef) format in target list, run_masscheck
> at
> /root/masscheckwork/nightly_mass_check/masses/../lib/Mail/SpamAssassin/ArchiveIterator.pm
> line 545.
> archive-iterator: invalid (undef) format in target list, ham-corpus at
> /root/masscheckwork/nightly_mass_check/masses/../lib/Mail/SpamAssassin/ArchiveIterator.pm
> line 545.
> 
> but according to my config the ham-corpus is defined
> 
> run_all_masschecks() {
>   ### sample: single corpus ###
>   run_masscheck spam-corpus --all \
>           --after=-4838400 spam:dir:/data/archive/spam/ \
>   run_masscheck ham-corpus --all \
>           --after=-174182400 ham:dir:/data/archive/ham/
> }

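Why did you split run_masscheck in two?  Note the trailing backslash after
the spam target: it joins the second run_masscheck line onto the first, so
"run_masscheck" and "ham-corpus" are passed on as targets, which is what
the two warnings complain about.  I also think mass-check always requires
both spam and ham to be defined.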

As per automasscheck-minimal.cf.dist it should look like this:

run_all_masschecks() {
  ### sample: single corpus ###
  run_masscheck spam-corpus --all \
          --after=-4838400 spam:dir:/data/archive/spam/ \
          --after=-174182400 ham:dir:/data/archive/ham/
}
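
To see why a stray trailing backslash would plausibly produce those two
warnings, here is a minimal sketch in plain shell.  show_args is a
hypothetical stand-in that only prints the arguments it receives; it is not
part of the masscheck scripts, and the targets are shortened for the demo.

```shell
#!/bin/sh
# Hypothetical helper (not part of masscheck): print each received
# argument on its own line, wrapped in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# The backslash after the spam target continues the line, so the second
# "show_args ..." is not a new command: it becomes extra arguments of
# the first call.
joined=$(show_args spam-corpus --all \
        spam:dir:/data/archive/spam/ \
show_args ham-corpus --all \
        ham:dir:/data/archive/ham/)

printf '%s\n' "$joined"
```

The single call receives seven arguments, including the literal words
"show_args" and "ham-corpus".  In the real config the corresponding words
are "run_masscheck" and "ham-corpus", which have no class:format:location
form, matching the names shown in the warnings.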
