On Wed, May 05, 2021 at 09:24:37AM +0200, [email protected] wrote:
> Hello
>
> I'm new to masscheck, nothing uploaded yet, and have two questions.
>
> As my spam corpus comes from my traps and my ham "just" from my personal
> addresses, there is quite an imbalance between my spam and ham corpus
> (300 ham and several thousand spam). Is such an imbalance a problem for
> reliable masscheck?
Personal ham/spam counts are irrelevant, as masscheck processes all the
corpuses together. You can see on https://ruleqa.spamassassin.org/ that
there are many "spam-only" corpuses etc.

> Second: I tested the masscheck script with my config but I get a warning
> which I'm not sure can be ignored or not:
>
> archive-iterator: invalid (undef) format in target list, run_masscheck at
> /root/masscheckwork/nightly_mass_check/masses/../lib/Mail/SpamAssassin/ArchiveIterator.pm line 545.
> archive-iterator: invalid (undef) format in target list, ham-corpus at
> /root/masscheckwork/nightly_mass_check/masses/../lib/Mail/SpamAssassin/ArchiveIterator.pm line 545.
>
> but according to my config the ham-corpus is defined:
>
> run_all_masschecks() {
>   ### sample: single corpus ###
>   run_masscheck spam-corpus --all \
>     --after=-4838400 spam:dir:/data/archive/spam/ \
>   run_masscheck ham-corpus --all \
>     --after=-174182400 ham:dir:/data/archive/ham/
> }

Why did you split run_masscheck in two? I think mass-check always requires
both the spam and the ham targets to be defined in one run.

As per automasscheck-minimal.cf.dist it should look like this:

run_all_masschecks() {
  ### sample: single corpus ###
  run_masscheck spam-corpus --all \
    --after=-4838400 spam:dir:/data/archive/spam/ \
    --after=-174182400 ham:dir:/data/archive/ham/
}
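To see what the shell actually passes to run_masscheck, the two configs can be run against a stub. This is a minimal sketch, assuming a POSIX shell; the run_masscheck below is a hypothetical stub that only reports its arguments (the real one comes from the automasscheck script), and question_config / sample_config are hypothetical wrapper names. It suggests that the trailing backslash after the spam target joins the two lines of the original config into a single command, so "run_masscheck" and "ham-corpus" are handed over as targets without a class:format:location form. That would be consistent with those two names appearing in the warnings.

```shell
#!/bin/sh
# Hypothetical stub: stands in for the real run_masscheck and only
# reports how many arguments it got and what they are.
run_masscheck() {
  echo "run_masscheck invoked with $# arguments:"
  for arg in "$@"; do
    echo "  $arg"
  done
}

# Config from the question, unchanged. The backslash after the spam
# target continues the line, so the second "run_masscheck ham-corpus ..."
# is not a new command; it becomes extra arguments of the first call.
question_config() {
  run_masscheck spam-corpus --all \
    --after=-4838400 spam:dir:/data/archive/spam/ \
  run_masscheck ham-corpus --all \
    --after=-174182400 ham:dir:/data/archive/ham/
}

# Layout from automasscheck-minimal.cf.dist: one call, both targets.
sample_config() {
  run_masscheck spam-corpus --all \
    --after=-4838400 spam:dir:/data/archive/spam/ \
    --after=-174182400 ham:dir:/data/archive/ham/
}

echo "--- question config ---"
question_config
echo "--- sample config ---"
sample_config
```

With the stub, the question config produces a single invocation with 9 arguments (including "run_masscheck" and "ham-corpus" as plain arguments), while the sample layout produces one invocation with 6 arguments carrying both the spam and the ham target.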
