On 06/04/2017 01:18 PM, Jari Fredriksson wrote:

Cool. I have been running masschecks at around 1700 UTC but have now
changed them to run at 1200 EET. My corpus takes 3-4 hours on 4 cores
(Core i7 920), or half an hour on Google Compute with 32 cores and a
ramdisk. I'm not looking much at the planned hourly submission; it
remains to be seen what that will turn out to be.

Ideally I would want to do this at night time when the electricity is
cheap...

My 'ena' corpus is now up to 85K (28K spam/57K ham), growing by about 10-12K a day and scoring consistently on the ham/spam rule hits:

Rule hit frequencies:
   OVERALL        SPAM         HAM  NAME
     85428       28358       57070  (all messages)
      8647        8643           4  URI_WP_HACKED
      3914        3914           0  HELO_MISC_IP
      3138        3136           2  DATE_IN_FUTURE_06_12
      2897        2896           1  T_PDS_TO_EQ_FROM_NAME
      3105        3102           3  T_PDS_FROM_2_EMAILS
      2731        2729           2  DRUGS_ERECTILE
      2415        2415           0  URI_ONLY_MSGID_MALF
      2402        2402           0  DOS_OE_TO_MX
      2509        2507           2  LONGWORDS
      1928        1928           0  DRUGS_ERECTILE_OBFU
      3644        3622          22  MIMEOLE_DIRECT_TO_MX
      1820        1817           3  MISSING_SUBJECT
      1657        1657           0  FUZZY_PHARMACY
      1648        1648           0  DOS_OUTLOOK_TO_MX
      2709        2693          16  T_NAME_EMAIL_DIFF
      1545        1544           1  DATE_IN_FUTURE_03_06
      1514        1514           0  MISSING_MIME_HB_SEP
      1142        1142           0  SUBJECT_DRUG_GAP_L
      1128        1127           1  FUZZY_PRICES
      1064        1064           0  SUBJECT_DRUG_GAP_C

My masscheck processing is taking about 2 hours on my 4 core VM.

Question about the URI_WP_HACKED rule: why is it still at the default score of 1.0 when its S/O on http://ruleqa.spamassassin.org has been 1.000 for a long time?
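
For reference, here is a minimal sketch of how an S/O figure can be derived from the counts in the table above. It assumes S/O is the spam hit rate divided by the sum of the spam and ham hit rates, with each rate taken relative to its own corpus size. The function name s_over_o and the handful of rules picked are purely illustrative, and ruleqa may compute or round the value differently.

```python
# Corpus sizes taken from the "(all messages)" row of the table above.
TOTAL_SPAM = 28358
TOTAL_HAM = 57070


def s_over_o(spam_hits, ham_hits, total_spam=TOTAL_SPAM, total_ham=TOTAL_HAM):
    """Return an S/O ratio under the ASSUMED definition:
    spam hit rate / (spam hit rate + ham hit rate).
    Returns None when the rule hit nothing at all."""
    spam_rate = spam_hits / total_spam
    ham_rate = ham_hits / total_ham
    if spam_rate + ham_rate == 0:
        return None
    return spam_rate / (spam_rate + ham_rate)


# (spam hits, ham hits) copied from the table; the selection is illustrative.
rules = {
    "URI_WP_HACKED": (8643, 4),
    "HELO_MISC_IP": (3914, 0),
    "MIMEOLE_DIRECT_TO_MX": (3622, 22),
    "T_NAME_EMAIL_DIFF": (2693, 16),
}

for name, (spam, ham) in rules.items():
    print(f"{name:24s} {s_over_o(spam, ham):.3f}")
```

Under that assumption, a rule with only a few ham hits out of tens of thousands of messages still rounds to 1.000 at three decimal places, so a displayed S/O of 1.000 does not necessarily mean zero ham hits.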

What sets the default scores in 50_scores.cf, and what determines what goes into the nightly 72_scores.cf? Is there still something I need to find and get running again on the new server?

--
Dave
