On 06/04/2017 01:18 PM, Jari Fredriksson wrote:
Cool. I had been running masschecks at around 1700 UTC, but have now
changed them to run at 1200 EET. My corpus takes 3-4 hours on 4 cores
(Core i7 920), or half an hour on Google Compute with 32 cores and a
ramdisk. I'm not paying much attention to the planned hourly submission
yet; it remains to be seen what that will look like.
Ideally I would want to do this at night time when the electricity is
cheap...
My 'ena' corpus is now up to 85K (28K spam/57K ham), growing by about
10-12K a day, and the rules are hitting ham and spam consistently:
Rule hit frequencies:
OVERALL   SPAM    HAM  NAME
  85428  28358  57070  (all messages)
   8647   8643      4  URI_WP_HACKED
   3914   3914      0  HELO_MISC_IP
   3138   3136      2  DATE_IN_FUTURE_06_12
   2897   2896      1  T_PDS_TO_EQ_FROM_NAME
   3105   3102      3  T_PDS_FROM_2_EMAILS
   2731   2729      2  DRUGS_ERECTILE
   2415   2415      0  URI_ONLY_MSGID_MALF
   2402   2402      0  DOS_OE_TO_MX
   2509   2507      2  LONGWORDS
   1928   1928      0  DRUGS_ERECTILE_OBFU
   3644   3622     22  MIMEOLE_DIRECT_TO_MX
   1820   1817      3  MISSING_SUBJECT
   1657   1657      0  FUZZY_PHARMACY
   1648   1648      0  DOS_OUTLOOK_TO_MX
   2709   2693     16  T_NAME_EMAIL_DIFF
   1545   1544      1  DATE_IN_FUTURE_03_06
   1514   1514      0  MISSING_MIME_HB_SEP
   1142   1142      0  SUBJECT_DRUG_GAP_L
   1128   1127      1  FUZZY_PRICES
   1064   1064      0  SUBJECT_DRUG_GAP_C
My masscheck processing is taking about 2 hours on my 4 core VM.
Question about the URI_WP_HACKED rule: why is it still at the default
score of 1.0 when its S/O on http://ruleqa.spamassassin.org has been
1.000 for a long time?
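
For reference, here is a rough sketch of how an S/O figure could be
derived from the hit counts in my local corpus. It assumes S/O is the
spam hit rate divided by the sum of the spam and ham hit rates, each
normalized by corpus size; I believe that is roughly what the
hit-frequencies output does, but treat it as an assumption. The
so_ratio helper is a hypothetical name, not part of the SpamAssassin
tooling, and these numbers come from my corpus only, not from ruleqa.

```python
def so_ratio(spam_hits, ham_hits, spam_total, ham_total):
    """Return an S/O estimate in [0, 1], or None if the rule never hit.

    Assumption: S/O = spam_rate / (spam_rate + ham_rate), where each
    rate is hits divided by the number of messages of that class.
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return None
    return spam_rate / (spam_rate + ham_rate)


# Corpus totals from the "(all messages)" row.
SPAM_TOTAL, HAM_TOTAL = 28358, 57070

# (spam hits, ham hits) for a few rules from the table.
rows = {
    "URI_WP_HACKED": (8643, 4),
    "HELO_MISC_IP": (3914, 0),
    "MIMEOLE_DIRECT_TO_MX": (3622, 22),
}

for name, (spam, ham) in rows.items():
    ratio = so_ratio(spam, ham, SPAM_TOTAL, HAM_TOTAL)
    print(f"{name:24s} {ratio:.4f}")
```

A rule with zero ham hits comes out at exactly 1.0 under this formula,
and a handful of ham hits pulls it only slightly below that.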
What sets the default scores in 50_scores.cf, and what determines what
goes into the nightly 72_scores.cf? Is there still something I need to
find and get running again on the new server?
--
Dave