Just when I thought I had the whitelist and AWL figured out, I got this (headers trimmed):
--- begin headers ---
[...]
Mailing-List: contact [EMAIL PROTECTED]; run by ezmlm
List-Id: <incidents.list-id.securityfocus.com>
List-Post: <mailto:[EMAIL PROTECTED]>
List-Help: <mailto:[EMAIL PROTECTED]>
List-Unsubscribe: <mailto:[EMAIL PROTECTED]>
List-Subscribe: <mailto:[EMAIL PROTECTED]>
Delivered-To: mailing list [EMAIL PROTECTED]
Delivered-To: moderator for [EMAIL PROTECTED]
Received: (qmail 30206 invoked from network); 27 Feb 2004 12:06:12 -0000
To: <[EMAIL PROTECTED]>
From: "harley" <[EMAIL PROTECTED]>
Date: Sat, 28 Feb 2004 01:55:39 GMT
Message-Id: <[EMAIL PROTECTED]>
Sender: [EMAIL PROTECTED]
Subject: This Drug puts VlAGRA to shame!!
Content-Type: text/plain;
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.ttlexceeded.com
X-Spam-Level:
X-Spam-Status: No, hits=-1.3 required=5.0 tests=BAYES_44,DATE_IN_FUTURE_06_12,
    FORGED_HOTMAIL_RCVD2,FVGT_u_HAS_2LETTERFLDR,J_CHICKENPOX_14,
    LOCAL_DRUGS_MALEDYSFUNCTION,LOCAL_DRUGS_MALEDYSFUNCTION_OBFU,
    OACYS_DISGUISED_P0RN,USER_IN_DEF_WHITELIST,_YM_HS_BAGLE_A autolearn=no
    version=2.63
X-Spam-Pyzor: Reported 0 times.
X-Spam-Report:
  *  1.0 _YM_HS_BAGLE_A Subject =~ /Hi/i
  *  6.0 OACYS_DISGUISED_P0RN BODY: Tries to slip through filters by substituting numbers for letters
  *  0.6 J_CHICKENPOX_14 BODY: {1}Letter - punctuation - {4}Letter
  * -0.0 BAYES_44 BODY: Bayesian spam probability is 44 to 50%
  *      [score: 0.4999]
  *  0.1 FVGT_u_HAS_2LETTERFLDR URI: FVGT - URL has a 2 letter folder like /ab/
  *  2.0 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date
  *  -15 USER_IN_DEF_WHITELIST From: address is in the default white-list
  *  2.5 FORGED_HOTMAIL_RCVD2 hotmail.com 'From' address, but no 'Received:'
  *  0.5 LOCAL_DRUGS_MALEDYSFUNCTION_OBFU LOCAL_DRUGS_MALEDYSFUNCTION_OBFU
  *  1.0 LOCAL_DRUGS_MALEDYSFUNCTION LOCAL_DRUGS_MALEDYSFUNCTION
--- end headers ---

Notice this line in X-Spam-Report:

  * -15 USER_IN_DEF_WHITELIST From: address is in the default white-list

The LIST address IS in the default whitelist (/usr/share/spamassassin/60_whitelist.cf):

  60_whitelist.cf:def_whitelist_from_rcvd [EMAIL PROTECTED] securityfocus.com

I've double-checked: neither securityfocus.com nor that sender address is listed anywhere else under /usr/share/spamassassin, /etc/spamassassin, or ~spamd/.spamassassin. Many securityfocus addresses are in spamd's AWL list from previous posts, but this (non-securityfocus) address is not.

The only fields that look even remotely like From: are:

  From: "harley" <[EMAIL PROTECTED]>
  Message-Id: <[EMAIL PROTECTED]>
  Sender: [EMAIL PROTECTED]

On a second pass, the score rose to 6.2, so AWL is correcting it (and will thus eventually undo the wayward mystery whitelist entry). AWL also seems to have forgotten the original whitelist score:

  # check_whitelist | grep [EMAIL PROTECTED]
  6.2 (6.2/2) -- [EMAIL PROTECTED]|ip=205.206

  (-7.5 AWL AWL: Auto-whitelist adjustment)

So AWL knows the message came From: that account, and it has figured out that the message is spammy-scoring (trending towards higher scores). But how did it give this sender such a strongly non-spam STARTING value when there was no previous entry for that user?
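The scores above hang together under a simple model of the AWL arithmetic, sketched below in Python as a hypothetical illustration (this is not SpamAssassin's Perl code; `AwlEntry` and `apply_awl` are made-up names). It assumes the adjustment is (stored mean - current score) x `auto_whitelist_factor`, with the factor at its documented default of 0.5, and that the pre-AWL score is what gets stored. Summing the first-pass X-Spam-Report gives -1.3, or 13.7 without the -15 USER_IN_DEF_WHITELIST hit; treating the second pass as that 13.7 (6.2 + 7.5) reproduces the -7.5 AWL adjustment and the 6.2 average.

```python
# Hypothetical model of SpamAssassin's auto-whitelist (AWL) arithmetic.
# Assumptions (not taken from SA source): delta = (stored mean - score) * factor,
# factor = 0.5 (documented auto_whitelist_factor default), and the
# pre-AWL score is what gets added to the sender's running total.

AWL_FACTOR = 0.5

# Rule scores copied from the first-pass X-Spam-Report.
FIRST_PASS_REPORT = {
    "_YM_HS_BAGLE_A": 1.0,
    "OACYS_DISGUISED_P0RN": 6.0,
    "J_CHICKENPOX_14": 0.6,
    "BAYES_44": -0.0,
    "FVGT_u_HAS_2LETTERFLDR": 0.1,
    "DATE_IN_FUTURE_06_12": 2.0,
    "USER_IN_DEF_WHITELIST": -15.0,
    "FORGED_HOTMAIL_RCVD2": 2.5,
    "LOCAL_DRUGS_MALEDYSFUNCTION_OBFU": 0.5,
    "LOCAL_DRUGS_MALEDYSFUNCTION": 1.0,
}


class AwlEntry:
    """Running total and message count for one sender|ip key (hypothetical)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def mean(self):
        # No history yet: there is nothing to pull the score toward.
        return None if self.count == 0 else self.total / self.count


def apply_awl(entry, raw_score, factor=AWL_FACTOR):
    """Return (final_score, delta) and record raw_score in the entry."""
    mean = entry.mean()
    delta = 0.0 if mean is None else (mean - raw_score) * factor
    # Store the pre-AWL score first, then apply the adjustment.
    entry.total += raw_score
    entry.count += 1
    return raw_score + delta, delta


if __name__ == "__main__":
    first_raw = sum(FIRST_PASS_REPORT.values())
    without_wl = first_raw - FIRST_PASS_REPORT["USER_IN_DEF_WHITELIST"]
    print(f"first pass rules: {first_raw:.1f} (without def whitelist: {without_wl:.1f})")

    entry = AwlEntry()
    final1, delta1 = apply_awl(entry, first_raw)   # empty entry, so delta is 0
    final2, delta2 = apply_awl(entry, without_wl)  # second pass, no whitelist hit
    print(f"pass 1: {final1:.1f} (AWL {delta1:+.1f})")
    print(f"pass 2: {final2:.1f} (AWL {delta2:+.1f}), average now {entry.mean():.1f}")
```

Under these assumptions an empty AWL entry contributes nothing on the first message, so whatever that message scores, whitelist bonus included, becomes the sender's baseline.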
It seems AWL defaulted to

  -30 (-30/0) -- [EMAIL PROTECTED]

and then corrected itself when the first actual message was posted.

To test, I:

1. Edited /usr/share/spamassassin/60_whitelist.cf and commented out the securityfocus entry.
2. As the spamd user, ran:
   spamassassin [EMAIL PROTECTED]
   SpamAssassin auto-whitelist: removing address: [EMAIL PROTECTED]
3. Verified it was gone:
   $ check_whitelist | grep [EMAIL PROTECTED]
   [EMAIL PROTECTED]: ~]$
4. Re-ran the message through SA. There was no AWL/whitelist adjustment, and it scored 13.7.
5. Checked that the first posting from that user was given an appropriate score:
   $ check_whitelist | grep [EMAIL PROTECTED]
   13.7 (13.7/1) -- [EMAIL PROTECTED]|ip=205.206

Now, I'm the first to admit I may have something wrong here, but it sure seems that the default whitelist, or some other factor, can have 'interesting' side effects on initial AWL values. Note that this apparently causes the same problem Michael Shlief observed: a sender address that is in neither the default nor an added whitelist is affecting mail sent to the list by others, apparently through AWL.

I checked the wikis but didn't find a good rundown on the whitelist. This is, I think, a whitelist issue manifesting itself as an AWL scoring issue. So:

1. How is an entry in the default from-whitelist (or at least something scored as one) affecting this message when none of its sender fields (that I can see) relate to it?
2. Would a sender address of [EMAIL PROTECTED] match the default whitelist entry, given that the entry has no anchoring $?
3. Can I tweak the +/- score awarded to blacklist/whitelist entries so it is closer to the spam threshold?
4. How is AWL initialized when there's no previous entry for a user?
5. Does AWL relate to default whitelists in any non-obvious way?

I still like what AWL does for me, but I'm beginning to think whitelist/blacklist entries (with +/-100 adjustments) are riskier than equivalent rules that add or subtract less than the threshold, which would avoid this "blowing out" of the score on even obviously spammy messages.
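For question 2, here is a minimal Python sketch of what to test, assuming SpamAssassin compiles a whitelist_from-style address glob by lowercasing it, escaping regex metacharacters, mapping `*` to `.*` and `?` to `.`, and wrapping the result in `^...$` (worth confirming against Conf.pm for your version before relying on it). `glob_to_regex`, `matches`, and every address below are hypothetical, since the real ones are masked above, and the sketch models only the address half of def_whitelist_from_rcvd, not the Received-header domain check.

```python
# Hypothetical model of whitelist_from-style glob matching.
# Assumption: '*' -> '.*', '?' -> '.', everything else escaped,
# case-insensitive, and (by default) anchored with ^...$.
import re


def glob_to_regex(pattern, anchored=True):
    parts = []
    for ch in pattern.lower():
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    body = "".join(parts)
    return re.compile(f"^{body}$" if anchored else body)


def matches(pattern, address, anchored=True):
    # With anchors the whole address must match; without them,
    # search() accepts the pattern anywhere inside the address.
    return glob_to_regex(pattern, anchored).search(address.lower()) is not None


if __name__ == "__main__":
    entry = "list@example.org"              # hypothetical list address
    candidates = [
        "list@example.org",                 # exact
        "LIST@Example.org",                 # case differs
        "list@example.org.attacker.test",   # suffix appended
        "harley-list@example.org",          # prefix prepended
    ]
    for addr in candidates:
        print(f"{addr:35} anchored={matches(entry, addr)!s:5} "
              f"unanchored={matches(entry, addr, anchored=False)}")
```

If the anchoring assumption holds, an address with extra text before or after the listed one does not match; only the unanchored variant lets such a prefix or suffix through.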
Had it not been for that VERY HIGH starting value for the sender (from the default whitelist, I presume), this would still have been scored as spam.

Any clarification much appreciated.

- Bob
