Joseph Brennan <[EMAIL PROTECTED]> writes:
> /Dear .{0,12}(web ?mail|columbia\.edu)/i
>
> /Password.{0,10}\([\s\.\*\_]+\)/
>
> /you must reply to this email/i
>
> Reply-to =~ /[EMAIL PROTECTED]/
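The quoted body patterns can be tried outside SpamAssassin before they go into a meta-rule. The following is only a rough sketch in Python: the names `PATTERNS`, `matched_patterns`, `meta_fires`, and the `SAMPLE` text are hypothetical, the Reply-to test is left out because its address is redacted, and it assumes the meta-rule requires all of the body patterns to match (the combination logic is not stated here). SpamAssassin matches body rules against its own rendered text, so results may differ slightly from a plain string search.

```python
import re

# The three body patterns quoted above, translated to Python's `re` syntax.
# The Reply-to pattern is omitted because its address is redacted.
PATTERNS = {
    "dear_webmail": re.compile(r"Dear .{0,12}(web ?mail|columbia\.edu)", re.IGNORECASE),
    # Same character class as the quoted rule; escapes inside [...] are optional.
    "password_blank": re.compile(r"Password.{0,10}\([\s.*_]+\)"),
    "must_reply": re.compile(r"you must reply to this email", re.IGNORECASE),
}


def matched_patterns(body):
    """Return the names of all patterns found somewhere in the body text."""
    return {name for name, rx in PATTERNS.items() if rx.search(body)}


def meta_fires(body):
    """Assumption: the meta-rule fires only when every body pattern matches."""
    return matched_patterns(body) == set(PATTERNS)


# Hypothetical sample text, written only to exercise the patterns.
SAMPLE = (
    "Dear Columbia.edu Webmail user,\n"
    "To keep your account you must reply to this email with:\n"
    "Username (______) Password (______)\n"
)

if __name__ == "__main__":
    print(sorted(matched_patterns(SAMPLE)))
    print(meta_fires(SAMPLE))
```

Running the patterns against a handful of saved phish and ham messages this way gives a quick idea of false positives before a score is assigned.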
I created a meta-rule out of these (with a score of 8), and then ran spamassassin -D < phish to see how it worked. It matched the metarule flawlessly, but the phish ended up with a score of only 5.4 because BAYES_00 dragged it down. That was surprising to me, so I started to wonder if my bayes DB was poisoned. I ran some stats, and the results seem to indicate a healthy bayes database (unless I am reading this wrong)...

A side note: it's interesting that only 9% of our email is spam, which seems low, but maybe clamav-milter+rbls are blocking the remaining 40%?

Email:  2379392  Autolearn: 1075396  AvgScore:  -6.32  AvgScanTime:  5.96 sec
Spam:    227816  Autolearn:  114079  AvgScore:  14.75  AvgScanTime:  4.23 sec
Ham:    2151576  Autolearn:  961317  AvgScore:  -8.56  AvgScanTime:  6.15 sec

Time Spent Running SA:        3941.26 hours
Time Spent Processing Spam:    267.76 hours
Time Spent Processing Ham:    3673.50 hours

TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                     COUNT  %OFMAIL  %OFSPAM   %OFHAM
----------------------------------------------------------------------
   1    HTML_MESSAGE                 154522    54.03    67.83    52.57
   2    BAYES_99                     134531     6.09    59.05     0.48
   3    BOTNET                       133687     8.90    58.68     3.63
   4    RDNS_NONE                    102255    10.19    44.88     6.51
   5    URIBL_JP_SURBL                98879     4.94    43.40     0.87
   6    MIME_HTML_ONLY                87518     7.62    38.42     4.36
   7    URIBL_OB_SURBL                76624     3.98    33.63     0.84
   8    DCC_CHECK                     74600     8.51    32.75     5.94
   9    URIBL_AB_SURBL                59890     2.72    26.29     0.23
  10    URIBL_SC_SURBL                53911     2.51    23.66     0.27
  11    RCVD_IN_BL_SPAMCOP_NET        43120     2.43    18.93     0.68
  12    URIBL_WS_SURBL                38251     1.79    16.79     0.21
  13    URIBL_RHS_DOB                 36565     2.17    16.05     0.70
  14    BAYES_50                      35322     3.93    15.50     2.71
  15    HTML_IMAGE_ONLY_16            33887     1.68    14.87     0.28
  16    HTML_SHORT_LINK_IMG_2         33118     1.56    14.54     0.19
  17    HTML_IMAGE_RATIO_02           32757     2.93    14.38     1.72
  18    URIBL_SBL                     30456     1.80    13.37     0.57
  19    RAZOR2_CHECK                  27722     2.55    12.17     1.53
  20    RAZOR2_CF_RANGE_51_100        26856     2.41    11.79     1.41
----------------------------------------------------------------------

TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                     COUNT  %OFMAIL  %OFSPAM   %OFHAM
----------------------------------------------------------------------
   1    BAYES_00                    2002969    84.67     5.15    93.09
   2    HTML_MESSAGE                1131073    54.03    67.83    52.57
   3    UNPARSEABLE_RELAY            760567    32.93    10.12    35.35
   4    DKIM_SIGNED                  693328    29.74     6.26    32.22
   5    DKIM_VERIFIED                531590    22.67     3.38    24.71
   6    ALL_TRUSTED                  173612     7.30     0.05     8.07
   7    USER_IN_WHITELIST            155704     6.54     0.00     7.24
   8    RDNS_NONE                    140127    10.19    44.88     6.51
   9    DCC_CHECK                    127844     8.51    32.75     5.94
  10    RCVD_IN_DNSWL_LOW            101863     4.31     0.34     4.73
  11    MIME_HTML_ONLY                93817     7.62    38.42     4.36
  12    RCVD_IN_DNSWL_MED             90038     3.81     0.31     4.18
  13    WHOIS_NETSOLPR                87575     3.72     0.38     4.07
  14    MIME_QP_LONG_LINE             82804     4.49    10.52     3.85
  15    BOTNET                        78052     8.90    58.68     3.63
  16    BAYES_50                      58286     3.93    15.50     2.71
  17    FUZZY_AMBIEN                  53284     2.28     0.38     2.48
  18    SARE_SUB_ENC_UTF8             50533     2.14     0.17     2.35
  19    SARE_MILLIONSOF               42268     1.84     0.67     1.96
  20    FORGED_YAHOO_RCVD             38762     1.74     1.16     1.80
----------------------------------------------------------------------

Then I looked to see what bayes did with the message, but I do not understand how to read the output. Can someone explain this to me and give me an idea why BAYES_00 fired when we've been feeding every one of these spams to bayes to train on it?
$ spamassassin -D bayes < phish
[9595] dbg: bayes: using username: @GLOBAL
[9595] dbg: bayes: database connection established
[9595] dbg: bayes: found bayes db version 3
[9595] dbg: bayes: Using userid: 4
[9595] dbg: bayes: corpus size: nspam = 6782956, nham = 15364321
[9595] dbg: bayes: header tokens for *p = "U*mayodayo D*3web.net D*net"
[9595] dbg: bayes: header tokens for *F = "U*mayodayo D*3web.net D*net"
[9595] dbg: bayes: header tokens for Reply-to = "U*s.team43 D*live.com D*com"
[9595] dbg: bayes: header tokens for MIME-Version = ""
[9595] dbg: bayes: header tokens for *c = "/plain; charset=ISO-8859-1"
[9595] dbg: bayes: header tokens for Content-Transfer-Encoding = "8bit"
[9595] dbg: bayes: header tokens for X-Originating-IP = "196.207.0.227"
[9595] dbg: bayes: header tokens for To = ""
[9595] dbg: bayes: header tokens for X-Languages = " en"
[9595] dbg: bayes: header tokens for X-Languages-Length = " 1213"
[9595] dbg: bayes: header tokens for X-Spam-Relays-External = " [ ip=209.197.145.198 rdns=reef.cybersurf.com helo=reef.cybersurf.com by=cat.cia.com ident= envfrom= intl=0 id=1Kw6iz-0002Li-Pg auth= msa=0 ] [ ip=196.207.0.227 rdns=196-207-0-227.netcomng.com helo=196-207-0-227.netcomng.com by=webmail.3web.com ident= envfrom= intl=0 id= auth=HTTP msa=0 ] [ ip=196.207.0.227 rdns= helo= by= ident= envfrom= intl=0 id= auth= msa=0 ]"
[9595] dbg: bayes: header tokens for X-Spam-Relays-Internal = " "
[9595] dbg: bayes: header tokens for *RT = " "
[9595] dbg: bayes: header tokens for *RU = " [ ip=209.197.145.198 rdns=reef.cybersurf.com helo=reef.cybersurf.com by=cat.cia.com ident= envfrom= intl=0 id=1Kw6iz-0002Li-Pg auth= msa=0 ] [ ip=196.207.0.227 rdns=196-207-0-227.netcomng.com helo=196-207-0-227.netcomng.com by=webmail.3web.com ident= envfrom= intl=0 id= auth=HTTP msa=0 ] [ ip=196.207.0.227 rdns= helo= by= ident= envfrom= intl=0 id= auth= msa=0 ]"
[9595] dbg: bayes: header tokens for *r = " 196-207-0-227.netcomng.com (196-207-0-227.netcomng.com [196.207.0 ip*196.207.0.227 ]) by webmail.3web.com (IMP) HTTP <[EMAIL PROTECTED]>; "
[9595] dbg: bayes: header tokens for *r = " 196-207-0-227.netcomng.com (196-207-0-227.netcomng.com [196.207.0 ip*196.207.0.227 ]) by webmail.3web.com (IMP) HTTP <[EMAIL PROTECTED]>; apache by reef.cybersurf.com local (Exim 4.44) id 1Kw6j0-0006W5-UJ; "
[9595] dbg: bayes: tok_get_all: token count: 142
[9595] dbg: bayes: token 'weekly' => 0.000135596068218096
[9595] dbg: bayes: token 'becomes' => 0.000298722931704609
[9595] dbg: bayes: token 'inbox' => 0.000343185200935573
[9595] dbg: bayes: token 'one's' => 0.000597114317425083
[9595] dbg: bayes: token 'folder' => 0.00064482620854974
[9595] dbg: bayes: token 'webmail' => 0.000671660424469413
[9595] dbg: bayes: token 'INBOX' => 0.000805791313030454
[9595] dbg: bayes: token 'Webmail' => 0.00100686213349969
[9595] dbg: bayes: token 'inboxes' => 0.00107385229540918
[9595] dbg: bayes: token 'SPACE' => 0.0011503920171062
[9595] dbg: bayes: token 'reset' => 0.00200996264009963
[9595] dbg: bayes: token 'oldest' => 0.00320874751491054
[9595] dbg: bayes: token 'SAVE' => 0.00400496277915633
[9595] dbg: bayes: token 'Bates' => 0.0156699029126214
[9595] dbg: bayes: token 'bates' => 0.0156699029126214
[9595] dbg: bayes: token 'current' => 0.0200447781112092
[9595] dbg: bayes: token 'H*r:IMP' => 0.0961561369397845
[9595] dbg: bayes: token 'notified' => 0.121287867011135
[9595] dbg: bayes: token 'Password' => 0.13640095340516
[9595] dbg: bayes: token 'HX-Spam-Relays-External:sk:webmail' => 0.1492193587257
[9595] dbg: bayes: token 'H*RU:sk:webmail' => 0.1492193587257
[9595] dbg: bayes: score = 1.83186799063151e-15

Any ideas would be much appreciated! My goal is to stop these phishers from getting their mail through, but even with a customized rule set to a high score, they will get through if BAYES_00 fires...

micah
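On reading the `token '...' => value` lines in the debug output: each value appears to be the learned probability that a message containing that token is spam, and every token listed is well below 0.5, meaning the database has seen those words mostly in ham. The sketch below shows how such per-token values can collapse into a score near zero. It assumes SpamAssassin is using its chi-squared (Robinson/Fisher) combiner; the function names `chi2q` and `combine` are my own, and the token list is copied from the log above. It is not expected to reproduce the logged score digit for digit, since floating-point details and the exact token selection may differ.

```python
import math


def chi2q(x2, dof):
    """Upper-tail probability of a chi-squared value, for even degrees of freedom."""
    assert dof % 2 == 0, "this closed form only works for even dof"
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Fisher-style combination of per-token spam probabilities into one score."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_s = sum(math.log(1.0 - p) for p in probs)
    ln_h = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_s, 2 * n)  # evidence for spam
    h = 1.0 - chi2q(-2.0 * ln_h, 2 * n)  # evidence for ham
    return ((s - h) + 1.0) / 2.0


# Per-token probabilities copied from the debug output above.
TOKEN_PROBS = [
    0.000135596068218096, 0.000298722931704609, 0.000343185200935573,
    0.000597114317425083, 0.00064482620854974, 0.000671660424469413,
    0.000805791313030454, 0.00100686213349969, 0.00107385229540918,
    0.0011503920171062, 0.00200996264009963, 0.00320874751491054,
    0.00400496277915633, 0.0156699029126214, 0.0156699029126214,
    0.0200447781112092, 0.0961561369397845, 0.121287867011135,
    0.13640095340516, 0.1492193587257, 0.1492193587257,
]

score = combine(TOKEN_PROBS)

if __name__ == "__main__":
    print(score)
```

With every significant token leaning toward ham, the combined score lands far below 0.01, which is the range where BAYES_00 fires. Words such as "inbox", "webmail", and "folder" are presumably common in legitimate mail on this system, so they outweigh whatever spammy evidence the trained phish samples contributed.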