Joseph Brennan <[EMAIL PROTECTED]> writes:

> /Dear .{0,12}(web ?mail|columbia\.edu)/i
>
> /Password.{0,10}\([\s\.\*\_]+\)/
>
> /you must reply to this email/i
>
> Reply-to =~ /[EMAIL PROTECTED]/

I created a meta-rule out of these (with a score of 8) and then ran
spamassassin -D < phish to see how it worked. It matched the meta-rule
flawlessly, but the phish ended up with a score of only 5.4 because
BAYES_00 dragged it down. That surprised me, so I started to wonder
whether my bayes DB was poisoned.
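
For anyone who wants to try the three body patterns outside
SpamAssassin, here is a rough Python sketch. It is not how SpamAssassin
evaluates a meta-rule; it only ANDs the three regexes quoted above. The
sample text is made up for illustration, the Reply-to test is left out
because the address is redacted, and the names BODY_PATTERNS and
matches_all are hypothetical.

```python
import re

# The three body patterns quoted above, translated to Python's re syntax.
# The /i flag becomes re.IGNORECASE; the Password pattern stays case-sensitive.
# [\s.*_] is equivalent to the original [\s\.\*\_] character class.
BODY_PATTERNS = [
    re.compile(r"Dear .{0,12}(web ?mail|columbia\.edu)", re.IGNORECASE),
    re.compile(r"Password.{0,10}\([\s.*_]+\)"),
    re.compile(r"you must reply to this email", re.IGNORECASE),
]

def matches_all(text):
    """Return True only if every pattern matches, like a meta-rule built with &&."""
    return all(p.search(text) for p in BODY_PATTERNS)

# Made-up sample text, NOT the actual phish.
sample = (
    "Dear Webmail User, your mailbox is almost full. "
    "Password: (........) "
    "To keep your account you must reply to this email."
)
print(matches_all(sample))
```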

I ran some stats, and the results seem to indicate a healthy bayes
database (unless I am reading this wrong). A side note: it's
interesting that only about 9% of our email is spam, which seems low,
but maybe clamav-milter+rbls are blocking the remaining 40%?

Email:  2379392  Autolearn: 1075396  AvgScore:  -6.32  AvgScanTime:  5.96 sec
Spam:    227816  Autolearn: 114079  AvgScore:  14.75  AvgScanTime:  4.23 sec
Ham:    2151576  Autolearn: 961317  AvgScore:  -8.56  AvgScanTime:  6.15 sec

Time Spent Running SA:      3941.26 hours
Time Spent Processing Spam:  267.76 hours
Time Spent Processing Ham:  3673.50 hours
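
The summary figures can be cross-checked with a few lines of Python.
This is only arithmetic on the numbers printed above; the variable
names are made up for the example.

```python
# Figures copied from the summary above.
total_mail, spam, ham = 2379392, 227816, 2151576
hours_total, hours_spam, hours_ham = 3941.26, 267.76, 3673.50

# Percent of scanned mail that was tagged as spam.
spam_share = 100.0 * spam / total_mail

# Consistency checks: the spam and ham rows should sum to the totals.
counts_add_up = (spam + ham == total_mail)
hours_add_up = abs(hours_spam + hours_ham - hours_total) < 0.01

print(f"spam share: {spam_share:.2f}%")
print("counts add up:", counts_add_up)
print("hours add up:", hours_add_up)
```

The spam share works out to a little over 9.5% of the mail that reached
SpamAssassin, so "9%" is a slight round-down.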

TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM        
----------------------------------------------------------------------
   1    HTML_MESSAGE                    154522   54.03   67.83   52.57
   2    BAYES_99                        134531    6.09   59.05    0.48
   3    BOTNET                          133687    8.90   58.68    3.63
   4    RDNS_NONE                       102255   10.19   44.88    6.51
   5    URIBL_JP_SURBL                  98879     4.94   43.40    0.87
   6    MIME_HTML_ONLY                  87518     7.62   38.42    4.36
   7    URIBL_OB_SURBL                  76624     3.98   33.63    0.84
   8    DCC_CHECK                       74600     8.51   32.75    5.94
   9    URIBL_AB_SURBL                  59890     2.72   26.29    0.23
  10    URIBL_SC_SURBL                  53911     2.51   23.66    0.27
  11    RCVD_IN_BL_SPAMCOP_NET          43120     2.43   18.93    0.68
  12    URIBL_WS_SURBL                  38251     1.79   16.79    0.21
  13    URIBL_RHS_DOB                   36565     2.17   16.05    0.70
  14    BAYES_50                        35322     3.93   15.50    2.71
  15    HTML_IMAGE_ONLY_16              33887     1.68   14.87    0.28
  16    HTML_SHORT_LINK_IMG_2           33118     1.56   14.54    0.19
  17    HTML_IMAGE_RATIO_02             32757     2.93   14.38    1.72
  18    URIBL_SBL                       30456     1.80   13.37    0.57
  19    RAZOR2_CHECK                    27722     2.55   12.17    1.53
  20    RAZOR2_CF_RANGE_51_100          26856     2.41   11.79    1.41
----------------------------------------------------------------------

TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM        
----------------------------------------------------------------------
   1    BAYES_00                        2002969  84.67    5.15   93.09
   2    HTML_MESSAGE                    1131073  54.03   67.83   52.57
   3    UNPARSEABLE_RELAY               760567   32.93   10.12   35.35
   4    DKIM_SIGNED                     693328   29.74    6.26   32.22
   5    DKIM_VERIFIED                   531590   22.67    3.38   24.71
   6    ALL_TRUSTED                     173612    7.30    0.05    8.07
   7    USER_IN_WHITELIST               155704    6.54    0.00    7.24
   8    RDNS_NONE                       140127   10.19   44.88    6.51
   9    DCC_CHECK                       127844    8.51   32.75    5.94
  10    RCVD_IN_DNSWL_LOW               101863    4.31    0.34    4.73
  11    MIME_HTML_ONLY                  93817     7.62   38.42    4.36
  12    RCVD_IN_DNSWL_MED               90038     3.81    0.31    4.18
  13    WHOIS_NETSOLPR                  87575     3.72    0.38    4.07
  14    MIME_QP_LONG_LINE               82804     4.49   10.52    3.85
  15    BOTNET                          78052     8.90   58.68    3.63
  16    BAYES_50                        58286     3.93   15.50    2.71
  17    FUZZY_AMBIEN                    53284     2.28    0.38    2.48
  18    SARE_SUB_ENC_UTF8               50533     2.14    0.17    2.35
  19    SARE_MILLIONSOF                 42268     1.84    0.67    1.96
  20    FORGED_YAHOO_RCVD               38762     1.74    1.16    1.80
----------------------------------------------------------------------


Then I looked at what bayes did with the message, but I do not
understand how to read the output. Can someone explain it to me and
give me an idea why BAYES_00 fired when we've been feeding every one of
these spams to bayes for training?

$ spamassassin -D bayes < phish 
[9595] dbg: bayes: using username: @GLOBAL
[9595] dbg: bayes: database connection established
[9595] dbg: bayes: found bayes db version 3
[9595] dbg: bayes: Using userid: 4
[9595] dbg: bayes: corpus size: nspam = 6782956, nham = 15364321
[9595] dbg: bayes: header tokens for *p = "U*mayodayo D*3web.net D*net"
[9595] dbg: bayes: header tokens for *F = "U*mayodayo D*3web.net D*net"
[9595] dbg: bayes: header tokens for Reply-to = "U*s.team43 D*live.com
D*com"
[9595] dbg: bayes: header tokens for MIME-Version = ""
[9595] dbg: bayes: header tokens for *c = "/plain; charset=ISO-8859-1"
[9595] dbg: bayes: header tokens for Content-Transfer-Encoding = "8bit"
[9595] dbg: bayes: header tokens for X-Originating-IP = "196.207.0.227"
[9595] dbg: bayes: header tokens for To = ""
[9595] dbg: bayes: header tokens for X-Languages = " en"
[9595] dbg: bayes: header tokens for X-Languages-Length = " 1213"
[9595] dbg: bayes: header tokens for X-Spam-Relays-External = " [
ip=209.197.145.198 rdns=reef.cybersurf.com helo=reef.cybersurf.com
by=cat.cia.com ident= envfrom= intl=0 id=1Kw6iz-0002Li-Pg auth= msa=0 ]
[ ip=196.207.0.227 rdns=196-207-0-227.netcomng.com
helo=196-207-0-227.netcomng.com by=webmail.3web.com ident= envfrom=
intl=0 id= auth=HTTP msa=0 ] [ ip=196.207.0.227 rdns= helo= by= ident=
envfrom= intl=0 id= auth= msa=0 ]"
[9595] dbg: bayes: header tokens for X-Spam-Relays-Internal = " "
[9595] dbg: bayes: header tokens for *RT = " "
[9595] dbg: bayes: header tokens for *RU = " [ ip=209.197.145.198
rdns=reef.cybersurf.com helo=reef.cybersurf.com by=cat.cia.com ident=
envfrom= intl=0 id=1Kw6iz-0002Li-Pg auth= msa=0 ] [ ip=196.207.0.227
rdns=196-207-0-227.netcomng.com helo=196-207-0-227.netcomng.com
by=webmail.3web.com ident= envfrom= intl=0 id= auth=HTTP msa=0 ] [
ip=196.207.0.227 rdns= helo= by= ident= envfrom= intl=0 id= auth= msa=0
]"
[9595] dbg: bayes: header tokens for *r = " 196-207-0-227.netcomng.com
(196-207-0-227.netcomng.com [196.207.0 ip*196.207.0.227 ]) by
webmail.3web.com (IMP) HTTP <[EMAIL PROTECTED]>; "
[9595] dbg: bayes: header tokens for *r = " 196-207-0-227.netcomng.com
(196-207-0-227.netcomng.com [196.207.0 ip*196.207.0.227 ]) by
webmail.3web.com (IMP) HTTP <[EMAIL PROTECTED]>; apache by
reef.cybersurf.com local (Exim 4.44) id 1Kw6j0-0006W5-UJ; "
[9595] dbg: bayes: tok_get_all: token count: 142
[9595] dbg: bayes: token 'weekly' => 0.000135596068218096
[9595] dbg: bayes: token 'becomes' => 0.000298722931704609
[9595] dbg: bayes: token 'inbox' => 0.000343185200935573
[9595] dbg: bayes: token 'one's' => 0.000597114317425083
[9595] dbg: bayes: token 'folder' => 0.00064482620854974
[9595] dbg: bayes: token 'webmail' => 0.000671660424469413
[9595] dbg: bayes: token 'INBOX' => 0.000805791313030454
[9595] dbg: bayes: token 'Webmail' => 0.00100686213349969
[9595] dbg: bayes: token 'inboxes' => 0.00107385229540918
[9595] dbg: bayes: token 'SPACE' => 0.0011503920171062
[9595] dbg: bayes: token 'reset' => 0.00200996264009963
[9595] dbg: bayes: token 'oldest' => 0.00320874751491054
[9595] dbg: bayes: token 'SAVE' => 0.00400496277915633
[9595] dbg: bayes: token 'Bates' => 0.0156699029126214
[9595] dbg: bayes: token 'bates' => 0.0156699029126214
[9595] dbg: bayes: token 'current' => 0.0200447781112092
[9595] dbg: bayes: token 'H*r:IMP' => 0.0961561369397845
[9595] dbg: bayes: token 'notified' => 0.121287867011135
[9595] dbg: bayes: token 'Password' => 0.13640095340516
[9595] dbg: bayes: token 'HX-Spam-Relays-External:sk:webmail' => 0.1492193587257
[9595] dbg: bayes: token 'H*RU:sk:webmail' => 0.1492193587257
[9595] dbg: bayes: score = 1.83186799063151e-15
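
One way to read the output above: each "token ... => value" line is
that token's estimated spam probability (a value near 0 means the token
has been seen almost only in ham), and the final "score" is the
combination of those values. The stock BAYES_00 rule fires when the
combined value is below 0.01. The Python sketch below assumes a generic
Fisher/Robinson chi-squared combiner and uses only the 21 tokens
printed in the log. It is not SpamAssassin's actual code and is not
expected to reproduce the exact score shown; chi2q and combine are
hypothetical helper names.

```python
import math

# Per-token spam probabilities copied from the debug output above.
TOKEN_PROBS = [
    0.000135596068218096, 0.000298722931704609, 0.000343185200935573,
    0.000597114317425083, 0.00064482620854974, 0.000671660424469413,
    0.000805791313030454, 0.00100686213349969, 0.00107385229540918,
    0.0011503920171062, 0.00200996264009963, 0.00320874751491054,
    0.00400496277915633, 0.0156699029126214, 0.0156699029126214,
    0.0200447781112092, 0.0961561369397845, 0.121287867011135,
    0.13640095340516, 0.1492193587257, 0.1492193587257,
]

def chi2q(x2, dof):
    """Upper-tail probability of a chi-squared variable (dof must be even)."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine token probabilities into one value between 0 (ham) and 1 (spam)."""
    n = len(probs)
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0

score = combine(TOKEN_PROBS)
print(score, score < 0.01)
```

With every listed token below 0.15, and words such as 'webmail',
'inbox' and 'folder' close to zero, any combiner of this kind lands
near 0, which would be consistent with BAYES_00 firing.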

Any ideas would be much appreciated! My goal is to stop these phishers
from getting their mail through, but even with a custom rule set to
a high score, they will still get through if BAYES_00 fires...

micah
