> Hi,
> I am losing confidence in SA; the training process is
> pretty slow or it doesn't seem to be learning.
> I am training SA with around 30-50 manually identified
> spams (moving spam mails to a spam folder created in
> squirrelmail and running the sa-train command on that
> folder from crond every hour to train on and delete them).
>
> The script was tested and works from the shell before I
> put it in crond.
>
> However, I found that the learning process is either not
> right or it is rather slow.
>
> I went through the headers of the spams and found that
> even almost identical (in content) spams always got a
> score of 0.1, and these spams were received on separate
> occasions across several days. This has made me lose
> confidence in SA.
>
> I wonder whether I have it set up correctly to detect and
> learn spams. I am using a default setup from qmail-toaster
> cnt50; do I need more filters to harden my defense? Any
> recommendations will be appreciated.
>
> Here are some samples I took from my mailbox on this
> server
> (e.g., sample spams 1 and 8 are almost identical in content
> but they are both scored at only 0.1 :( ).
>
> http://www.keac.com/id3303/spam-egs.txt

Mail #1 here:

Content preview:  == US Drugstore == Voted as No.1 US pharmacy on Internet
  Over 80 meds on our online store We accept Visa, Master Card, JCB,
  Dinner & eCheck [...]

Content analysis details:   (17.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.0 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [68.243.81.116 listed in zen.spamhaus.org]
 0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see <http://www.spamcop.net/bl.shtml?68.243.81.116>]
 1.2 INVALID_DATE           Invalid Date: header (not RFC 2822)
 1.2 TO_MALFORMED           To: has a malformed address
 4.0 BOTNET                 Relay might be a spambot or virusbot
                            [botnet0.8,ip=68.243.81.116,rdns=68-243-81-116.area7.spcsdns.net,maildomain=mediafutures.org,client,ipinhostname]
 1.0 BAYES_60               BODY: Bayesian spam probability is 60 to 80%
                            [score: 0.6572]
 0.1 RDNS_NONE              Delivered to trusted network by a host with no rDNS
 4.0 JM_SOUGHT_2            JM_SOUGHT_2
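
To make the arithmetic in the report above explicit, here is a minimal Python sketch that adds up the rule scores for Mail #1 and compares the total with the 5.0 threshold. The scores are copied from the report; the names RULE_SCORES, total_score, non_bayes_score and is_spam are made up for illustration and are not part of SpamAssassin.

```python
# Rule scores copied from the report for Mail #1 above.
RULE_SCORES = {
    "RCVD_IN_XBL": 3.0,
    "RCVD_IN_PBL": 0.9,
    "RCVD_IN_BL_SPAMCOP_NET": 2.0,
    "INVALID_DATE": 1.2,
    "TO_MALFORMED": 1.2,
    "BOTNET": 4.0,
    "BAYES_60": 1.0,
    "RDNS_NONE": 0.1,
    "JM_SOUGHT_2": 4.0,
}

# "5.0 required" in the report header.
REQUIRED = 5.0


def total_score(scores):
    """Sum all rule scores, rounded to one decimal like the report."""
    return round(sum(scores.values()), 1)


def non_bayes_score(scores):
    """Sum the scores of every rule except the BAYES_* ones."""
    return round(
        sum(pts for rule, pts in scores.items() if not rule.startswith("BAYES_")),
        1,
    )


def is_spam(scores, required=REQUIRED):
    """A message is flagged when its total reaches the required score."""
    return total_score(scores) >= required


if __name__ == "__main__":
    print("total:", total_score(RULE_SCORES))
    print("without Bayes:", non_bayes_score(RULE_SCORES))
    print("flagged as spam:", is_spam(RULE_SCORES))
```

In this report Bayes contributes only 1.0 of the 17.4 points; the remaining 16.4 come from the blocklist lookups, the header checks, and the BOTNET and JM_SOUGHT_2 rules.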